Building AI Features With Firebase and Vertex AI
A practical guide to adding semantic search, AI-generated content, and intelligent agents to your Firebase app using Vertex AI's Gemini API and Firestore vector search.
Firebase has always been great for building apps fast. But until recently, adding AI meant bolting on a third-party service — different SDK, different auth, different billing. That gap has closed in a big way.
With Vertex AI's Gemini API now natively accessible from Firebase and Firestore's vector search generally available, you can build production AI features — semantic search, RAG, AI-generated summaries, intelligent agents — entirely within the Google Cloud ecosystem. Same Firebase SDKs, same security rules, same billing.
This post walks through the patterns we use at Jenga IT to add AI to Firebase apps, with real code, real tradeoffs, and real numbers.
The Stack at a Glance
| Component | Service | Role |
|---|---|---|
| App backend | Firebase (Auth, Functions, Hosting) | Auth, serverless logic, static hosting |
| Database | Firestore (native mode) | Real-time data + vector embeddings |
| Vector index | Firestore vector index | K-nearest-neighbor (KNN) search |
| Embeddings | Vertex AI text-embedding-004 | Convert text to 768-dim vectors |
| LLM | Vertex AI Gemini 2.0 Flash / Pro | Content generation, summarization, chat |
| Admin | Firebase Admin SDK | Server-side orchestration (Cloud Functions) |
Why this stack works: one SDK, one auth system, one bill. Your Firestore security rules protect the vector data the same way they protect everything else.
Pattern 1: Semantic Search on Firestore
The most common ask we get: "Let users search our content by meaning, not keywords."
Step 1 — Generate embeddings on write
Use a Cloud Function triggered on Firestore document creation to generate and store embeddings.
// functions/src/onDocumentCreate.js
import { onDocumentCreated } from 'firebase-functions/v2/firestore';
import { FieldValue } from 'firebase-admin/firestore';
import { GoogleGenAI } from '@google/genai';
// PCA helper, not shown in this post; the module path is hypothetical
import { projectEmbedding } from './projectEmbedding.js';

// @google-cloud/vertexai only serves generative models, so the embedding call
// goes through the Google Gen AI SDK in Vertex AI mode. The location is an assumption.
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});

export const onContentCreated = onDocumentCreated('content/{docId}', async (event) => {
  const snapshot = event.data;
  if (!snapshot) return;
  const data = snapshot.data();

  // Generate embedding
  const response = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: data.body,
  });
  const embedding = response?.embeddings?.[0]?.values;
  if (!embedding) {
    console.error('No embedding returned');
    return;
  }

  // Project from 768 down to 500 dims to match the vector index
  const projected = projectEmbedding(embedding, 500);

  // Store a Firestore vector value so the vector index can pick the field up
  await snapshot.ref.update({
    embedding: FieldValue.vector(projected),
    indexedAt: FieldValue.serverTimestamp(),
  });
});
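The function above calls projectEmbedding without showing it. A minimal sketch of what such a helper might look like, assuming a PCA fitted offline on a sample of your own embeddings: makeProjector, mean, and components are hypothetical names, and the toy numbers exist only to show the arithmetic.

```javascript
// Hypothetical PCA projection helper. `mean` (length D) and `components`
// (K rows of length D, ordered by explained variance) are assumed to come
// from a PCA fitted offline; neither Firestore nor Vertex AI provides them.
function makeProjector(mean, components) {
  return function projectEmbedding(vector, targetDims) {
    if (vector.length !== mean.length) {
      throw new Error(`Expected ${mean.length} dims, got ${vector.length}`);
    }
    if (targetDims > components.length) {
      throw new Error(`Only ${components.length} components available`);
    }
    // Center the vector, then take its dot product with each principal axis
    const centered = vector.map((value, i) => value - mean[i]);
    const projected = [];
    for (let k = 0; k < targetDims; k++) {
      let dot = 0;
      for (let i = 0; i < centered.length; i++) {
        dot += components[k][i] * centered[i];
      }
      projected.push(dot);
    }
    return projected;
  };
}

// Toy example: 3-dim input, keep the first 2 axes
const projectEmbedding = makeProjector(
  [1, 1, 1],
  [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
);
const toyProjection = projectEmbedding([2, 4, 6], 2); // [1, 3]
```

Documents and queries have to go through the same fitted projection, otherwise their vectors end up in different spaces and distances stop being meaningful.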
Step 2 — Create the vector index
This is a one-time setup, either via the gcloud CLI or the Firebase console.
gcloud firestore indexes composite create \
--collection-group=content \
--query-scope=COLLECTION \
--field-config=field-path=embedding,vector-config='{"dimension":"500","flat":"{}"}'
Step 3 — Query by semantic similarity
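Embedding the query needs server credentials, so run the search in a callable Cloud Function: embed the search text, then call findNearest from the Admin SDK.
// functions/src/searchContent.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { getApps, initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { GoogleGenAI } from '@google/genai';
// PCA helper, not shown in this post; the module path is hypothetical
import { projectEmbedding } from './projectEmbedding.js';

if (!getApps().length) initializeApp();
const db = getFirestore();
// The location is an assumption; use the region your project runs in
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});

export const searchContent = onCall(async (request) => {
  const { searchText } = request.data;
  if (!searchText) throw new HttpsError('invalid-argument', 'searchText is required');

  // 1. Embed the query
  const response = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: searchText,
  });
  const queryEmbedding = response?.embeddings?.[0]?.values;
  if (!queryEmbedding) throw new HttpsError('internal', 'No embedding returned');
  const projected = projectEmbedding(queryEmbedding, 500);

  // 2. Search the Firestore vector index
  const snapshot = await db.collection('content').findNearest({
    vectorField: 'embedding',
    queryVector: projected,
    limit: 10,
    distanceMeasure: 'COSINE',
  }).get();

  // Leave the raw vector out of the response payload
  return snapshot.docs.map((doc) => {
    const { embedding, ...rest } = doc.data();
    return { id: doc.id, ...rest };
  });
});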
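Nearest-neighbor search ranks documents by a distance between the query vector and each stored vector. As a rough illustration, assuming cosine distance is the measure in use, the sketch below computes it by hand and runs a brute-force top-k over an in-memory array. The exactNearest helper and the shape of docs are illustrative assumptions, not part of any Firebase API.

```javascript
// Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal ones
function cosineDistance(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error('A zero vector has no direction');
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k over documents shaped like { id, embedding }.
// Only practical for small collections, but handy as ground truth in tests.
function exactNearest(queryVector, docs, k) {
  return docs
    .map((doc) => ({ id: doc.id, distance: cosineDistance(queryVector, doc.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

A brute-force pass like this is a useful sanity check on a few hundred documents: if the index returns very different ids for the same query, something in the embedding or projection step is probably off.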
Performance
| Dataset size | Index build time | P95 query latency | Recall@10 |
|---|---|---|---|
| 1,000 docs | ~2 min | 40ms | 96% |
| 10,000 docs | ~15 min | 85ms | 94% |
| 50,000 docs | ~1 hr | 180ms | 92% |
| 100,000 docs | ~2.5 hr | 350ms | 89% |
Key tradeoff: we index 500-dim vectors, so a 768-dim model like text-embedding-004 needs a projection layer. We use PCA-500, which costs us about 0.8% in recall. Firestore's documented maximum is 2,048 dimensions, so check whether you need the projection at all before you adopt it.
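The Recall@10 column compares what the index returns against the true nearest neighbors. The post does not describe its measurement protocol, so the following is only a sketch of one common way to compute it, assuming you have ground-truth ids from an exact search over the unprojected vectors. The function names and the shape of runs are hypothetical.

```javascript
// Fraction of the true top-k neighbors that appear in the retrieved top-k
function recallAtK(retrievedIds, exactIds, k = 10) {
  const truth = new Set(exactIds.slice(0, k));
  if (truth.size === 0) throw new Error('Need at least one ground-truth id');
  const retrieved = new Set(retrievedIds.slice(0, k));
  let hits = 0;
  for (const id of retrieved) {
    if (truth.has(id)) hits += 1;
  }
  return hits / truth.size;
}

// Average recall over a set of evaluation queries.
// Each run is assumed to look like { retrieved: [...ids], exact: [...ids] }.
function meanRecallAtK(runs, k = 10) {
  if (runs.length === 0) throw new Error('Need at least one run');
  const total = runs.reduce((sum, run) => sum + recallAtK(run.retrieved, run.exact, k), 0);
  return total / runs.length;
}
```

Running this once with full-size vectors and once with projected ones is a direct way to put a number on what the projection costs for your own data.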
Pattern 2: AI-Generated Content with Gemini
Beyond search, you can use Gemini to generate content directly in your Firebase functions. The integration is straightforward because both live in the same GCP project.
// functions/src/generateSummary.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { VertexAI } from '@google-cloud/vertexai';

// The location is an assumption; use the region your project runs in
const vertex = new VertexAI({
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});
const model = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{ text: 'You are a concise technical writer. Generate a 3-sentence summary.' }],
  },
});

export const generateSummary = onCall(async (request) => {
  const { text } = request.data;
  if (!text) throw new HttpsError('invalid-argument', 'Text is required');

  const result = await model.generateContent({
    contents: [{ role: 'user', parts: [{ text }] }],
  });
  // The Vertex AI Node SDK exposes candidates; it has no text() helper
  const summary = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
  return { summary };
});
Call it from the client:
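import { getFunctions, httpsCallable } from 'firebase/functions';

const functions = getFunctions();
const generateSummary = httpsCallable(functions, 'generateSummary');
const { data } = await generateSummary({ text: longDocumentBody });
setSummary(data.summary);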
When to use Flash vs Pro
| Criterion | Gemini 2.0 Flash | Gemini 2.0 Pro |
|---|---|---|
| Latency | 200–400ms | 600–1200ms |
| Cost | $0.15/1M input tokens | $0.50/1M input tokens |
| Best for | Summaries, titles, quick answers, classification | Complex reasoning, multi-step analysis, code generation |
| Context window | 1M tokens | 2M tokens |
| Our routing | ~65% of queries | ~25% of queries |
We route simple queries to Flash and reserve Pro for the queries that genuinely need it — a lightweight classifier model (fine-tuned BERT, ~5ms inference) decides which path to take. This saves roughly 40% on LLM costs compared to routing everything through Pro.
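The routing decision itself can be very small. A sketch, assuming the classifier returns a complexity score between 0 and 1: the score range, the 0.5 threshold, the fall-back-to-Pro behavior, and the model id strings are all placeholders for illustration, not the production router.

```javascript
// Placeholder model ids; check which ids are available in your project
const MODEL_IDS = {
  flash: 'gemini-2.0-flash',
  pro: 'gemini-2.0-pro',
};

// Pick a model from a complexity score in [0, 1].
// If the classifier produced no usable score, fall back to the stronger model.
function routeByComplexity(complexityScore, threshold = 0.5) {
  if (!Number.isFinite(complexityScore)) return MODEL_IDS.pro;
  return complexityScore >= threshold ? MODEL_IDS.pro : MODEL_IDS.flash;
}

// Usage inside a request handler, where classifyQuery is a hypothetical
// wrapper around the classifier:
//   const modelId = routeByComplexity(await classifyQuery(question));
```

Falling back to Pro on a missing score trades cost for answer quality; a cost-sensitive deployment might prefer the opposite default.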
Pattern 3: RAG Agent (The Full Pattern)
Combine embedding search + Gemini generation for a complete RAG pipeline. This is the pattern we use for every client AI agent.
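// functions/src/ragAgent.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { getApps, initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { VertexAI } from '@google-cloud/vertexai';
import { GoogleGenAI } from '@google/genai';
// PCA helper, not shown in this post; the module path is hypothetical
import { projectEmbedding } from './projectEmbedding.js';

if (!getApps().length) initializeApp();
const db = getFirestore();

// The location is an assumption; use the region your project runs in
const project = process.env.GCLOUD_PROJECT;
const location = 'us-central1';
const vertex = new VertexAI({ project, location });
// @google-cloud/vertexai only serves generative models, so the question is
// embedded through the Google Gen AI SDK in Vertex AI mode
const ai = new GoogleGenAI({ vertexai: true, project, location });

const genModel = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{
      text: 'You are a helpful agent. Answer based ONLY on the provided context. '
        + 'Cite sources by document name. If the context does not contain the answer, '
        + 'say so clearly.',
    }],
  },
});

export const askAgent = onCall(async (request) => {
  const { question } = request.data;
  if (!question) throw new HttpsError('invalid-argument', 'Question is required');

  // 1. Embed the question
  const embedResponse = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: question,
  });
  const values = embedResponse?.embeddings?.[0]?.values;
  if (!values) throw new HttpsError('internal', 'No embedding returned');
  const qVector = projectEmbedding(values, 500);

  // 2. Retrieve top-10 chunks from Firestore
  const chunkDocs = await db.collection('chunks').findNearest({
    vectorField: 'embedding',
    queryVector: qVector,
    limit: 10,
    distanceMeasure: 'COSINE',
    distanceResultField: 'distance',
  }).get();
  const context = chunkDocs.docs.map((d) => d.get('text')).join('\n\n');

  // 3. Generate answer with context
  const result = await genModel.generateContent({
    contents: [{
      role: 'user',
      parts: [{ text: `Context:\n${context}\n\nQuestion: ${question}` }],
    }],
  });

  return {
    // The Vertex AI Node SDK exposes candidates; it has no text() helper
    answer: result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '',
    sources: chunkDocs.docs.map((d) => ({
      id: d.id,
      source: d.get('source'),
      distance: d.get('distance'), // cosine distance: lower means closer
    })),
  };
});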
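The system prompt asks the model to cite sources by document name, but the context string carries only chunk text. One way to close that gap is to label each chunk with its source before it reaches the prompt. The sketch below assumes chunks shaped like { source, text } and a hypothetical character budget (maxChars); neither comes from the function above.

```javascript
// Build the user prompt from retrieved chunks, labelling each with its source
// so the model has a name to cite. Chunks are assumed to arrive ordered from
// most to least relevant; the first chunk that would exceed the budget ends the loop.
function buildPrompt(chunks, question, maxChars = 12000) {
  const sections = [];
  let used = 0;
  for (const chunk of chunks) {
    const section = `[Source: ${chunk.source}]\n${chunk.text}`;
    if (used + section.length > maxChars) break;
    sections.push(section);
    used += section.length;
  }
  return `Context:\n${sections.join('\n\n')}\n\nQuestion: ${question}`;
}
```

A character budget is a crude stand-in for a token budget, but it keeps the prompt bounded without an extra tokenizer call.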
Security Rules
Because everything lives in Firestore, you protect your vector data with standard security rules:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Signed-in users can read content
    match /content/{docId} {
      allow read: if request.auth != null;
      // Client writes are limited to admins. Cloud Functions use the Admin SDK,
      // which bypasses these rules, so embedding writes are unaffected.
      allow write: if request.auth != null
        && (request.auth.uid == 'ADMIN_UID'
          || exists(/databases/$(database)/documents/admins/$(request.auth.uid)));
    }
    // AI agent responses are logged for audit
    match /agent_logs/{logId} {
      allow read, write: if request.auth != null && request.auth.uid == 'ADMIN_UID';
    }
  }
}
Cost Model for a Typical Deployment
Here's what a Firebase + Vertex AI setup costs per month for a mid-size app (50K document chunks, ~2,000 queries/day):
| Service | Monthly cost |
|---|---|
| Cloud Functions (Gen 2, ~100K invocations) | $15 |
| Firestore (reads + vector index storage) | $80 |
| Vertex AI embeddings (text-embedding-004) | $35 |
| Gemini 2.0 Flash (65% of queries) | $120 |
| Gemini 2.0 Pro (25% of queries) | $210 |
| Cloud Storage (raw documents) | $10 |
| Networking + misc | $25 |
| Total | ~$495/mo |
This fits inside a single project, a single billing account, and — crucially — a single set of Firebase Security Rules.
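To sanity-check the table and turn it into a per-query figure, the arithmetic is short enough to script. The line items come from the table above; the 30-day month is an assumption, and the property names are made up for the example.

```javascript
// Monthly line items in USD, copied from the cost table
const monthlyCosts = {
  cloudFunctions: 15,
  firestore: 80,
  embeddings: 35,
  geminiFlash: 120,
  geminiPro: 210,
  cloudStorage: 10,
  networkingMisc: 25,
};

const totalMonthly = Object.values(monthlyCosts).reduce((sum, cost) => sum + cost, 0);

// ~2,000 queries/day over an assumed 30-day month
const queriesPerMonth = 2000 * 30;
const costPerQuery = totalMonthly / queriesPerMonth;

// Share of the bill that goes to LLM generation (Flash + Pro)
const llmShare = (monthlyCosts.geminiFlash + monthlyCosts.geminiPro) / totalMonthly;
```

With these numbers the total comes to $495, a little under one cent per query, and generation accounts for about two thirds of the bill, which is why model routing is the first place to look for savings.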
What We've Learned
A few things that surprised us when we started building on this stack:
- Embedding dimensionality matters. Plan your embedding strategy before you index 50K documents. We use PCA projection to 500 dims, but you could also pick a model that outputs ≤500 dims natively (like gte-small at 384).
- Vector index build times can be slow. For a first-time build on 100K+ documents, budget 2–3 hours. Schedule it as a batch job, not a blocking migration.
- Cold starts on Cloud Functions + Vertex AI are real. The first call after idle takes 2–4 seconds. Mitigate with min instances (1–2) if your app is latency-sensitive.
- Test with real user queries. Synthetic benchmarks look great but miss the messiness of real questions. We run a 2-week shadow mode on every deployment — log queries, serve from the old system, and evaluate retrieval quality before cutting over.
If you're already on Firebase, you're closer to production AI than you think. The SDKs are there, the auth is there, the database is there — you just need to add the embedding and generation layer on top.
We do this for a living. If you want a walkthrough of how it would work with your specific Firestore schema, let's talk.