Building AI Features With Firebase and Vertex AI

A practical guide to adding semantic search, AI-generated content, and intelligent agents to your Firebase app using Vertex AI's Gemini API and Firestore vector search.

Firebase has always been great for building apps fast. But until recently, adding AI meant bolting on a third-party service — different SDK, different auth, different billing. That gap has closed in a big way.

With Vertex AI's Gemini API now natively accessible from Firebase and Firestore's vector search generally available, you can build production AI features — semantic search, RAG, AI-generated summaries, intelligent agents — entirely within the Google Cloud ecosystem. Same Firebase SDKs, same security rules, same billing.

This post walks through the patterns we use at Jenga IT to add AI to Firebase apps, with real code, real tradeoffs, and real numbers.

The Stack at a Glance

| Component | Service | Role |
| --- | --- | --- |
| App backend | Firebase (Auth, Functions, Hosting) | Auth, serverless logic, static hosting |
| Database | Firestore (native mode) | Real-time data + vector embeddings |
| Vector index | Firestore vector index | Approximate nearest-neighbor search |
| Embeddings | Vertex AI text-embedding-004 | Convert text to 768-dim vectors |
| LLM | Vertex AI Gemini 2.0 Flash / Pro | Content generation, summarization, chat |
| Admin | Firebase Admin SDK | Server-side orchestration (Cloud Functions) |

Why this stack works: one SDK, one auth system, one bill. Your Firestore security rules protect the vector data the same way they protect everything else.

Pattern 1: Semantic Search on Firestore

The most common ask we get: "Let users search our content by meaning, not keywords."

Step 1 — Generate embeddings on write

Use a Cloud Function triggered on Firestore document creation to generate and store embeddings.

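// functions/src/onDocumentCreate.js
import { onDocumentCreated } from 'firebase-functions/v2/firestore';
import { initializeApp, getApps } from 'firebase-admin/app';
import { FieldValue } from 'firebase-admin/firestore';
import { GoogleGenAI } from '@google/genai';
// PCA projection helper used throughout this post. Its implementation is not
// shown here, and the module path is a placeholder.
import { projectEmbedding } from './projectEmbedding.js';

if (!getApps().length) initializeApp();

// @google-cloud/vertexai only covers Gemini generation, so embeddings go
// through the Google Gen AI SDK pointed at Vertex AI.
// The region is an assumption; use the one your project runs in.
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});

export const onContentCreated = onDocumentCreated('content/{docId}', async (event) => {
  const snapshot = event.data;
  if (!snapshot) return; // event carried no document
  const data = snapshot.data();
  if (!data.body) return; // nothing to embed

  // Generate the embedding for the document body
  const response = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: data.body,
  });
  const embedding = response.embeddings?.[0]?.values;

  if (!embedding) {
    console.error('No embedding returned');
    return;
  }

  // Project down to the 500 dims the vector index in this post is built for
  const projected = projectEmbedding(embedding, 500);

  // Store a Firestore vector value (not a plain array) so the index picks it up
  await snapshot.ref.update({
    embedding: FieldValue.vector(projected),
    indexedAt: FieldValue.serverTimestamp(),
  });
});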

Step 2 — Create the vector index

This is a one-time setup, either via the gcloud CLI or the Firebase console.

gcloud firestore indexes composite create \
  --collection-group=content \
  --query-scope=COLLECTION \
  --field-config=field-path=embedding,vector-config='{"dimension":"500","flat":"{}"}'

Step 3 — Query by semantic similarity

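Run the search server-side. findNearest is exposed through the server client libraries (here, the Admin SDK) rather than the web SDK's query builder, and the embedding call needs server credentials anyway. A callable Cloud Function embeds the search text, then queries the vector index.

// functions/src/search.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { initializeApp, getApps } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { GoogleGenAI } from '@google/genai';
import { projectEmbedding } from './projectEmbedding.js'; // placeholder path, helper not shown

if (!getApps().length) initializeApp();
const db = getFirestore();
// The region is an assumption; use the one your project runs in
const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});

export const searchContent = onCall(async (request) => {
  // The Admin SDK bypasses security rules, so check auth here
  if (!request.auth) throw new HttpsError('unauthenticated', 'Sign in to search');
  const { searchText } = request.data;
  if (!searchText) throw new HttpsError('invalid-argument', 'searchText is required');

  // 1. Embed the query
  const response = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: searchText,
  });
  const queryEmbedding = response.embeddings?.[0]?.values;
  if (!queryEmbedding) throw new HttpsError('internal', 'No embedding returned');
  const projected = projectEmbedding(queryEmbedding, 500);

  // 2. Search the Firestore vector index
  const snapshot = await db.collection('content')
    .findNearest({
      vectorField: 'embedding',
      queryVector: projected,
      limit: 10,
      distanceMeasure: 'COSINE',
    })
    .get();

  // Drop the raw vector from the payload sent back to the client
  return snapshot.docs.map((doc) => {
    const { embedding, ...rest } = doc.data();
    return { id: doc.id, ...rest };
  });
});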

Performance

| Dataset size | Index build time | P95 query latency | Recall@10 |
| --- | --- | --- | --- |
| 1,000 docs | ~2 min | 40ms | 96% |
| 10,000 docs | ~15 min | 85ms | 94% |
| 50,000 docs | ~1 hr | 180ms | 92% |
| 100,000 docs | ~2.5 hr | 350ms | 89% |
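
The post does not say how the Recall@10 column was measured. One common approach, sketched here as an assumption, is to compare the ids returned by the vector index against an exact brute-force top-k computed over the same embeddings. The function names below are illustrative, not part of any Firebase API.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Exact top-k ids by brute force: the ground truth the index is compared against.
function exactTopK(queryVector, docs, k) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(queryVector, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}

// Fraction of the exact top-k that the index also returned.
function recallAtK(retrievedIds, exactIds) {
  const retrieved = new Set(retrievedIds);
  const hits = exactIds.filter((id) => retrieved.has(id)).length;
  return hits / exactIds.length;
}

// Toy data: three 2-dim documents and one query.
const docs = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0.9, 0.1] },
  { id: 'c', embedding: [0, 1] },
];
const truth = exactTopK([1, 0], docs, 2); // ['a', 'b']
console.log(recallAtK(['a', 'c'], truth)); // 0.5
```

Averaging recallAtK over a set of real queries gives a figure comparable to the table's last column.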

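Key tradeoff: our index stores 500-dim vectors, so a 768-dim model like text-embedding-004 needs a projection layer. We use PCA-500, which costs us about 0.8 percentage points of recall. Firestore's documented ceiling is 2048 dimensions, so check the current limits before you commit: if your index can hold the model's native output, you can skip the projection altogether.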
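
The projectEmbedding helper called in the snippets is never shown. Here is a minimal sketch of what a PCA projection could look like, assuming the mean vector and principal components were fitted offline on your own corpus and loaded at startup. The makeProjector factory and its arguments are hypothetical names, and the final re-normalization is a design choice of this sketch, not something the post specifies.

```javascript
// Builds a projectEmbedding(embedding, dims) function from a PCA fit.
// `mean` has length D; `components` is a list of rows of length D, ordered by
// explained variance. Both are assumed to come from an offline PCA fit.
function makeProjector(mean, components) {
  return function projectEmbedding(embedding, dims) {
    if (!Array.isArray(embedding) || embedding.length !== mean.length) {
      throw new Error(`Expected a ${mean.length}-dim embedding`);
    }
    if (dims > components.length) {
      throw new Error(`Only ${components.length} components available`);
    }
    // Center the vector, then dot it with each of the first `dims` components.
    const centered = embedding.map((v, i) => v - mean[i]);
    const projected = components
      .slice(0, dims)
      .map((row) => row.reduce((sum, w, i) => sum + w * centered[i], 0));
    // Re-normalize to unit length so cosine distances stay well behaved.
    const norm = Math.hypot(...projected) || 1;
    return projected.map((v) => v / norm);
  };
}

// Toy example: identity components, keep the first 2 of 3 dimensions.
const projectEmbedding = makeProjector(
  [0, 0, 0],
  [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
);
console.log(projectEmbedding([3, 4, 12], 2)); // approximately [0.6, 0.8]
```

The same fitted projection has to be applied to documents at write time and to queries at search time, otherwise the two sets of vectors live in different spaces.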

Pattern 2: AI-Generated Content with Gemini

Beyond search, you can use Gemini to generate content directly in your Firebase functions. The integration is straightforward because both live in the same GCP project.

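// functions/src/generateSummary.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { VertexAI } from '@google-cloud/vertexai';

// The region is an assumption; use the one your project runs in
const vertex = new VertexAI({
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1',
});
const model = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{ text: 'You are a concise technical writer. Generate a 3-sentence summary.' }],
  },
});

export const generateSummary = onCall(async (request) => {
  // Callable functions are reachable by any client, so require a signed-in user
  if (!request.auth) throw new HttpsError('unauthenticated', 'Sign in to generate summaries');
  const { text } = request.data;
  if (!text) throw new HttpsError('invalid-argument', 'Text is required');

  const result = await model.generateContent({
    contents: [{ role: 'user', parts: [{ text }] }],
  });

  // This SDK has no text() helper; read the first candidate's text part
  const summary = result.response.candidates?.[0]?.content?.parts?.[0]?.text;
  if (!summary) throw new HttpsError('internal', 'Model returned no text');

  return { summary };
});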

Call it from the client:

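import { getFunctions, httpsCallable } from 'firebase/functions';

const functions = getFunctions();
const generateSummary = httpsCallable(functions, 'generateSummary');
const { data } = await generateSummary({ text: longDocumentBody });
setSummary(data.summary); // e.g. a React state setter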

When to use Flash vs Pro

| Criterion | Gemini 2.0 Flash | Gemini 2.0 Pro |
| --- | --- | --- |
| Latency | 200–400ms | 600–1200ms |
| Cost | $0.15/1M input tokens | $0.50/1M input tokens |
| Best for | Summaries, titles, quick answers, classification | Complex reasoning, multi-step analysis, code generation |
| Context window | 1M tokens | 2M tokens |
| Our routing | ~65% of queries | ~25% of queries |

We route simple queries to Flash and reserve Pro for the queries that genuinely need it — a lightweight classifier model (fine-tuned BERT, ~5ms inference) decides which path to take. This saves roughly 40% on LLM costs compared to routing everything through Pro.
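
The routing step can be sketched as a small pure function that sits between the classifier and the model call. This is an illustration under assumptions, not the production router: the shape of the classifier output, the 0.8 confidence threshold, and the placeholder Pro model id are all hypothetical.

```javascript
// Model ids: the Flash id appears in this post; the Pro id is a placeholder
// to replace with whichever Pro model your project has access to.
const MODELS = { flash: 'gemini-2.0-flash', pro: 'YOUR_PRO_MODEL_ID' };

// `prediction` is the classifier output, assumed to look like
// { label: 'simple' | 'complex', confidence: number between 0 and 1 }.
function routeModel(prediction, minConfidence = 0.8) {
  const isConfidentlySimple =
    prediction != null &&
    prediction.label === 'simple' &&
    prediction.confidence >= minConfidence;
  // Anything uncertain or malformed falls through to the stronger model.
  return isConfidentlySimple ? MODELS.flash : MODELS.pro;
}

console.log(routeModel({ label: 'simple', confidence: 0.95 }));  // gemini-2.0-flash
console.log(routeModel({ label: 'simple', confidence: 0.55 }));  // YOUR_PRO_MODEL_ID
console.log(routeModel({ label: 'complex', confidence: 0.99 })); // YOUR_PRO_MODEL_ID
```

Defaulting to the stronger model when the classifier is unsure trades some cost for answer quality; raising or lowering minConfidence shifts that balance.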

Pattern 3: RAG Agent (The Full Pattern)

Combine embedding search + Gemini generation for a complete RAG pipeline. This is the pattern we use for every client AI agent.

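// functions/src/ragAgent.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { initializeApp, getApps } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { VertexAI } from '@google-cloud/vertexai';
import { GoogleGenAI } from '@google/genai';
import { projectEmbedding } from './projectEmbedding.js'; // placeholder path, helper not shown

if (!getApps().length) initializeApp();
const db = getFirestore();

// The region is an assumption; use the one your project runs in
const project = process.env.GCLOUD_PROJECT;
const location = 'us-central1';
const vertex = new VertexAI({ project, location });
// Embeddings go through the Google Gen AI SDK; @google-cloud/vertexai only covers generation
const ai = new GoogleGenAI({ vertexai: true, project, location });

const genModel = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{
      text: 'You are a helpful agent. Answer based ONLY on the provided context. '
          + 'Cite sources by document name. If the context does not contain the answer, '
          + 'say so clearly.',
    }],
  },
});

export const askAgent = onCall(async (request) => {
  // The Admin SDK bypasses security rules, so check auth here
  if (!request.auth) throw new HttpsError('unauthenticated', 'Sign in to ask questions');
  const { question } = request.data;
  if (!question) throw new HttpsError('invalid-argument', 'Question is required');

  // 1. Embed the question
  const embedResponse = await ai.models.embedContent({
    model: 'text-embedding-004',
    contents: question,
  });
  const values = embedResponse.embeddings?.[0]?.values;
  if (!values) throw new HttpsError('internal', 'No embedding returned');
  const qVector = projectEmbedding(values, 500);

  // 2. Retrieve top-10 chunks from Firestore
  const chunkDocs = await db.collection('chunks')
    .findNearest({
      vectorField: 'embedding',
      queryVector: qVector,
      limit: 10,
      distanceMeasure: 'COSINE',
      distanceResultField: 'distance', // adds the computed distance to each result
    })
    .get();
  // Label each chunk with its source so the model has something to cite
  const context = chunkDocs.docs
    .map((d) => `[${d.get('source')}]\n${d.get('text')}`)
    .join('\n\n');

  // 3. Generate answer with context
  const result = await genModel.generateContent({
    contents: [{
      role: 'user',
      parts: [{ text: `Context:\n${context}\n\nQuestion: ${question}` }],
    }],
  });

  return {
    // This SDK has no text() helper; read the first candidate's text part
    answer: result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '',
    sources: chunkDocs.docs.map((d) => ({
      id: d.id,
      source: d.get('source'),
      distance: d.get('distance'), // cosine distance, lower is closer
    })),
  };
});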
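
The agent reads from a chunks collection, but the post does not show how documents get split into chunks. A minimal sketch, assuming fixed-size character windows with overlap: the sizes are illustrative defaults, and the position field is a hypothetical addition alongside the text and source fields the agent already reads.

```javascript
// Split text into windows of at most `size` characters, where neighboring
// windows share `overlap` characters so sentences cut at a boundary still
// appear whole in one of the two chunks.
function chunkText(text, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Shape each chunk as a document for the `chunks` collection.
function toChunkDocs(source, text, size, overlap) {
  return chunkText(text, size, overlap).map((chunk, position) => ({
    source,
    position,
    text: chunk,
  }));
}

console.log(chunkText('abcdefghij', 4, 2)); // ['abcd', 'cdef', 'efgh', 'ghij']
console.log(toChunkDocs('guide.md', 'abcdefghij', 6, 2).length); // 2
```

Character windows are the simplest option. Splitting on paragraph or heading boundaries usually gives cleaner retrieval, at the cost of uneven chunk sizes.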

Security Rules

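Because everything lives in Firestore, you protect your vector data with standard security rules. Rules govern direct client reads and writes. Cloud Functions that use the Admin SDK bypass them, so callable functions need to check request.auth themselves.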

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Signed-in users can read content directly
    match /content/{docId} {
      allow read: if request.auth != null;
      // Client writes are limited to admins; Cloud Functions write
      // embeddings through the Admin SDK, which bypasses these rules
      allow write: if request.auth != null
                    && (request.auth.uid == 'ADMIN_UID'
                        || exists(
                          /databases/$(database)/documents/admins/$(request.auth.uid)
                        ));
    }

    // AI agent responses are logged for audit
    match /agent_logs/{logId} {
      allow read, write: if request.auth != null
                          && request.auth.uid == 'ADMIN_UID';
    }
  }
}

Cost Model for a Typical Deployment

Here's what a Firebase + Vertex AI setup costs per month for a mid-size app (50K document chunks, ~2,000 queries/day):

| Service | Monthly cost |
| --- | --- |
| Cloud Functions (Gen 2, ~100K invocations) | $15 |
| Firestore (reads + vector index storage) | $80 |
| Vertex AI embeddings (text-embedding-004) | $35 |
| Gemini 2.0 Flash (65% of queries) | $120 |
| Gemini 2.0 Pro (25% of queries) | $210 |
| Cloud Storage (raw documents) | $10 |
| Networking + misc | $25 |
| Total | ~$495/mo |

This fits inside a single project, a single billing account, and — crucially — a single set of Firebase Security Rules.
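
To adapt the cost model to your own traffic, it helps to keep the arithmetic in a few lines of code. The line items below are the figures from the table; the 30-day month is an assumption of this sketch.

```javascript
// Monthly line items in USD, copied from the cost table.
const monthlyCosts = {
  'Cloud Functions': 15,
  'Firestore': 80,
  'Vertex AI embeddings': 35,
  'Gemini 2.0 Flash': 120,
  'Gemini 2.0 Pro': 210,
  'Cloud Storage': 10,
  'Networking + misc': 25,
};
const queriesPerDay = 2000;
const daysPerMonth = 30; // assumption

const total = Object.values(monthlyCosts).reduce((a, b) => a + b, 0);
const costPerQuery = total / (queriesPerDay * daysPerMonth);
const llmShare =
  (monthlyCosts['Gemini 2.0 Flash'] + monthlyCosts['Gemini 2.0 Pro']) / total;

console.log(total);                             // 495
console.log(costPerQuery.toFixed(5));           // 0.00825
console.log((llmShare * 100).toFixed(0) + '%'); // 67%
```

At under one cent per query, with roughly two thirds of the bill going to the two Gemini models, the routing split between Flash and Pro is the lever with the most effect on the total.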

What We've Learned

A few things that surprised us when we started building on this stack:

  1. Embedding dimensions matter. Plan your embedding strategy before you index 50K documents. We project down to 500 dims with PCA, but you could also pick a model whose native output already fits your index (like gte-small at 384).
  2. Vector index build times can be slow. For a first-time build on 100K+ documents, budget 2–3 hours. Schedule it as a batch job, not a blocking migration.
  3. Cold starts on Cloud Functions + Vertex AI are real. The first call after idle takes 2–4 seconds. Mitigate with min instances (1–2) if your app is latency-sensitive.
  4. Test with real user queries. Synthetic benchmarks look great but miss the messiness of real questions. We run a 2-week shadow mode on every deployment — log queries, serve from the old system, and evaluate retrieval quality before cutting over.
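
The shadow mode from the last point can be sketched as a thin wrapper around the two search paths. This is one possible shape, not the exact setup described above: currentSearch, candidateSearch, and log are hypothetical async functions you would supply, and results are assumed to be arrays of objects with an id field.

```javascript
// Serve results from the current system while recording what the candidate
// system would have returned for the same query.
async function shadowSearch(queryText, currentSearch, candidateSearch, log) {
  // Run both in parallel; allSettled keeps a candidate failure from rejecting.
  const [current, candidate] = await Promise.allSettled([
    currentSearch(queryText),
    candidateSearch(queryText),
  ]);
  if (current.status === 'rejected') throw current.reason;

  const servedIds = current.value.map((r) => r.id);
  const candidateIds =
    candidate.status === 'fulfilled' ? candidate.value.map((r) => r.id) : null;

  try {
    await log({
      queryText,
      servedIds,
      candidateIds, // null when the candidate search failed
      overlap: candidateIds
        ? candidateIds.filter((id) => servedIds.includes(id)).length
        : 0,
      loggedAt: new Date().toISOString(),
    });
  } catch {
    // Logging problems must never break the user-facing search.
  }

  return current.value; // users only ever see the current system's results
}
```

Because both searches are awaited together, response time becomes the slower of the two. If that matters, the candidate call can be moved to a queue or a separate function and joined with the served results offline.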

If you're already on Firebase, you're closer to production AI than you think. The SDKs are there, the auth is there, the database is there — you just need to add the embedding and generation layer on top.

We do this for a living. If you want a walkthrough of how it would work with your specific Firestore schema, let's talk.