Building AI Features With Firebase and Vertex AI

A practical guide to adding semantic search, AI-generated content, and intelligent agents to your Firebase app using Vertex AI's Gemini API and Firestore vector search.

Jenga IT Consulting, Founder & Lead Developer

2026-05-25 · 10 min read

Firebase has always been great for building apps fast. But until recently, adding AI meant bolting on a third-party service — different SDK, different auth, different billing. That gap has closed in a big way.

With Vertex AI's Gemini API now natively accessible from Firebase and Firestore's vector search generally available, you can build production AI features — semantic search, RAG, AI-generated summaries, intelligent agents — entirely within the Google Cloud ecosystem. Same Firebase SDKs, same security rules, same billing.

This post walks through the patterns we use at Jenga IT to add AI to Firebase apps, with real code, real tradeoffs, and real numbers.

The Stack at a Glance

Component	Service	Role
App backend	Firebase (Auth, Functions, Hosting)	Auth, serverless logic, static hosting
Database	Firestore (native mode)	Real-time data + vector embeddings
Vector index	Firestore vector index	Approximate nearest-neighbor search
Embeddings	Vertex AI `text-embedding-004`	Convert text to 768-dim vectors
LLM	Vertex AI Gemini 2.0 Flash / Pro	Content generation, summarization, chat
Admin	Firebase Admin SDK	Server-side orchestration (Cloud Functions)

Why this stack works: one SDK, one auth system, one bill. Your Firestore security rules protect the vector data the same way they protect everything else.

Pattern 1: Semantic Search on Firestore

The most common ask we get: "Let users search our content by meaning, not keywords."

Step 1 — Generate embeddings on write

Use a Cloud Function triggered on Firestore document creation to generate and store embeddings.

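// functions/src/embeddings.js
import aiplatform from '@google-cloud/aiplatform';

const { PredictionServiceClient } = aiplatform.v1;
const { helpers } = aiplatform;

const LOCATION = 'us-central1'; // adjust to the region your project uses
const client = new PredictionServiceClient({
  apiEndpoint: `${LOCATION}-aiplatform.googleapis.com`,
});

// Returns the embedding for `text` as a number[], or undefined if none came back.
// Embedding models are served through the prediction endpoint, not generateContent.
export async function embedText(text, taskType = 'RETRIEVAL_DOCUMENT') {
  const endpoint = `projects/${process.env.GCLOUD_PROJECT}/locations/${LOCATION}`
    + '/publishers/google/models/text-embedding-004';
  const [response] = await client.predict({
    endpoint,
    instances: [helpers.toValue({ content: text, task_type: taskType })],
  });
  const values = response.predictions?.[0]?.structValue?.fields?.embeddings
    ?.structValue?.fields?.values?.listValue?.values;
  return values?.map((v) => v.numberValue);
}
// projectEmbedding (our PCA projection) is exported from this module too; omitted here.

// functions/src/onDocumentCreate.js
import { onDocumentCreated } from 'firebase-functions/v2/firestore';
import { FieldValue } from 'firebase-admin/firestore';
import { embedText, projectEmbedding } from './embeddings.js';

export const onContentCreated = onDocumentCreated('content/{docId}', async (event) => {
  const snapshot = event.data;
  if (!snapshot) return;
  const data = snapshot.data();

  // Generate embedding
  const embedding = await embedText(data.body);
  if (!embedding) {
    console.error('No embedding returned');
    return;
  }

  // Project 768 -> 500 dims to match the vector index we create in Step 2
  const projected = projectEmbedding(embedding, 500);

  // Store as a Firestore vector value so the vector index can pick it up
  await snapshot.ref.update({
    embedding: FieldValue.vector(projected),
    indexedAt: FieldValue.serverTimestamp(),
  });
});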

Step 2 — Create the vector index

This is a one-time setup, either via the gcloud CLI or the Firebase console.

gcloud firestore indexes composite create \
  --collection-group=content \
  --query-scope=COLLECTION \
  --field-config field-path=embedding,vector-config='{"dimension":"500", "flat": "{}"}'

Step 3 — Query by semantic similarity

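Run the query server-side: findNearest is available in the Admin SDK, and the Vertex AI call needs server credentials. A callable function embeds the search query, then calls findNearest. Because the Admin SDK bypasses security rules, the function checks auth itself.

// functions/src/search.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { initializeApp, getApps } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
// Shared helpers: embedText calls text-embedding-004 and returns a number[];
// projectEmbedding is our PCA-500 projection.
import { embedText, projectEmbedding } from './embeddings.js';

if (!getApps().length) initializeApp();
const db = getFirestore();

export const searchContent = onCall(async (request) => {
  if (!request.auth) throw new HttpsError('unauthenticated', 'Sign in to search');
  const { searchText } = request.data;
  if (!searchText) throw new HttpsError('invalid-argument', 'searchText is required');

  // 1. Embed the query
  const queryEmbedding = await embedText(searchText, 'RETRIEVAL_QUERY');
  if (!queryEmbedding) throw new HttpsError('internal', 'No embedding returned');
  const projected = projectEmbedding(queryEmbedding, 500);

  // 2. Search Firestore vector index
  const snapshot = await db.collection('content')
    .findNearest({
      vectorField: 'embedding',
      queryVector: projected,
      limit: 10,
      distanceMeasure: 'COSINE',
    })
    .get();

  // Drop the stored vector from the payload returned to the client
  return snapshot.docs.map((doc) => {
    const { embedding, ...rest } = doc.data();
    return { id: doc.id, ...rest };
  });
});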
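As a mental model for what the nearest-neighbor query ranks by, here is a brute-force version that scores every document against the query vector and keeps the closest k. This is a sketch for intuition and for building an exact baseline on small samples, not how the Firestore index works internally; the `docs` array and its `{ id, embedding }` shape are assumptions made for the example.

```javascript
// Cosine distance = 1 - cosine similarity; smaller means closer.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) top-k: score every document, sort, keep the k closest.
// `docs` is a hypothetical in-memory array of { id, embedding } objects.
function nearestExact(docs, queryVector, k) {
  return docs
    .map((doc) => ({ id: doc.id, distance: cosineDistance(doc.embedding, queryVector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const sampleDocs = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0, 1] },
  { id: 'c', embedding: [1, 1] },
];
const top2 = nearestExact(sampleDocs, [1, 0.1], 2);
console.log(top2.map((r) => r.id)); // [ 'a', 'c' ]
```

Brute force is linear in the number of documents, which is why it only makes sense as a reference point on a sample of the corpus.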

Performance

Dataset size	Index build time	P95 query latency	Recall@10
1,000 docs	~2 min	40ms	96%
10,000 docs	~15 min	85ms	94%
50,000 docs	~1 hr	180ms	92%
100,000 docs	~2.5 hr	350ms	89%
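For readers who want to reproduce the Recall@10 column on their own data, one common definition is the fraction of the true top-k neighbors (from an exact search) that the index also returns, averaged over a set of evaluation queries. The sketch below assumes that definition; the `runs` structure and the function names are illustrative and not taken from our benchmark harness.

```javascript
// Recall@k for one query: share of the true top-k IDs that were also retrieved.
function recallAtK(retrievedIds, relevantIds, k) {
  const relevant = new Set(relevantIds.slice(0, k));
  if (relevant.size === 0) return 0;
  const hits = retrievedIds.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// Mean recall over a set of evaluation queries.
function meanRecallAtK(runs, k) {
  const total = runs.reduce(
    (sum, run) => sum + recallAtK(run.retrieved, run.relevant, k),
    0,
  );
  return total / runs.length;
}

const runs = [
  { retrieved: ['a', 'b', 'c', 'd'], relevant: ['a', 'b', 'c', 'e'] }, // 3 of 4 found
  { retrieved: ['x', 'y', 'z', 'w'], relevant: ['x', 'y', 'z', 'w'] }, // 4 of 4 found
];
console.log(meanRecallAtK(runs, 4)); // 0.875
```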

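Key tradeoff: our index is built for 500-dimensional vectors, so a 768-dim model like text-embedding-004 needs a projection layer. We use PCA-500, which costs us about 0.8% recall. Verify the dimension cap against the current Firestore documentation before adopting this step: it lists a maximum embedding dimension of 2048, which would let you store the 768-dim vectors unprojected.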
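The code in this post calls a `projectEmbedding` helper without showing it. A minimal sketch of the projection step, assuming a PCA that has already been fitted offline on a sample of your corpus: subtract the training mean, then take the dot product with each principal component. The `pca` object shape (`mean`, `components`) and the function name `projectWithPca` are assumptions for illustration; a `projectEmbedding(embedding, 500)` wrapper would load the fitted parameters and delegate to something like this.

```javascript
// Applies a fitted PCA to one vector.
// `pca` is a hypothetical object: { mean: number[], components: number[][] },
// where each component has the same length as the input vector.
function projectWithPca(vector, pca, targetDims) {
  if (vector.length !== pca.mean.length) {
    throw new Error(`Expected ${pca.mean.length} dims, got ${vector.length}`);
  }
  const centered = vector.map((v, i) => v - pca.mean[i]);
  return pca.components
    .slice(0, targetDims)
    .map((component) => component.reduce((sum, w, i) => sum + w * centered[i], 0));
}

// Toy example: 3 input dims projected down to 2.
const toyPca = {
  mean: [1, 1, 1],
  components: [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
  ],
};
console.log(projectWithPca([2, 3, 4], toyPca, 2)); // [ 1, 2 ]
```

Documents and queries must go through the same fitted projection; refitting the PCA means re-embedding or re-projecting everything already indexed.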

Pattern 2: AI-Generated Content with Gemini

Beyond search, you can use Gemini to generate content directly in your Firebase functions. The integration is straightforward because both live in the same GCP project.

// functions/src/generateSummary.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { VertexAI } from '@google-cloud/vertexai';

const vertex = new VertexAI({
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1', // adjust to the region your project uses
});
const model = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{ text: 'You are a concise technical writer. Generate a 3-sentence summary.' }],
  },
});

export const generateSummary = onCall(async (request) => {
  const { text } = request.data;
  if (!text) throw new HttpsError('invalid-argument', 'Text is required');

  const result = await model.generateContent({
    contents: [{ role: 'user', parts: [{ text }] }],
  });

  // The Vertex AI Node SDK exposes candidates rather than a text() helper
  const summary = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '';
  return { summary };
});

Call it from the client:

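import { getFunctions, httpsCallable } from 'firebase/functions';

const functions = getFunctions();
const generateSummary = httpsCallable(functions, 'generateSummary');
const { data } = await generateSummary({ text: longDocumentBody });
setSummary(data.summary);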

When to use Flash vs Pro

Criterion	Gemini 2.0 Flash	Gemini 2.0 Pro
Latency	200–400ms	600–1200ms
Cost	$0.15/1M input tokens	$0.50/1M input tokens
Best for	Summaries, titles, quick answers, classification	Complex reasoning, multi-step analysis, code generation
Context window	1M tokens	2M tokens
Our routing	~65% of queries	~25% of queries

We route simple queries to Flash and reserve Pro for the queries that genuinely need it — a lightweight classifier model (fine-tuned BERT, ~5ms inference) decides which path to take. This saves roughly 40% on LLM costs compared to routing everything through Pro.
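The routing step can be sketched as a small function that takes a classifier and returns a model ID. The classifier below is a keyword and length heuristic standing in for the fine-tuned BERT model, which is not shown in this post; `classifyQuery`, `pickModel`, and the `PRO_MODEL_ID` placeholder are all assumptions for illustration.

```javascript
// The Flash ID is the one used in this post; the Pro ID is a placeholder
// to replace with whichever Pro model your project has access to.
const MODELS = { simple: 'gemini-2.0-flash', complex: 'PRO_MODEL_ID' };

// Stand-in classifier: NOT the fine-tuned BERT model described above.
// Returns 'simple' or 'complex'.
function classifyQuery(text) {
  const looksComplex = /\b(why|compare|analy[sz]e|step[- ]by[- ]step|debug|refactor)\b/i.test(text);
  return looksComplex || text.length > 500 ? 'complex' : 'simple';
}

// Picks a model ID; falls back to the cheaper model on an unknown label.
function pickModel(text, classify = classifyQuery) {
  return MODELS[classify(text)] ?? MODELS.simple;
}

console.log(pickModel('Summarize this paragraph in one line')); // gemini-2.0-flash
console.log(pickModel('Compare these two schemas and explain why one is slower')); // PRO_MODEL_ID
```

Passing the classifier in as a parameter keeps the router testable and lets you swap the heuristic for a real model call later without touching the callers.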

Pattern 3: RAG Agent (The Full Pattern)

Combine embedding search + Gemini generation for a complete RAG pipeline. This is the pattern we use for every client AI agent.

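// functions/src/ragAgent.js
import { onCall, HttpsError } from 'firebase-functions/v2/https';
import { initializeApp, getApps } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { VertexAI } from '@google-cloud/vertexai';
// Shared helpers: embedText calls text-embedding-004 and returns a number[];
// projectEmbedding is our PCA-500 projection.
import { embedText, projectEmbedding } from './embeddings.js';

if (!getApps().length) initializeApp();
const db = getFirestore();

const vertex = new VertexAI({
  project: process.env.GCLOUD_PROJECT,
  location: 'us-central1', // adjust to the region your project uses
});
const genModel = vertex.getGenerativeModel({
  model: 'gemini-2.0-flash',
  systemInstruction: {
    role: 'system',
    parts: [{
      text: 'You are a helpful agent. Answer based ONLY on the provided context. '
          + 'Cite sources by document name. If the context does not contain the answer, '
          + 'say so clearly.',
    }],
  },
});

export const askAgent = onCall(async (request) => {
  // The Admin SDK bypasses security rules, so check auth here
  if (!request.auth) throw new HttpsError('unauthenticated', 'Sign in to ask a question');
  const { question } = request.data;
  if (!question) throw new HttpsError('invalid-argument', 'Question is required');

  // 1. Embed the question
  const embedding = await embedText(question, 'RETRIEVAL_QUERY');
  if (!embedding) throw new HttpsError('internal', 'No embedding returned');
  const qVector = projectEmbedding(embedding, 500);

  // 2. Retrieve top-10 chunks from Firestore
  const chunkDocs = await db.collection('chunks')
    .findNearest({
      vectorField: 'embedding',
      queryVector: qVector,
      limit: 10,
      distanceMeasure: 'COSINE',
      distanceResultField: 'distance',
    })
    .get();
  // Label each chunk with its source so the model has something to cite
  const context = chunkDocs.docs
    .map((d) => `[${d.get('source')}]\n${d.get('text')}`)
    .join('\n\n');

  // 3. Generate answer with context
  const result = await genModel.generateContent({
    contents: [{
      role: 'user',
      parts: [{ text: `Context:\n${context}\n\nQuestion: ${question}` }],
    }],
  });

  return {
    answer: result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? '',
    sources: chunkDocs.docs.map((d) => ({
      id: d.id,
      source: d.get('source'),
      distance: d.get('distance'), // cosine distance: lower is closer
    })),
  };
});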
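One refinement to the context-assembly step above: label each chunk with its source so the model has something to cite, and cap the total size so ten long chunks cannot blow past your prompt budget. This is a sketch, assuming chunks shaped like `{ source, text }` and using a character count as a crude stand-in for a token budget; `buildContext` and `maxChars` are hypothetical names.

```javascript
// Builds the prompt context from retrieved chunks, closest first.
// Stops adding chunks once the character budget would be exceeded.
function buildContext(chunks, maxChars = 8000) {
  const parts = [];
  let used = 0;
  for (const chunk of chunks) {
    const block = `[${chunk.source}]\n${chunk.text}`;
    if (used + block.length > maxChars) break; // drop the least relevant tail
    parts.push(block);
    used += block.length + 2; // +2 for the blank-line separator
  }
  return parts.join('\n\n');
}

const sampleContext = buildContext(
  [
    { source: 'pricing.md', text: 'Plans start at $10.' },
    { source: 'faq.md', text: 'Refunds take 5 days.' },
  ],
  40,
);
console.log(sampleContext); // only the pricing.md chunk fits in 40 characters
```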

Security Rules

Because everything lives in Firestore, you protect your vector data with standard security rules:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Signed-in users can read content
    match /content/{docId} {
      allow read: if request.auth != null;
      // Cloud Functions write embeddings through the Admin SDK, which bypasses
      // these rules; client-side writes are limited to admins
      allow write: if request.auth != null
                    && (request.auth.uid == 'ADMIN_UID'
                        || exists(/databases/$(database)/documents/admins/$(request.auth.uid)));
    }

    // AI agent responses are logged for audit
    match /agent_logs/{logId} {
      allow read, write: if request.auth != null && request.auth.uid == 'ADMIN_UID';
    }
  }
}

Cost Model for a Typical Deployment

Here's what a Firebase + Vertex AI setup costs per month for a mid-size app (50K document chunks, ~2,000 queries/day):

Service	Monthly cost
Cloud Functions (Gen 2, ~100K invocations)	$15
Firestore (reads + vector index storage)	$80
Vertex AI embeddings (`text-embedding-004`)	$35
Gemini 2.0 Flash (65% of queries)	$120
Gemini 2.0 Pro (25% of queries)	$210
Cloud Storage (raw documents)	$10
Networking + misc	$25
Total	~$495/mo

This fits inside a single project, a single billing account, and — crucially — a single set of Firebase Security Rules.
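To adapt the table to your own volumes, it helps to keep the arithmetic in code. The sketch below copies the line items from the table, sums them, and derives a cost per 1,000 queries; the 30-day month is an assumption, and the variable names are illustrative.

```javascript
// Monthly line items copied from the table above (USD).
const monthlyCosts = {
  cloudFunctions: 15,
  firestore: 80,
  embeddings: 35,
  geminiFlash: 120,
  geminiPro: 210,
  cloudStorage: 10,
  networkingMisc: 25,
};

const totalMonthly = Object.values(monthlyCosts).reduce((sum, cost) => sum + cost, 0);
const queriesPerMonth = 2000 * 30; // ~2,000 queries/day, assuming a 30-day month
const costPerThousandQueries = totalMonthly / (queriesPerMonth / 1000);
const llmShare = (monthlyCosts.geminiFlash + monthlyCosts.geminiPro) / totalMonthly;

console.log(totalMonthly); // 495
console.log(costPerThousandQueries.toFixed(2)); // 8.25
console.log(`${Math.round(llmShare * 100)}%`); // 67%
```

Roughly two thirds of the bill is LLM generation, so model routing is the first place to look when the total needs to come down.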

What We've Learned

A few things that surprised us when we started building on this stack:

The 500-dimension limit matters. Plan your embedding strategy before you index 50K documents. We use PCA projection, but you could also pick a model that outputs ≤500 dims natively (like gte-small at 384).
Vector index build times can be slow. For a first-time build on 100K+ documents, budget 2–3 hours. Schedule it as a batch job, not a blocking migration.
Cold starts on Cloud Functions + Vertex AI are real. The first call after idle takes 2–4 seconds. Mitigate with min instances (1–2) if your app is latency-sensitive.
Test with real user queries. Synthetic benchmarks look great but miss the messiness of real questions. We run a 2-week shadow mode on every deployment — log queries, serve from the old system, and evaluate retrieval quality before cutting over.
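The shadow-mode setup from the last point can be sketched as a thin wrapper: users always get results from the existing search, while the new search runs alongside it and both result sets are logged for offline review. `oldSearch`, `newSearch`, and `logComparison` are hypothetical async functions you would supply; this is one way to structure it, not a copy of our production code.

```javascript
// Serves results from the old system and logs the new system's results.
// Both searches run in parallel, so added latency is bounded by the slower one.
// (Waiting for both avoids relying on work that continues after the response.)
async function shadowSearch(queryText, { oldSearch, newSearch, logComparison }) {
  const [served, candidate] = await Promise.allSettled([
    oldSearch(queryText),
    newSearch(queryText),
  ]);
  if (served.status === 'rejected') throw served.reason;

  try {
    await logComparison({
      queryText,
      served: served.value,
      candidate: candidate.status === 'fulfilled' ? candidate.value : null,
      candidateError: candidate.status === 'rejected' ? String(candidate.reason) : null,
    });
  } catch (err) {
    // A logging failure must never affect what the user sees.
    console.error('shadow log failed', err);
  }
  return served.value;
}

// Usage with in-memory stand-ins:
const shadowLogs = [];
shadowSearch('refund policy', {
  oldSearch: async () => ['doc-1'],
  newSearch: async () => ['doc-2'],
  logComparison: async (entry) => { shadowLogs.push(entry); },
}).then((results) => console.log(results)); // [ 'doc-1' ]
```

The logged pairs are what you score offline before cutting over, for example by comparing each candidate list against an exact-search baseline.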

If you're already on Firebase, you're closer to production AI than you think. The SDKs are there, the auth is there, the database is there — you just need to add the embedding and generation layer on top.

We do this for a living. If you want a walkthrough of how it would work with your specific Firestore schema, let's talk.