
Google Gemini LLM

GoogleLLM plugs Google Gemini chat models into Patter’s pipeline mode via the google-genai SDK. It supports both the Gemini Developer API (with an API key) and Vertex AI (with GCP project + location). Streams normalise to Patter’s unified {type: "text" | "tool_call" | "done"} chunk protocol, and Gemini function_call parts map directly onto Patter tools.

This page covers Google Gemini in chat-completions mode for the pipeline (STT → LLM → TTS). For Gemini’s bidirectional speech-to-speech engine, see the separate gemini-live adapter under Engines.
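
For orientation, the chunk shapes might look like the TypedDicts below. Only the type discriminator comes from the protocol above; the other field names (text, index, name, arguments) are illustrative assumptions, not Patter's documented schema.

from typing import Literal, TypedDict

class TextChunk(TypedDict):
    type: Literal["text"]
    text: str               # incremental text delta (field name assumed)

class ToolCallChunk(TypedDict):
    type: Literal["tool_call"]
    index: int              # adapter-assigned call index (assumed)
    name: str               # function name from Gemini's function_call part
    arguments: dict         # parsed call arguments (assumed)

class DoneChunk(TypedDict):
    type: Literal["done"]

Chunk = TextChunk | ToolCallChunk | DoneChunk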

Install

pip install "getpatter[google]"   # Python
npm install getpatter             # Node.js

Usage

# Namespaced import
from getpatter.llm import google

llm = google.LLM()                                          # reads GEMINI_API_KEY (or GOOGLE_API_KEY)
llm = google.LLM(api_key="AIza...", model="gemini-2.5-flash")

# Vertex AI
llm = google.LLM(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)

# Flat alias (equivalent)
from getpatter import GoogleLLM

llm = GoogleLLM()

The namespaced import (from getpatter.llm import google in Python, import * as google from "getpatter/llm/google" in Node.js) auto-resolves the API key from GEMINI_API_KEY first, then GOOGLE_API_KEY for parity with other SDKs, and exposes a uniform LLM class.
Plug it into an agent:
import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, GoogleLLM, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=DeepgramSTT(),
    llm=GoogleLLM(),                                        # GEMINI_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
    first_message="Hi, how can I help?",
)

asyncio.run(phone.serve(agent))

Supported models

Pricing in USD per 1M tokens.

| Model | Input | Output | Notes |
| --- | --- | --- | --- |
| gemini-2.5-flash (default) | $0.30 | $2.50 | Best price/perf for voice. |
| gemini-2.5-pro | $1.25 | $10.00 | Highest quality. |
| gemini-2.0-flash | n/a | n/a | Older fast model. |
| gemini-2.0-flash-lite | n/a | n/a | Lightweight 2.0. |
| gemini-1.5-flash | n/a | n/a | Legacy fast model. |
| gemini-1.5-pro | n/a | n/a | Legacy pro model. |

For the speech-to-speech variant gemini-live-2.5-flash-native-audio (input $0.30 / output $2.50 per 1M tokens), see the Engines page; it is a separate Realtime adapter, not a chat-completions model.

Environment variables

| Variable | Required | Notes |
| --- | --- | --- |
| GEMINI_API_KEY | one of these | Preferred; Google’s CLI tooling uses this name. |
| GOOGLE_API_KEY | one of these | Legacy/alt name accepted for parity. |
| GOOGLE_GENAI_USE_VERTEXAI | optional | Set to 1 / true to default vertexai=True. |
| GOOGLE_CLOUD_PROJECT | Vertex AI | GCP project ID when vertexai=True. |
| GOOGLE_CLOUD_LOCATION | Vertex AI | GCP region (defaults to us-central1). |
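
Putting the Vertex AI variables together, a minimal sketch of env-driven configuration, assuming the adapter resolves them exactly as the table above describes (the project ID is a placeholder):

import os

# With these set, google.LLM() should default to Vertex AI with no
# explicit constructor arguments (per the table above).
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
os.environ["GOOGLE_CLOUD_PROJECT"] = "my-gcp-project"
os.environ["GOOGLE_CLOUD_LOCATION"] = "europe-west4"

from getpatter.llm import google

llm = google.LLM()  # equivalent to google.LLM(vertexai=True, project=..., location=...)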

Options

| Option | Default | Notes |
| --- | --- | --- |
| api_key / apiKey | None | Reads GEMINI_API_KEY, then GOOGLE_API_KEY. Ignored when vertexai=True. |
| model | "gemini-2.5-flash" | Any Gemini chat model id. |
| vertexai | False | Use Vertex AI instead of the Developer API. |
| project | None | GCP project (Vertex AI). |
| location | "us-central1" | GCP region (Vertex AI). |
| temperature | unset | Optional sampling temperature. |
| max_output_tokens / maxOutputTokens | unset | Output token cap. |
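
For example, a constructor call exercising the sampling options; the values here are illustrative, and the snake_case names follow the Python column above:

from getpatter.llm import google

llm = google.LLM(
    model="gemini-2.5-flash",
    temperature=0.7,          # optional sampling temperature
    max_output_tokens=1024,   # output token cap
)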

Vertex AI

Switch to Vertex AI when you need GCP-native auth (service accounts), VPC Service Controls, regional residency, or per-project billing isolation.
from getpatter.llm import google

llm = google.LLM(
    vertexai=True,
    project="my-gcp-project",
    location="europe-west4",                               # GoogleVertexLocation enum
    model="gemini-2.5-pro",
)

The google-genai SDK picks up Application Default Credentials automatically: set GOOGLE_APPLICATION_CREDENTIALS to a service-account key path, or run gcloud auth application-default login for local dev.

Function calling

Gemini’s function_call parts map directly onto Patter tools — define a tool once and it works on every LLM provider. Patter assigns a monotonically increasing index per function_call part since Gemini does not provide a stable per-call index across stream chunks. Token usage is collected from usage_metadata (cumulative on each chunk; only the last value is yielded as a usage event to avoid double-counting).
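
To make the index and usage behaviour concrete, here is a rough sketch of what that normalisation loop could look like against the google-genai streaming API. It is illustrative, not Patter's actual source; the normalised_chunks name and the usage field on the done chunk are assumptions.

from google import genai

client = genai.Client()  # resolves GEMINI_API_KEY / GOOGLE_API_KEY from env

def normalised_chunks(model: str, contents):
    """Illustrative only: fold Gemini function_call parts and
    usage_metadata into Patter-style chunks."""
    next_index = 0     # Gemini gives no per-call index, so count parts ourselves
    last_usage = None  # usage_metadata is cumulative; keep only the final value

    for chunk in client.models.generate_content_stream(model=model, contents=contents):
        if chunk.usage_metadata:
            last_usage = chunk.usage_metadata
        candidate = chunk.candidates[0] if chunk.candidates else None
        parts = candidate.content.parts if candidate and candidate.content else None
        for part in parts or []:
            if part.function_call:
                yield {
                    "type": "tool_call",
                    "index": next_index,  # adapter-assigned, monotonically increasing
                    "name": part.function_call.name,
                    "arguments": dict(part.function_call.args or {}),
                }
                next_index += 1
            elif part.text:
                yield {"type": "text", "text": part.text}

    # Report usage once, from the last cumulative snapshot, to avoid
    # double-counting (the "usage" field shape here is assumed).
    yield {"type": "done", "usage": last_usage}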