
Google Gemini LLM

GoogleLLM plugs Google Gemini chat models into Patter’s pipeline mode via the Generative Language API. Streams normalise to Patter’s unified {type: "text" | "tool_call" | "done"} chunk protocol, and Gemini function_call parts map directly onto Patter tools.
This page covers Google Gemini in chat-completions mode for the pipeline (STT → LLM → TTS). For Gemini’s bidirectional speech-to-speech engine, see the separate gemini-live adapter under Engines.
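The unified chunk protocol can be sketched as a TypeScript discriminated union. This is an illustrative shape only — the exact field names on `tool_call` chunks are assumptions, not the SDK's published types:

```typescript
// Hypothetical shape of Patter's unified stream chunks (field names assumed for illustration).
type Chunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: string; index: number }
  | { type: "done" };

// Accumulate streamed text and tool-call names from a sequence of chunks.
function collect(chunks: Chunk[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const chunk of chunks) {
    if (chunk.type === "text") text += chunk.text;
    else if (chunk.type === "tool_call") toolCalls.push(chunk.name);
    else break; // "done" terminates the stream
  }
  return { text, toolCalls };
}
```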

Install

npm install getpatter            # TypeScript
pip install "getpatter[google]"  # Python

Usage

// Namespaced import
import * as google from "getpatter/llm/google";

const llm = new google.LLM();                               // reads GEMINI_API_KEY or GOOGLE_API_KEY
// or pass options explicitly:
const llmWithOptions = new google.LLM({ apiKey: "AIza...", model: "gemini-2.5-flash" });

// Flat alias (equivalent)
import { GoogleLLM } from "getpatter";

const llm2 = new GoogleLLM();
The namespaced import (`import * as google from "getpatter/llm/google"` in TypeScript, `from getpatter.llm import google` in Python) exposes a uniform LLM class. It auto-resolves the API key from GEMINI_API_KEY first, then GOOGLE_API_KEY, for parity with other SDKs.
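The documented resolution order can be sketched as a small helper (this is an illustration of the described behaviour, not the SDK's internal code):

```typescript
// Resolve the API key in the documented order:
// GEMINI_API_KEY first, then GOOGLE_API_KEY.
function resolveApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"] ?? env["GOOGLE_API_KEY"];
  if (!key) throw new Error("Set GEMINI_API_KEY or GOOGLE_API_KEY");
  return key;
}
```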
Plug it into an agent:
import { Patter, Twilio, DeepgramSTT, GoogleLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new GoogleLLM(),                                     // GEMINI_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi, how can I help?",
});

await phone.serve(agent);

Supported models

Pricing in USD per 1M tokens.
| Model | Input | Output | Notes |
| --- | --- | --- | --- |
| gemini-2.5-flash (default) | $0.30 | $2.50 | Best price/perf for voice. |
| gemini-2.5-pro | $1.25 | $10.00 | Highest quality. |
| gemini-2.0-flash | n/a | n/a | Older fast model. |
| gemini-2.0-flash-lite | n/a | n/a | Lightweight 2.0. |
| gemini-1.5-flash | n/a | n/a | Legacy fast model. |
| gemini-1.5-pro | n/a | n/a | Legacy pro model. |
For the speech-to-speech variant gemini-live-2.5-flash-native-audio (input $0.30 / output $2.50), see the Engines page — it is a separate Realtime adapter, not a chat-completions model.

Environment variables

| Variable | Required | Notes |
| --- | --- | --- |
| GEMINI_API_KEY | one of these | Preferred; Google's CLI tooling uses this name. |
| GOOGLE_API_KEY | one of these | Legacy/alt name accepted for parity. |

Options

| Option | Default | Notes |
| --- | --- | --- |
| apiKey / api_key | undefined | Reads GEMINI_API_KEY, then GOOGLE_API_KEY. |
| model | "gemini-2.5-flash" | Any Gemini chat model id. |
| baseUrl / base_url | unset | Override the Generative Language API endpoint (rarely needed). |
| temperature | unset | Optional sampling temperature. |
| maxOutputTokens / max_output_tokens | unset | Output token cap. |
Vertex AI is currently only exposed in the Python SDK (vertexai=True, project=..., location=...). The TypeScript adapter targets the Developer API; Vertex AI parity is on the roadmap.

Function calling

Gemini’s function_call parts map directly onto Patter tools — define a tool once and it works on every LLM provider. Patter assigns a monotonically increasing index per function_call part since Gemini does not provide a stable per-call index across stream chunks. Token usage is collected from usage_metadata (cumulative on each chunk; only the last value is yielded as a usage event to avoid double-counting).
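The index assignment described above can be sketched as follows. The part shape here is an assumption for illustration; the point is that Gemini parts carry no stable per-call index, so one is assigned monotonically as parts arrive:

```typescript
// Gemini function_call parts lack a stable per-call index across stream
// chunks, so assign one monotonically in arrival order.
// (Part/field names below are assumptions for illustration.)
interface FunctionCallPart {
  name: string;
  args: Record<string, unknown>;
}
interface IndexedToolCall extends FunctionCallPart {
  index: number;
}

function indexToolCalls(parts: FunctionCallPart[]): IndexedToolCall[] {
  let next = 0;
  return parts.map((part) => ({ ...part, index: next++ }));
}
```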