

Groq LLM

GroqLLM plugs Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1 into Patter’s pipeline mode. Groq’s LPU inference engine serves Llama models at very high throughput with low time-to-first-token, making it a strong pick when latency matters more than long-context reasoning. The provider is a thin wrapper around the OpenAI Chat Completions client with a Groq-specific base URL — every OpenAI sampling option (responseFormat, parallelToolCalls, toolChoice, seed, topP, frequencyPenalty, presencePenalty, stop, temperature, maxTokens) is forwarded to chat.completions.create automatically.
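Conceptually, the forwarding works like an option merge into the Chat Completions request body. The sketch below is illustrative, not Patter's real internals — `buildRequestBody` and the `GroqOptions` shape are hypothetical names; it only shows how camelCase options map to the snake_case fields the OpenAI-compatible API expects, with unset options omitted rather than sent as `null`:

```typescript
// Hypothetical sketch of the option forwarding described above.
// buildRequestBody and GroqOptions are illustrative, not getpatter exports.

interface GroqOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  seed?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  stop?: string | string[];
  responseFormat?: { type: string };
  parallelToolCalls?: boolean;
  toolChoice?: unknown;
}

function buildRequestBody(
  messages: { role: string; content: string }[],
  opts: GroqOptions,
): Record<string, unknown> {
  // Defaults mirror the provider: llama-3.3-70b-versatile unless overridden.
  const body: Record<string, unknown> = {
    model: opts.model ?? "llama-3.3-70b-versatile",
    messages,
  };
  // Only set fields are forwarded; camelCase becomes the API's snake_case.
  if (opts.temperature !== undefined) body.temperature = opts.temperature;
  if (opts.maxTokens !== undefined) body.max_tokens = opts.maxTokens;
  if (opts.topP !== undefined) body.top_p = opts.topP;
  if (opts.seed !== undefined) body.seed = opts.seed;
  if (opts.frequencyPenalty !== undefined) body.frequency_penalty = opts.frequencyPenalty;
  if (opts.presencePenalty !== undefined) body.presence_penalty = opts.presencePenalty;
  if (opts.stop !== undefined) body.stop = opts.stop;
  if (opts.responseFormat !== undefined) body.response_format = opts.responseFormat;
  if (opts.parallelToolCalls !== undefined) body.parallel_tool_calls = opts.parallelToolCalls;
  if (opts.toolChoice !== undefined) body.tool_choice = opts.toolChoice;
  return body;
}
```

The resulting object is what a `chat.completions.create` call against the Groq base URL would receive.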

Install

npm install getpatter
pip install "getpatter[groq]"

Usage

// Namespaced import
import * as groq from "getpatter/llm/groq";

const llm = new groq.LLM();                                 // reads GROQ_API_KEY

// or with an explicit key and model:
const llmExplicit = new groq.LLM({ apiKey: "gsk_...", model: "llama-3.3-70b-versatile" });

// or with forwarded sampling options:
const llmJson = new groq.LLM({
  model: "llama-3.3-70b-versatile",
  responseFormat: { type: "json_object" },                  // OpenAI-style structured outputs
  seed: 42,
});

// Flat alias (equivalent)
import { GroqLLM } from "getpatter";

const llm2 = new GroqLLM();
The namespaced import (import * as groq from "getpatter/llm/groq" in TypeScript, from getpatter.llm import groq in Python) auto-resolves the API key from GROQ_API_KEY and exposes a uniform LLM class — the same pattern Patter uses for STT and TTS namespaces.

Plug it into an agent:

import { Patter, Twilio, DeepgramSTT, GroqLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new GroqLLM(),                                       // GROQ_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi, how can I help?",
});

await phone.serve(agent);

Supported models

Pricing in USD per 1M tokens. Availability depends on account tier — Groq’s free tier rate-limits more aggressively than the paid plans.
| Model | Input | Output | Notes |
| --- | --- | --- | --- |
| llama-3.3-70b-versatile (default) | $0.59 | $0.79 | General-purpose Llama 3.3, long context. |
| llama-3.1-8b-instant | $0.05 | $0.08 | Cheapest fast option. |
| llama-3.3-70b-specdec | n/a | n/a | Speculative decoding variant. |
| llama3-70b-8192 | n/a | n/a | Llama 3, 8K context. |
| llama3-8b-8192 | n/a | n/a | Llama 3, 8K context. |
| mixtral-8x7b-32768 | n/a | n/a | Mixtral MoE, 32K context. |
| gemma2-9b-it | n/a | n/a | Google Gemma 2 instruct. |
Models without listed rates are available on the API but aren’t yet pinned to a LLM_PRICING entry — pass pricing overrides if your dashboard needs cost figures for them.
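For the models that do have listed rates, per-call cost is a straight per-1M-token multiplication. A minimal sketch — the `estimateCost` helper and the inline rate table are illustrative, not getpatter exports:

```typescript
// Illustrative cost math from the pricing table above (USD per 1M tokens).
// estimateCost is a hypothetical helper, not part of getpatter's API.
const RATES: Record<string, { input: number; output: number }> = {
  "llama-3.3-70b-versatile": { input: 0.59, output: 0.79 },
  "llama-3.1-8b-instant": { input: 0.05, output: 0.08 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  // Mirrors the doc's caveat: unpinned models need a pricing override.
  if (!r) throw new Error(`no pricing pinned for ${model} - supply an override`);
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}

// e.g. a 1,200-token prompt with a 300-token reply on the default model:
const cost = estimateCost("llama-3.3-70b-versatile", 1200, 300);
// (1200 * 0.59 + 300 * 0.79) / 1e6 ≈ $0.000945
```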

Environment variables

| Variable | Required | Notes |
| --- | --- | --- |
| GROQ_API_KEY | yes | Auto-loaded when apiKey / api_key is omitted. |

Options

| Option | Default | Notes |
| --- | --- | --- |
| apiKey / api_key | undefined | Reads from GROQ_API_KEY when omitted. |
| model | "llama-3.3-70b-versatile" | Any Groq chat model id. |
| baseUrl / base_url | https://api.groq.com/openai/v1 | Override the Groq endpoint (rarely needed). |
| temperature, maxTokens, topP, seed, frequencyPenalty, presencePenalty, stop, responseFormat, parallelToolCalls, toolChoice | unset | All forwarded to chat.completions.create. See the Groq API docs for accepted values. |

Notes

  • Groq returns the standard OpenAI Chat Completions stream shape, so tool calls, JSON mode, and seeded sampling all work without provider-specific code.
  • Time-to-first-token on Groq’s LPU is typically < 200 ms for the 70B model and < 100 ms for the 8B model — well below most TTS startup latency.
  • Long-context calls (32K+) use Mixtral; everything else fits comfortably in the Llama 3.3 context.
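To see why LLM TTFT rarely dominates the voice pipeline, a back-of-envelope budget helps. The STT endpointing and TTS startup figures below are illustrative assumptions, not measured Patter values; only the 200 ms LLM upper bound comes from the notes above:

```typescript
// Back-of-envelope budget for time-to-first-audio in a voice pipeline.
// sttEndpointing and ttsStartup are assumed round numbers for illustration.
const budgetMs = {
  sttEndpointing: 300, // assumed: silence detection before the STT result finalizes
  llmTtft: 200,        // Groq 70B time-to-first-token (upper bound from the notes above)
  ttsStartup: 250,     // assumed: time until the TTS emits its first audio byte
};

// Time from the caller finishing a sentence to hearing the first audio:
const firstAudioMs = budgetMs.sttEndpointing + budgetMs.llmTtft + budgetMs.ttsStartup;
// 750 ms total under these assumptions; the LLM contributes barely a quarter of it.
```

Swapping in llama-3.1-8b-instant (< 100 ms TTFT) shaves the LLM share further, which is why the non-LLM stages usually set the floor.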