Groq LLM
GroqLLM plugs Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1 into Patter’s pipeline mode. Groq’s LPU inference engine serves Llama models at very high throughput with low time-to-first-token, making it a strong pick when latency matters more than long-context reasoning.
The provider is a thin wrapper around the OpenAI Chat Completions client with a Groq-specific base URL — every OpenAI sampling option (responseFormat, parallelToolCalls, toolChoice, seed, topP, frequencyPenalty, presencePenalty, stop, temperature, maxTokens) is forwarded to chat.completions.create automatically.
Install
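This page does not show an explicit install command; assuming the package name matches the import path below (getpatter), installation would look like:

```shell
# Package name inferred from the import path "getpatter/llm/groq" — verify against your registry.
npm install getpatter   # TypeScript / Node
pip install getpatter   # Python
```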
Usage
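A minimal sketch of constructing the provider. The namespaced import and the LLM class name come from this page, and the option names mirror the Options table below, but the exact constructor shape is an assumption:

```typescript
import * as groq from "getpatter/llm/groq";

// GROQ_API_KEY is read from the environment when apiKey is omitted.
const llm = new groq.LLM({
  model: "llama-3.3-70b-versatile", // the default; any Groq chat model id works
  temperature: 0.7,                 // forwarded to chat.completions.create
  maxTokens: 512,
});
```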
The namespaced import (import * as groq from "getpatter/llm/groq" in TypeScript, from getpatter.llm import groq in Python) auto-resolves the API key from GROQ_API_KEY and exposes a uniform LLM class — the same pattern Patter uses for the STT and TTS namespaces.
Supported models
Pricing is in USD per 1M tokens. Availability depends on account tier — Groq's free tier rate-limits more aggressively than the paid plans.
| Model | Input | Output | Notes |
|---|---|---|---|
| llama-3.3-70b-versatile (default) | $0.59 | $0.79 | General-purpose Llama 3.3, long context. |
| llama-3.1-8b-instant | $0.05 | $0.08 | Cheapest fast option. |
| llama-3.3-70b-specdec | n/a | n/a | Speculative decoding variant. |
| llama3-70b-8192 | n/a | n/a | Llama 3, 8K context. |
| llama3-8b-8192 | n/a | n/a | Llama 3, 8K context. |
| mixtral-8x7b-32768 | n/a | n/a | Mixtral MoE, 32K context. |
| gemma2-9b-it | n/a | n/a | Google Gemma 2 instruct. |
Models marked n/a have no built-in LLM_PRICING entry — pass pricing overrides if your dashboard needs cost figures for them.
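The per-token arithmetic is straightforward. As an illustration (the helper below is hypothetical, not part of Patter), estimating the cost of a single call from the table above:

```typescript
// Prices from the table above, in USD per 1M tokens.
const PRICING: Record<string, { input: number; output: number }> = {
  "llama-3.3-70b-versatile": { input: 0.59, output: 0.79 },
  "llama-3.1-8b-instant": { input: 0.05, output: 0.08 },
};

// Hypothetical helper: USD cost for one call, given token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing entry for ${model} — pass an override`);
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// 10K prompt tokens + 1K completion tokens on the default model:
// 0.01 * 0.59 + 0.001 * 0.79 = 0.00669 USD
```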
Environment variables
| Variable | Required | Notes |
|---|---|---|
| GROQ_API_KEY | yes | Auto-loaded when apiKey / api_key is omitted. |
Options
| Option | Default | Notes |
|---|---|---|
| apiKey / api_key | undefined | Reads from GROQ_API_KEY when omitted. |
| model | "llama-3.3-70b-versatile" | Any Groq chat model id. |
| baseUrl / base_url | https://api.groq.com/openai/v1 | Override the Groq endpoint (rarely needed). |
| temperature, maxTokens, topP, seed, frequencyPenalty, presencePenalty, stop, responseFormat, parallelToolCalls, toolChoice | unset | All forwarded to chat.completions.create. See the Groq API docs for accepted values. |
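To make the forwarding concrete, here is an illustrative sketch (not Patter's actual code) of how the camelCase options from the table map onto the snake_case parameters that chat.completions.create expects:

```typescript
// Illustrative only: camelCase provider options → snake_case Chat Completions body.
interface GroqLLMOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  seed?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  stop?: string | string[];
}

function toChatCompletionParams(opts: GroqLLMOptions) {
  return {
    model: opts.model ?? "llama-3.3-70b-versatile", // provider default
    temperature: opts.temperature,
    max_tokens: opts.maxTokens,
    top_p: opts.topP,
    seed: opts.seed,
    frequency_penalty: opts.frequencyPenalty,
    presence_penalty: opts.presencePenalty,
    stop: opts.stop,
  };
}
```

Unset options stay undefined and are simply absent from the serialized request, so Groq's server-side defaults apply.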
Notes
- Groq returns the standard OpenAI Chat Completions stream shape, so tool calls, JSON mode, and seeded sampling all work without provider-specific code.
- Time-to-first-token on Groq’s LPU is typically < 200 ms for the 70B model and < 100 ms for the 8B model — well below most TTS startup latency.
- Long-context calls (32K+) use Mixtral; everything else fits comfortably in the Llama 3.3 context.

