Groq LLM
GroqLLM plugs Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1 into Patter’s pipeline mode. Groq’s LPU inference engine serves Llama models at very high throughput with low time-to-first-token, making it a strong pick when latency matters more than long-context reasoning.
The provider is a thin wrapper around the OpenAI Chat Completions client with a Groq-specific base URL — every OpenAI sampling option (responseFormat, parallelToolCalls, toolChoice, seed, topP, frequencyPenalty, presencePenalty, stop, temperature, maxTokens) is forwarded to chat.completions.create automatically.
Install
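This page does not show an explicit install command; assuming the package name matches the import path below (getpatter), installation would look like:

```shell
# Package name inferred from the import path "getpatter/llm/groq" — verify against your registry.
npm install getpatter   # TypeScript / Node
pip install getpatter   # Python
```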
Usage
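A minimal sketch of constructing the provider. The namespaced import and the LLM class name come from this page, and the option names mirror the Options table below, but the exact constructor shape is an assumption:

```typescript
import * as groq from "getpatter/llm/groq";

// GROQ_API_KEY is read from the environment when apiKey is omitted.
const llm = new groq.LLM({
  model: "llama-3.3-70b-versatile", // the default; any Groq chat model id works
  temperature: 0.7,                 // forwarded to chat.completions.create
  maxTokens: 512,
});
```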
The namespaced import (import * as groq from "getpatter/llm/groq" in TypeScript, from getpatter.llm import groq in Python) auto-resolves the API key from GROQ_API_KEY and exposes a uniform LLM class — the same pattern Patter uses for the STT and TTS namespaces.
Supported models
Pricing is in USD per 1M tokens. Availability depends on account tier — Groq's free tier rate-limits more aggressively than the paid plans.
| Model | Input | Output | Notes |
|---|---|---|---|
| llama-3.3-70b-versatile (default) | $0.59 | $0.79 | General-purpose Llama 3.3, long context. |
| llama-3.1-8b-instant | $0.05 | $0.08 | Cheapest fast option. |
| llama-3.3-70b-specdec | n/a | n/a | Speculative decoding variant. |
| llama3-70b-8192 | n/a | n/a | Llama 3, 8K context. |
| llama3-8b-8192 | n/a | n/a | Llama 3, 8K context. |
| mixtral-8x7b-32768 | n/a | n/a | Mixtral MoE, 32K context. |
| gemma2-9b-it | n/a | n/a | Google Gemma 2 instruct. |
Models marked n/a have no built-in LLM_PRICING entry — pass pricing overrides if your dashboard needs cost figures for them.
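The per-token arithmetic is straightforward. As an illustration (the helper below is hypothetical, not part of Patter), estimating the cost of a single call from the table above:

```typescript
// Prices from the table above, in USD per 1M tokens.
const PRICING: Record<string, { input: number; output: number }> = {
  "llama-3.3-70b-versatile": { input: 0.59, output: 0.79 },
  "llama-3.1-8b-instant": { input: 0.05, output: 0.08 },
};

// Hypothetical helper: USD cost for one call, given token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing entry for ${model} — pass an override`);
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// 10K prompt tokens + 1K completion tokens on the default model:
// 0.01 * 0.59 + 0.001 * 0.79 = 0.00669 USD
```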
Environment variables
| Variable | Required | Notes |
|---|---|---|
| GROQ_API_KEY | yes | Auto-loaded when apiKey / api_key is omitted. |
Options
| Option | Default | Notes |
|---|---|---|
| apiKey / api_key | undefined | Reads from GROQ_API_KEY when omitted. |
| model | "llama-3.3-70b-versatile" | Any Groq chat model id. |
| baseUrl / base_url | https://api.groq.com/openai/v1 | Override the Groq endpoint (rarely needed). |
| temperature, maxTokens, topP, seed, frequencyPenalty, presencePenalty, stop, responseFormat, parallelToolCalls, toolChoice | unset | All forwarded to chat.completions.create. See the Groq API docs for accepted values. |
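To make the forwarding concrete, here is an illustrative sketch (not Patter's actual code) of how the camelCase options from the table map onto the snake_case parameters that chat.completions.create expects:

```typescript
// Illustrative only: camelCase provider options → snake_case Chat Completions body.
interface GroqLLMOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  seed?: number;
  frequencyPenalty?: number;
  presencePenalty?: number;
  stop?: string | string[];
}

function toChatCompletionParams(opts: GroqLLMOptions) {
  return {
    model: opts.model ?? "llama-3.3-70b-versatile", // provider default
    temperature: opts.temperature,
    max_tokens: opts.maxTokens,
    top_p: opts.topP,
    seed: opts.seed,
    frequency_penalty: opts.frequencyPenalty,
    presence_penalty: opts.presencePenalty,
    stop: opts.stop,
  };
}
```

Unset options stay undefined and are simply absent from the serialized request, so Groq's server-side defaults apply.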
Notes
- Groq returns the standard OpenAI Chat Completions stream shape, so tool calls, JSON mode, and seeded sampling all work without provider-specific code.
- Time-to-first-token on Groq’s LPU is typically < 200 ms for the 70B model and < 100 ms for the 8B model — well below most TTS startup latency.
- Long-context calls (32K+) use Mixtral; everything else fits comfortably in the Llama 3.3 context.

