---
title: "Groq"
description: "Configure Groq's ultra-fast LPU inference for models from OpenAI, Meta, and DeepSeek."
---

Groq provides ultra-fast AI inference through its custom LPU™ (Language Processing Unit) architecture, purpose-built for inference rather than adapted from training hardware. It hosts open-source models from OpenAI, Meta, DeepSeek, Moonshot AI, and others.

**Website:** [https://groq.com/](https://groq.com/)

## Getting an API Key

1. Go to the [Groq Console](https://console.groq.com/) and sign in or create an account
2. Navigate to the API Keys section
3. Create a new API key with a descriptive name (e.g., "CodinIT")
4. Copy the key immediately and store it securely; you won't be able to see it again
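
To confirm the key works before wiring it into CodinIT, you can hit Groq's OpenAI-compatible chat endpoint directly. A minimal sketch, assuming Python with `requests` installed and the key exported as `GROQ_API_KEY`:

```python
import os
import requests

# Key stored in an environment variable so it never lands in source control.
api_key = os.environ["GROQ_API_KEY"]

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "llama-3.3-70b-versatile",  # any model from the list below
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```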

## Configuration

1. Click the settings icon (⚙️) in the CodinIT panel
2. Select "Groq" from the API Provider dropdown
3. Paste your Groq API key into the API Key field
4. Choose your model from the Model dropdown
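
Once saved, CodinIT makes the API calls for you. For reference, the equivalent provider settings expressed in code look like this; a sketch using the `openai` Python SDK pointed at Groq's OpenAI-compatible base URL:

```python
import os
from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at Groq's compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

reply = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Ping"}],
)
print(reply.choices[0].message.content)
```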

## Supported Models

- `llama-3.3-70b-versatile` (Meta) - balanced performance, 131K context
- `llama-3.1-8b-instant` (Meta) - fast inference, 131K context
- `openai/gpt-oss-120b` (OpenAI) - flagship model, 131K context
- `openai/gpt-oss-20b` (OpenAI) - compact model, 131K context
- `moonshotai/kimi-k2-instruct` (Moonshot AI) - 1T parameters, prompt caching
- `deepseek-r1-distill-llama-70b` (DeepSeek/Meta) - reasoning optimized
- `qwen/qwen3-32b` (Alibaba Cloud) - enhanced for Q&A tasks
- `meta-llama/llama-4-maverick-17b-128e-instruct` (Meta) - Llama 4 variant
- `meta-llama/llama-4-scout-17b-16e-instruct` (Meta) - Llama 4 variant
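
The hosted lineup changes over time. To see what's currently available to your key, query the models endpoint; a sketch using `requests` against Groq's OpenAI-compatible API:

```python
import os
import requests

# The /models endpoint mirrors OpenAI's shape: {"data": [{"id": ...}, ...]}
resp = requests.get(
    "https://api.groq.com/openai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```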

## Key Features

- **Ultra-fast inference:** Sub-millisecond latency that stays consistent across traffic and workloads, backed by the LPU's on-chip SRAM and static scheduling
- **Large context:** Up to 131K tokens on most models
- **Prompt caching:** Supported on select models such as Kimi K2, cutting cost and latency for repeated prompts
- **Vision support:** Select models accept image inputs; check model details in the Groq Console (sketch below)
- **Reasoning models:** DeepSeek variants expose step-by-step reasoning
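
For vision, requests follow the OpenAI-style multimodal message format. A sketch, assuming the Llama 4 model you pick is vision-capable (confirm in the Groq Console) and using a placeholder image URL:

```python
import os
import requests

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed vision-capable
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                # Placeholder URL; swap in a real, publicly reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```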

Learn more about [LPU architecture](https://groq.com/blog/inside-the-lpu-deconstructing-groq-speed).

## Notes

- **Model selection:** Choose based on your use case and performance requirements
- **Speed:** Groq is optimized for single-request latency rather than high-throughput batching
- **Pricing:** See [Groq Pricing](https://groq.com/pricing) for current rates
- **Rate limits:** Generous but tier-dependent; check Groq's docs for current limits (see the backoff sketch below)
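
If you do hit HTTP 429 responses, honoring `Retry-After` with an exponential fallback is usually enough. A minimal sketch (a hypothetical helper, not part of CodinIT):

```python
import os
import time
import requests

def groq_chat(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, backing off when Groq returns HTTP 429."""
    headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
    for attempt in range(max_retries):
        resp = requests.post(
            "https://api.groq.com/openai/v1/chat/completions",
            headers=headers,
            json=payload,
            timeout=30,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Prefer the server's Retry-After hint; fall back to exponential backoff.
        time.sleep(float(resp.headers.get("retry-after", 2 ** attempt)))
    raise RuntimeError("Still rate limited after retries")
```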