
Portkey + Groq

Portkey is the Control Panel for AI apps. With its popular AI Gateway and Observability Suite, hundreds of teams ship reliable, cost-efficient, and fast apps. With Portkey you can:
  • Connect to 1,600+ models through a unified API,
  • View 40+ metrics & logs for all requests,
  • Enable semantic cache to reduce latency & costs,
  • Implement automatic retries & fallbacks for failed requests,
  • Add custom tags to requests for better tracking and analysis, and more.

Use Groq API with OpenAI Compatibility

Portkey is fully compatible with the OpenAI signature. Connect to the Portkey AI Gateway through the OpenAI Client:
  • Set base_url to PORTKEY_GATEWAY_URL
  • Add default_headers using the createHeaders helper method
Prerequisites:
pip install -qU portkey-ai openai

With OpenAI Client

OpenAI Python
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
from google.colab import userdata  # reads keys from Colab secrets; use os.environ outside Colab

client = OpenAI(
    api_key=userdata.get('GROQ_API_KEY'),  # replace with your Groq API key
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        provider="groq",
        api_key=userdata.get('PORTKEY_API_KEY')  # replace with your Portkey API key
    )
)

chat_complete = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "What's the purpose of Generative AI?"}]
)

print(chat_complete.choices[0].message.content)
Output
The primary purpose of generative AI is to create new, original, and often
realistic data or content, such as images, videos, music, text, or speeches...
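For reference, createHeaders essentially turns its keyword arguments into Portkey's x-portkey-* request headers. A stdlib-only sketch under that assumption (the real helper in portkey_ai may handle more cases, so treat the exact header names here as illustrative):

```python
def create_headers_sketch(**kwargs):
    # Map each keyword argument to an x-portkey-* header,
    # e.g. provider="groq" -> "x-portkey-provider": "groq".
    # (Assumed naming convention; portkey_ai's createHeaders is the real helper.)
    return {f"x-portkey-{key.replace('_', '-')}": value for key, value in kwargs.items()}

headers = create_headers_sketch(provider="groq", api_key="PORTKEY_API_KEY")
print(headers)
```

Because these are plain HTTP headers, any OpenAI-compatible client that accepts default headers can route through the gateway the same way.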

With Portkey Client

Note: Add your Groq API key in the Model Catalog and access models using your provider slug.
Python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")  # replace with your Portkey API key

completion = portkey.chat.completions.create(
    model="@groq-prod/llama3-70b-8192",  # @provider-slug/model
    messages=[{"role": "user", "content": "Who are you?"}],
    max_tokens=250
)
print(completion)
Output
{
    "id": "chatcmpl-8cec08e0-910e-4331-9c4b-f675d9923371",
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
            "content": "I am LLaMA, an AI assistant developed by Meta AI...",
            "role": "assistant",
            "function_call": null,
            "tool_calls": null
        }
    }],
    "created": 1714136032,
    "model": "llama3-70b-8192",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {"prompt_tokens": 14, "completion_tokens": 147, "total_tokens": 161}
}
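Since the gateway returns a standard chat.completion object, the payload above can be inspected with nothing but the standard library. A quick sketch of the fields most applications read:

```python
import json

# The sample chat.completion payload from above, as a raw JSON string.
raw = """{
    "id": "chatcmpl-8cec08e0-910e-4331-9c4b-f675d9923371",
    "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null,
                 "message": {"content": "I am LLaMA, an AI assistant developed by Meta AI...",
                             "role": "assistant", "function_call": null, "tool_calls": null}}],
    "created": 1714136032,
    "model": "llama3-70b-8192",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {"prompt_tokens": 14, "completion_tokens": 147, "total_tokens": 161}
}"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]  # the assistant's reply text
usage = resp["usage"]                              # token accounting for cost tracking
print(answer)
print(f"tokens used: {usage['total_tokens']}")
```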

Advanced Routing - Load Balancing

Load balancing distributes traffic across multiple API keys or providers based on custom weights, for high availability and optimal performance. Example: split traffic between Groq's llama3-70b-8192 (70%) and OpenAI's gpt-4o (30%):
Python
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"override_params": {"model": "@groq-prod/llama3-70b-8192"}, "weight": 0.7},
        {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.3}
    ]
}
OpenAI Python
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
from google.colab import userdata  # reads keys from Colab secrets; use os.environ outside Colab

client = OpenAI(
    api_key="X",  # placeholder; provider keys are resolved via the Portkey config
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key=userdata.get("PORTKEY_API_KEY"),
        config=config
    )
)

chat_complete = client.chat.completions.create(
    model="X",  # placeholder; overridden by override_params in the config
    messages=[{"role": "user", "content": "Just say hi!"}]
)

print(chat_complete.model)
print(chat_complete.choices[0].message.content)
Output
gpt-4o
Hi! How can I assist you today?
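The weight-based routing above can be sketched in plain Python. This is only an illustration of proportional selection, not Portkey's actual gateway code:

```python
import random

# Targets mirroring the loadbalance config above: each carries a
# routing weight, and traffic is split in proportion to those weights.
targets = [
    {"model": "@groq-prod/llama3-70b-8192", "weight": 0.7},
    {"model": "@openai-prod/gpt-4o", "weight": 0.3},
]

def pick_target(targets):
    # Choose one target with probability proportional to its weight.
    models = [t["model"] for t in targets]
    weights = [t["weight"] for t in targets]
    return random.choices(models, weights=weights, k=1)[0]

random.seed(0)  # fixed seed so the split is reproducible
picks = [pick_target(targets) for _ in range(10_000)]
groq_share = picks.count("@groq-prod/llama3-70b-8192") / len(picks)
print(f"Groq share of traffic: {groq_share:.2f}")  # close to 0.70
```

Over many requests the observed split converges to the configured 70/30 ratio, which is why each individual response (like the gpt-4o reply above) may come from either provider.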

Observability with Portkey

Route requests through Portkey to track metrics like tokens used, latency, and cost.
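The custom tags mentioned in the feature list travel as request headers alongside each call. A hedged sketch, assuming Portkey's x-portkey-trace-id and x-portkey-metadata header names (createHeaders also accepts trace_id and metadata arguments for this; check the Portkey docs for the exact signature):

```python
import json

def observability_headers(trace_id, metadata):
    # Attach a trace id plus custom JSON metadata to a request so related
    # calls can be grouped and filtered in the Portkey dashboard.
    # (Header names assumed from Portkey's x-portkey-* convention.)
    return {
        "x-portkey-trace-id": trace_id,
        "x-portkey-metadata": json.dumps(metadata),
    }

headers = observability_headers("order-flow-42", {"env": "prod", "user": "u_123"})
print(headers["x-portkey-metadata"])
```

Passing these headers (or the equivalent createHeaders arguments) on each request lets you slice the logged metrics by environment, user, or feature.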