
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/portkey-AI/gateway/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Cohere provides enterprise-grade language models specialized for business applications, including powerful chat models, best-in-class embeddings, and reranking capabilities. Access Cohere through Portkey for production-ready NLP. Base URL: https://api.cohere.ai

Supported Features

  • ✅ Chat Completions (v2 API)
  • ✅ Streaming
  • ✅ Embeddings
  • ✅ Rerank (via Cohere API)
  • ✅ Tool Use (Function Calling)
  • ✅ Document Mode (RAG)
  • ✅ Citation Mode
  • ✅ Batch Embeddings
  • ❌ Image Generation
  • ❌ Vision

Quick Start

Chat Completions

from portkey_ai import Portkey

client = Portkey(
    provider="cohere",
    Authorization="***"  # Your Cohere API key
)

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[
        {"role": "user", "content": "Explain RAG in simple terms"}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Available Models

Chat Models

Model                   Context  Description          Best For
command-r-plus-08-2024  128K     Most capable         Complex tasks, RAG
command-r-08-2024       128K     Efficient            General purpose
command-r-plus          128K     Previous generation  Legacy apps
command-r               128K     Previous generation  Legacy apps
command                 4K       Legacy model         Simple tasks
command-light           4K       Lightweight          Fast responses

Embedding Models

Model                          Dimensions  Description
embed-english-v3.0             1024        English embeddings
embed-multilingual-v3.0        1024        100+ languages
embed-english-light-v3.0       384         Compact English
embed-multilingual-light-v3.0  384         Compact multilingual
embed-english-v2.0             4096        Legacy

Cohere excels at:
  • Enterprise deployments with strong support
  • RAG applications with citation support
  • Multilingual tasks (100+ languages)
  • Semantic search with best-in-class embeddings
  • Document grounding for factual responses

Configuration Options

Headers

client = Portkey(
    provider="cohere",
    Authorization="***"  # Bearer token format: "Bearer co-***" or just "co-***"
)

Header         Description                    Required
Authorization  Cohere API key (Bearer token)  Yes

Advanced Features

Tool Use (Function Calling)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search for products in the catalog",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "category": {
                        "type": "string",
                        "description": "Product category"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Find laptops under $1000"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
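
The `arguments` field arrives as a JSON-encoded string, so the usual next step is to parse it and route the call to a local implementation. A minimal dispatch sketch (the in-memory catalog and the `search_products` body here are hypothetical stand-ins for your real implementation):

```python
import json

def search_products(query, category=None):
    # Hypothetical local implementation: filter a small in-memory catalog.
    catalog = [
        {"name": "UltraBook 13", "category": "laptops", "price": 899},
        {"name": "GamerPro 17", "category": "laptops", "price": 1499},
    ]
    results = [p for p in catalog if query.lower() in p["category"]]
    if category:
        results = [p for p in results if p["category"] == category]
    return results

def dispatch_tool_call(name, arguments_json):
    # Map tool names to local callables; arguments come in as a JSON string.
    handlers = {"search_products": search_products}
    args = json.loads(arguments_json)
    return handlers[name](**args)

results = dispatch_tool_call("search_products", '{"query": "laptops"}')
print(results[0]["name"])  # → UltraBook 13
```

The results can then be appended to `messages` as a tool-result message for a follow-up completion.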

RAG with Document Grounding

Cohere excels at RAG with built-in citation support:
# Documents to ground the response
documents = [
    {
        "id": "doc1",
        "text": "Portkey is an AI Gateway that routes to 250+ LLMs."
    },
    {
        "id": "doc2",
        "text": "The gateway provides fallbacks, load balancing, and caching."
    }
]

response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "What features does Portkey offer?"}],
    # Pass documents via additional parameters
    documents=documents,
    citation_quality="accurate"
)

print(response.choices[0].message.content)

# Access citations if available
if hasattr(response.choices[0].message, 'citations'):
    print("Citations:", response.choices[0].message.citations)
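
Citations typically reference character spans in the answer plus the ids of the documents that support them. A sketch of rendering them inline as source markers; the exact citation shape (`start`/`end` offsets and a `document_ids` list) is an assumption, so check it against the actual response payload:

```python
def annotate_citations(text, citations):
    # Insert [doc ids] markers after each cited span.
    # Assumes each citation carries integer "start"/"end" offsets into the
    # answer text and a list of "document_ids" (shape is an assumption).
    out, cursor = [], 0
    for c in sorted(citations, key=lambda c: c["start"]):
        out.append(text[cursor:c["end"]])
        out.append("[" + ",".join(c["document_ids"]) + "]")
        cursor = c["end"]
    out.append(text[cursor:])
    return "".join(out)

answer = "Portkey routes to 250+ LLMs and supports fallbacks."
citations = [
    {"start": 0, "end": 27, "document_ids": ["doc1"]},
    {"start": 32, "end": 50, "document_ids": ["doc2"]},
]
print(annotate_citations(answer, citations))
# → Portkey routes to 250+ LLMs[doc1] and supports fallbacks[doc2].
```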

Embeddings

response = client.embeddings.create(
    model="embed-english-v3.0",
    input="Cohere provides enterprise-grade NLP",
    input_type="search_document"  # or "search_query", "classification", "clustering"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

Batch embeddings:
response = client.embeddings.create(
    model="embed-english-v3.0",
    input=[
        "First document",
        "Second document",
        "Third document"
    ],
    input_type="search_document"
)

for i, item in enumerate(response.data):
    print(f"Document {i}: {len(item.embedding)} dimensions")

Embedding Input Types

Optimize embeddings for your use case:
Input Type       Use Case
search_document  Indexing documents for search
search_query     Search queries
classification   Text classification
clustering       Document clustering
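
For semantic search, the pattern is to embed the corpus with `input_type="search_document"`, embed each query with `input_type="search_query"`, and rank documents by cosine similarity. A self-contained sketch with stand-in vectors (real ones would come from `embeddings.create` calls):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Stand-in 3-d vectors; v3 embeddings are 1024-d (or 384-d for light models).
doc_vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.2],
}
query_vector = [0.85, 0.15, 0.05]

ranked = sorted(
    doc_vectors,
    key=lambda d: cosine_similarity(query_vector, doc_vectors[d]),
    reverse=True,
)
print(ranked)  # → ['doc1', 'doc2']
```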

Legacy Completions API

For older command models:
response = client.completions.create(
    model="command",
    prompt="Write a tagline for an AI gateway:",
    max_tokens=50
)

print(response.choices[0].text)

Fallback Configuration

Fall back to GPT-4o if Cohere fails:
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-plus-08-2024"}
        },
        {
            "provider": "openai",
            "api_key": "sk-***",
            "override_params": {"model": "gpt-4o"}
        }
    ]
}

client = Portkey().with_options(config=config)

Load Balancing

Balance between Command R+ and Command R:
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-plus-08-2024"},
            "weight": 0.3
        },
        {
            "provider": "cohere",
            "api_key": "co-***",
            "override_params": {"model": "command-r-08-2024"},
            "weight": 0.7
        }
    ]
}

client = Portkey().with_options(config=config)
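
With these weights, roughly 30% of requests should hit Command R+ and 70% Command R. A seeded local simulation of weighted selection, purely to illustrate what the weights mean (not how the gateway itself routes):

```python
import random

random.seed(0)  # deterministic for the illustration
targets = ["command-r-plus-08-2024", "command-r-08-2024"]
weights = [0.3, 0.7]

# Draw 10,000 weighted picks and measure the observed split.
picks = random.choices(targets, weights=weights, k=10_000)
share = picks.count("command-r-08-2024") / len(picks)
print(f"command-r share: {share:.2f}")  # roughly 0.70
```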

Error Handling

from portkey_ai.exceptions import (
    RateLimitError,
    APIError,
    AuthenticationError
)

try:
    response = client.chat.completions.create(
        model="command-r-plus-08-2024",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except APIError as e:
    print(f"API error: {e}")
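
Rate-limit errors are usually transient, so a common pattern is to wrap the call in a retry loop with exponential backoff. A generic sketch; the flaky function below is a stand-in for the actual `client.chat.completions.create` call, and `retry_on` would be set to the rate-limit exception in practice:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.01, retry_on=(Exception,)):
    # Retry fn with exponential backoff: base_delay, 2x, 4x, ...
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a flaky stand-in for the API call: fails twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_retries(flaky_request))  # → ok (after two retried failures)
```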

Best Practices

  1. Use RAG mode - Leverage document grounding for factual accuracy
  2. Enable citations - Track sources for enterprise use
  3. Choose right embedding type - Use appropriate input_type for embeddings
  4. Use Command R+ - For complex tasks requiring reasoning
  5. Use Command R - For cost-effective general purpose tasks
  6. Batch embeddings - More efficient than individual requests
  7. Implement streaming - Better UX for long responses
  8. Handle tool calls - Multi-step reasoning with function calling
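
On point 6: the embed endpoint caps how many texts one request can carry (96 per request is the commonly cited limit for v3 models; verify against current Cohere docs), so large corpora are split into batches. A minimal chunking sketch:

```python
def chunked(texts, batch_size=96):
    # Yield successive batches of at most batch_size texts.
    # 96 is an assumed per-request limit; confirm against current docs.
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

corpus = [f"document {i}" for i in range(200)]
batches = list(chunked(corpus))
print(len(batches), [len(b) for b in batches])  # → 3 [96, 96, 8]
```

Each batch would then be passed as the `input` list of one `embeddings.create` call.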

Enterprise Features

  • Data privacy: Cohere doesn’t train on customer data
  • Regional deployment: Available in multiple regions
  • SOC 2 Type II: Enterprise compliance
  • Custom deployments: Private cloud options
  • SLA support: Enterprise support plans
  • Fine-tuning: Custom model training

Pricing

Cohere offers competitive pricing with a free trial:

  • Cohere Pricing: view detailed pricing for all Cohere models

Related Guides

  • Embeddings Guide: working with embeddings
  • RAG Guide: building RAG applications
  • Function Calling: tool use and function calling
  • Fallbacks: fallback configurations