The Portkey AI Gateway includes built-in caching to reduce latency and API costs by storing and reusing LLM responses. The caching system supports both simple and semantic caching modes.
How Caching Works
When caching is enabled, the gateway:
- Generates a cache key from the request body and URL
- Checks for a cached response before making the provider request
- Returns the cached response if found and not expired
- Makes the provider request on a cache miss
- Stores the response in cache for future requests
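The steps above amount to a read-through cache. A minimal sketch (all names here are hypothetical, not the gateway's actual code):

```typescript
// Minimal read-through cache sketch (illustrative names, not the gateway's code).
type CacheEntry = { body: string; expiresAt: number };

const store = new Map<string, CacheEntry>();
const DEFAULT_TTL_MS = 86_400_000; // 24 hours, the documented default

async function handleRequest(
  cacheKey: string,
  callProvider: () => Promise<string>,
  maxAgeMs: number = DEFAULT_TTL_MS,
): Promise<{ body: string; cacheStatus: "HIT" | "MISS" }> {
  const entry = store.get(cacheKey);
  // Return the cached response if found and not expired.
  if (entry && entry.expiresAt > Date.now()) {
    return { body: entry.body, cacheStatus: "HIT" };
  }
  // Cache miss: call the provider, then store the response for future requests.
  const body = await callProvider();
  store.set(cacheKey, { body, expiresAt: Date.now() + maxAgeMs });
  return { body, cacheStatus: "MISS" };
}
```

A second identical request within the TTL never reaches the provider.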
Cache Configuration
Enable Caching
Caching can be enabled globally or per-request.

Global configuration (conf.json):
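A minimal global configuration might look like the following (the field names are illustrative, not a verified schema; max_age is in milliseconds per the TTL default documented below):

```json
{
  "cache": {
    "mode": "simple",
    "max_age": 86400000
  }
}
```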
Cache Modes
Simple Caching
Caches based on exact request match:
- Request body and URL are hashed to create a cache key
- Exact match required for cache hit
- TTL (time-to-live) specified by max_age
- Default TTL is 24 hours (86,400,000 ms) if not specified
Semantic Caching
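Semantic matching is usually pictured as comparing embedding vectors of prompts and treating anything above a similarity threshold as a hit. A toy sketch (the actual embedding model and threshold are implementation details, and the names below are hypothetical):

```typescript
// Cosine similarity between two embedding vectors (sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Treat two prompts as "semantically equal" above a threshold (hypothetical value).
const SIMILARITY_THRESHOLD = 0.95;
function isSemanticHit(a: number[], b: number[]): boolean {
  return cosineSimilarity(a, b) >= SIMILARITY_THRESHOLD;
}
```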
Unlike simple caching, a semantic cache hit does not require an exact match, only sufficiently similar meaning.

Cache Key Generation
The cache key is generated using SHA-256 hashing: src/middlewares/cache/index.ts:14-26
What’s included in the key:
- Complete request body (model, messages, temperature, etc.)
- Request URL (provider endpoint)
- All request parameters

What’s not included:
- Request headers (except values already reflected in the body)
- Timestamps
- User metadata

Excluding headers, timestamps, and metadata keeps the key deterministic, so identical requests always map to the same cache entry.
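Hashing the URL together with the serialized body can be sketched with Node's built-in crypto module (a sketch under the assumptions above, not the gateway's exact code):

```typescript
import { createHash } from "node:crypto";

// Build a deterministic cache key from the request URL and body (sketch).
function getCacheKey(url: string, requestBody: unknown): string {
  // JSON.stringify is stable for a fixed object shape; a real
  // implementation may normalize key order first.
  const payload = url + JSON.stringify(requestBody);
  return createHash("sha256").update(payload).digest("hex");
}
```

The same URL and body always produce the same 64-character hex key.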
Cache Storage
In-Memory Cache
The default implementation uses in-memory storage: src/middlewares/cache/index.ts:3
Characteristics:
- Fast access (no network calls)
- Lost on gateway restart
- Not shared across gateway instances
- Limited by available memory
Redis Cache
Redis caching is available in the Node.js runtime when REDIS_CONNECTION_STRING is set. Benefits over in-memory storage:
- Persistent across restarts
- Shared across multiple gateway instances
- Configurable eviction policies
- Support for large cache sizes
src/index.ts:49-51
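Enabling Redis is a matter of setting the connection string before starting the Node.js gateway (the URL below is a placeholder for your own Redis instance):

```shell
# Placeholder connection string; point at your own Redis instance.
export REDIS_CONNECTION_STRING="redis://localhost:6379"
```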
Cache Lifecycle
Retrieving from Cache
src/middlewares/cache/index.ts:29-58
Storing in Cache
src/middlewares/cache/index.ts:60-82
Cache Status
The gateway tracks cache operations with status codes: src/middlewares/cache/index.ts:5-12
Cache Invalidation
Force Refresh
Force refresh bypasses the cache and fetches a fresh response from the provider.

Expiration-based

Entries automatically expire when max_age is reached.
Manual Invalidation
For the in-memory cache, entries are removed on:
- Gateway restart
- Expiration check during retrieval

For the Redis cache, invalidation options include:
- Configuring TTL in Redis
- Using Redis commands to flush keys
- Implementing custom invalidation logic
Streaming and Caching
src/middlewares/cache/index.ts:70-73
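A streamed response cannot be cached chunk-by-chunk; a typical approach (sketched here under assumed names, not the gateway's exact code) is to buffer the chunks as they are forwarded and write the concatenated body to the cache only once the stream completes:

```typescript
// Buffer a streamed response, then cache the full body (sketch).
const streamCache = new Map<string, string>();

async function cacheStream(
  cacheKey: string,
  chunks: AsyncIterable<string>,
): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of chunks) {
    parts.push(chunk); // a real gateway would also forward each chunk to the client
  }
  const body = parts.join("");
  streamCache.set(cacheKey, body); // store only after the stream completes cleanly
  return body;
}
```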
Cache TTL Calculation
The effective TTL is calculated as: src/middlewares/cache/index.ts:107-108
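One plausible reading of that calculation (an assumption, since the code isn't reproduced here): use the requested max_age when it is a positive number, otherwise fall back to the 24-hour default:

```typescript
const DEFAULT_TTL_MS = 86_400_000; // documented 24-hour default

// Effective TTL: requested max_age if valid, else the default (assumed logic).
function effectiveTtl(maxAge?: number): number {
  return typeof maxAge === "number" && maxAge > 0 ? maxAge : DEFAULT_TTL_MS;
}
```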
Cache Middleware Integration
The cache middleware is registered in the request pipeline: src/index.ts:108-110
The middleware:
- Runs on all routes ('*')
- Executes before request handlers
- Stores responses after successful completion
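Conceptually, the registration wraps every handler in a check-then-store layer. A framework-agnostic sketch (the gateway itself registers this through its web framework's middleware API; names here are hypothetical):

```typescript
// Framework-agnostic middleware sketch: check cache, run handler, store result.
type Handler = (req: { key: string }) => Promise<string>;

const responses = new Map<string, string>();

function withCache(handler: Handler): Handler {
  return async (req) => {
    const hit = responses.get(req.key); // executes before the request handler
    if (hit !== undefined) return hit;
    const res = await handler(req);     // fall through to the real handler
    responses.set(req.key, res);        // store after successful completion
    return res;
  };
}
```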
Performance Considerations
Cache Hit Rate
Maximize cache effectiveness:
- Use consistent request parameters
- Avoid random seeds or IDs in requests
- Group similar requests
- Set appropriate TTLs
Memory Usage
Monitor memory consumption:
- Set reasonable max_age values
- Use Redis for large caches
- Monitor cache size metrics
- Implement cache size limits
Latency Impact
Cache hit: ~0.1-1 ms (in-memory) or ~1-5 ms (Redis)
Cache miss: full provider latency + cache write time
Cache key generation: ~0.1 ms (SHA-256 hash)

Use Cases
Repeated Queries
- Frequently asked questions
- Product descriptions
- Knowledge base queries
- Code generation for common patterns
Development/Testing
- Faster test runs
- Reduced API costs during development
- Consistent responses for testing
Short TTL for Dynamic Content
- News summaries
- Real-time data with some staleness tolerance
- Load reduction during traffic spikes
Combining with Other Features
Cache + Fallback
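A single config can declare both a fallback target list and a cache block, so a cached response short-circuits the whole fallback chain (field names below are illustrative, modeled on common gateway config shapes, not a verified schema):

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai" },
    { "provider": "anthropic" }
  ],
  "cache": { "mode": "simple", "max_age": 3600000 }
}
```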
Cache + Load Balancing
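Similarly, caching can sit in front of a weighted load-balancing strategy, reducing traffic to every target (again, field names are illustrative assumptions):

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "provider": "openai", "weight": 0.7 },
    { "provider": "anthropic", "weight": 0.3 }
  ],
  "cache": { "mode": "semantic", "max_age": 3600000 }
}
```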
Best Practices
Set Appropriate TTLs
Balance freshness vs cost savings based on your use case.
Use Redis for Production
Deploy Redis for persistent, distributed caching at scale.
Monitor Cache Metrics
Track hit rates, memory usage, and cost savings.
Disable for Real-time
Don’t cache time-sensitive or user-specific content.
Next Steps
Configs
Learn how to configure caching in gateway configs.
Routing
Understand how caching interacts with routing.
Load Balancing
Combine caching with load balancing for optimal performance.
Deployment
Set up Redis caching in production deployments.