A complete, type-safe TypeScript SDK for the OpenRouter API. Node.js only (ESM), with full API coverage, streaming support, and comprehensive error handling.
✅ Full API Coverage: Chat completions, streaming, models, providers, credits, analytics
✅ Type Safety: Complete TypeScript types for all endpoints and responses
✅ Streaming: Two approaches - ReadableStream (low-level) or AsyncIterable (recommended)
✅ Advanced Features: Tool calling, structured outputs, multimodal (vision), provider preferences
✅ Batch Requests: Execute multiple requests concurrently with rate limiting
✅ Validation Helpers: Pre-validate parameters, check model capabilities, truncate messages
✅ Reliability: Automatic retry with exponential backoff, timeouts, proper error handling
✅ Security: Automatic redaction of sensitive data in logs
✅ Logging: Multiple logger implementations (default, silent, formatted)
✅ 100% Test Coverage: 92 tests covering all features
npm install @pierreraby/openrouter-client
# or
pnpm add @pierreraby/openrouter-client
# or
yarn add @pierreraby/openrouter-client
import OpenRouterClient from 'openrouter-client';
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY
});
// Simple chat completion
const response = await client.createChatCompletion({
model: 'openai/gpt-3.5-turbo',
messages: [
{ role: 'user', content: 'Hello!' }
]
});
console.log(response.choices[0].message.content);
// Using AsyncIterable (cleanest approach)
for await (const chunk of client.streamChatCompletion({
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Tell me a story' }]
})) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
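For the lower-level approach, `createChatCompletionStream` returns a ReadableStream. The sketch below assumes the stream yields already-parsed chunk objects shaped like the AsyncIterable chunks above; check the TypeDoc reference for the exact element type.
// Using ReadableStream (low-level). Sketch only: the element type of the
// stream is an assumption here (verify against the TypeDoc).
const stream = await client.createChatCompletionStream({
  model: 'openai/gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Tell me a story' }]
});
const reader = stream.getReader();
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const content = value?.choices?.[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} finally {
  reader.releaseLock();
}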
The `examples/` directory contains comprehensive examples for all features.
Run examples with:
tsx examples/01-basic-usage.ts
const client = new OpenRouterClient({
apiKey: string; // Required: Your OpenRouter API key
baseURL?: string; // Default: 'https://openrouter.ai/api/v1'
timeout?: number; // Default: 30000 (30s)
maxRetries?: number; // Default: 3
retryDelay?: number; // Default: 1000 (1s initial delay)
headers?: Record<string, string>; // Additional headers
logger?: Logger; // Custom logger
logLevel?: LogLevel; // 'error' | 'warn' | 'info' | 'debug'
});
Development:
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
maxRetries: 1,
logLevel: 'debug'
});
Production:
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
timeout: 60000,
maxRetries: 5,
retryDelay: 2000,
logLevel: 'error'
});
const tools = [
{
type: 'function' as const,
function: {
name: 'get_weather',
description: 'Get current weather',
parameters: {
type: 'object',
properties: {
location: { type: 'string' }
},
required: ['location']
}
}
}
];
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [{ role: 'user', content: "What's the weather in Paris?" }],
tools,
tool_choice: 'auto'
});
// Parse and execute tool calls
if (response.choices[0].message.tool_calls) {
const parsedCalls = OpenRouterClient.parseToolCalls(
response.choices[0].message.tool_calls
);
for (const call of parsedCalls) {
// `yourFunctions` is a placeholder: a map from tool name to your own implementation
const result = yourFunctions[call.function.name](call.function.arguments);
const toolMessage = OpenRouterClient.createToolResponseMessage(
call.id,
result,
call.function.name
);
messages.push(toolMessage); // `messages` is the conversation array from the original request
}
}
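To let the model use the tool results, send them back in a follow-up request. The sketch below uses `OpenRouterClient.executeToolCalls` from the API reference; the assumption that it returns tool-role messages ready to append is mine, so verify the return shape in the TypeDoc.
// Sketch: run the tools and send the results back for a final answer.
// Assumption: executeToolCalls(toolCalls, functions) returns tool messages
// that can be appended directly to the conversation.
const toolMessages = await OpenRouterClient.executeToolCalls(
  response.choices[0].message.tool_calls!,
  {
    get_weather: ({ location }: { location: string }) => ({ location, tempC: 21 })
  }
);
const followUp = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    ...messages,                 // the original conversation
    response.choices[0].message, // assistant message containing the tool calls
    ...toolMessages              // tool results
  ]
});
console.log(followUp.choices[0].message.content);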
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [{ role: 'user', content: 'Generate a person profile' }],
response_format: {
type: 'json_schema',
json_schema: {
name: 'person_profile',
strict: true,
schema: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
occupation: { type: 'string' }
},
required: ['name', 'age', 'occupation']
}
}
}
});
const person = JSON.parse(response.choices[0].message.content!);
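Because `strict: true` constrains the output to the schema, the parsed result can be typed. A small sketch (the `PersonProfile` interface is illustrative):
// Illustrative interface mirroring the json_schema above.
interface PersonProfile {
  name: string;
  age: number;
  occupation: string;
}
const profile = JSON.parse(response.choices[0].message.content!) as PersonProfile;
// A light runtime check is still worthwhile in case the output deviates.
if (typeof profile.name !== 'string' || typeof profile.age !== 'number') {
  throw new Error('Unexpected structured output shape');
}
console.log(`${profile.name}, ${profile.age}, ${profile.occupation}`);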
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{
type: 'image_url',
image_url: {
url: 'https://example.com/image.jpg',
detail: 'high'
}
}
]
}
]
});
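Local images can be embedded as base64 data URLs using the same OpenAI-compatible message format; a minimal sketch (the file path is illustrative):
import { readFile } from 'node:fs/promises';

// Read a local image and embed it as a data URL (path is illustrative).
const imageBase64 = (await readFile('./photo.jpg')).toString('base64');
const visionResponse = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this photo' },
        { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageBase64}` } }
      ]
    }
  ]
});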
// Get account credits
const credits = await client.getCredits();
console.log(`Remaining: $${credits.total_credits - credits.total_usage}`);
// Track specific generation (⚠️ NOT immediately available - see note below)
const response = await client.createChatCompletion({ /* ... */ });
const stats = await client.getGeneration(response.id);
console.log(`Cost: $${stats.total_cost}`);
// ⚠️ RECOMMENDED: Use response.usage for immediate cost tracking
const response = await client.createChatCompletion({ /* ... */ });
if (response.usage) {
console.log(`Prompt tokens: ${response.usage.prompt_tokens}`);
console.log(`Completion tokens: ${response.usage.completion_tokens}`);
console.log(`Total tokens: ${response.usage.total_tokens}`);
// Calculate approximate cost based on model pricing
}
// Estimate before request
const messages = [/* ... */];
const estimatedTokens = client.countMessagesTokens(messages);
console.log(`Estimated tokens: ${estimatedTokens}`);
Note: `getGeneration()` statistics are not immediately available after a request completes; OpenRouter needs time to process them. For real-time cost tracking, use `response.usage` instead (see example 16).
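To turn `response.usage` into an approximate dollar figure, combine it with the model's pricing. A sketch, assuming pricing is expressed in USD per 1K tokens as in the `getModelCapabilities` example further down (verify the units for your models):
// Sketch: estimate cost from usage plus model pricing.
// Assumption: pricing values are USD per 1K tokens.
const modelCaps = await client.getModelCapabilities('openai/gpt-3.5-turbo');
if (response.usage && modelCaps.pricing) {
  const approxCost =
    (response.usage.prompt_tokens / 1000) * modelCaps.pricing.prompt +
    (response.usage.completion_tokens / 1000) * modelCaps.pricing.completion;
  console.log(`Approximate cost: $${approxCost.toFixed(6)}`);
}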
Reduce costs by up to 90% by caching portions of your prompts with Anthropic's Claude models:
// Mark system prompt as cacheable (must be >1024 tokens for Claude 3.5 Sonnet)
const systemPrompt = OpenRouterClient.markMessageAsCacheable({
role: 'system',
content: 'Long instructions, examples, or context that will be reused...' // >1024 tokens
});
// First call: cache creation (10% surcharge)
const response1 = await client.createChatCompletion({
model: 'anthropic/claude-3.5-sonnet',
messages: [
systemPrompt,
{ role: 'user', content: 'First question' }
],
usage: { include: true } // ✅ Get detailed cache metrics
});
// Second call: cache hit (90% discount)
const response2 = await client.createChatCompletion({
model: 'anthropic/claude-3.5-sonnet',
messages: [
systemPrompt,
{ role: 'user', content: 'Second question' }
],
usage: { include: true }
});
// Track cache performance (real-time)
console.log('Cached tokens:', response2.usage?.prompt_tokens_details?.cached_tokens);
// Output: 1668 (90% discount on these tokens!)
// Or track via generation ID (async, more accurate)
const stats = await client.getGeneration(response2.id);
console.log('Cache discount:', stats.cache_discount); // e.g., 0.0045036 ($)
console.log('Native cached tokens:', stats.native_tokens_cached); // e.g., 1668
Two methods to track cache metrics:
- Real-time with `usage: { include: true }` (recommended for development): read `prompt_tokens_details.cached_tokens` in the response
- Async with `getGeneration(id)` (recommended for production): read `cache_discount` (actual $ savings) and `native_tokens_cached`

Requirements: Anthropic Claude models (e.g. `anthropic/claude-3.5-sonnet`); the cacheable content must be long enough (>1024 tokens for Claude 3.5 Sonnet).

Best practices: see `examples/10-prompt-caching.ts` for complete examples with both tracking methods.

Automatically discover what features a model supports before using it:
const caps = await client.getModelCapabilities('anthropic/claude-3.5-sonnet');
// Check capabilities
if (caps.supportsVision) {
// Can send images
}
if (caps.supportsTools) {
// Can use function calling
}
if (caps.supportsJSON) {
// Can use response_format
}
// Access detailed info
console.log('Context length:', caps.maxContextLength);
console.log('Input modalities:', caps.inputModalities); // ['text', 'image']
console.log('Supported params:', caps.supportedParameters);
console.log('Pricing:', caps.pricing); // { prompt: 0.003, completion: 0.015 }
Use cases: see `examples/11-model-capabilities.ts` for advanced patterns.

Track your API usage, budgets, and rate limits in real-time:
// Get detailed key information
const keyInfo = await client.getKeyInfo();
console.log('Usage:', keyInfo.usage);
console.log('Limit:', keyInfo.limit || 'Unlimited');
console.log('Free tier:', keyInfo.is_free_tier);
if (keyInfo.rate_limit) {
console.log(`${keyInfo.rate_limit.requests} requests per ${keyInfo.rate_limit.interval}`);
}
// Get credits with current rate limit status
const credits = await client.getCredits();
console.log('Credits remaining:', credits.total_credits - credits.total_usage);
if (credits.rate_limit) {
console.log('Requests remaining:', credits.rate_limit.remaining);
console.log('Resets at:', new Date(credits.rate_limit.reset * 1000));
}
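These endpoints make it easy to guard expensive jobs behind a simple budget check; a small sketch (the threshold is arbitrary):
// Sketch: skip an expensive job when remaining credits fall below a threshold.
const MIN_CREDITS = 1.0; // arbitrary threshold in USD
const { total_credits, total_usage } = await client.getCredits();
const remaining = total_credits - total_usage;
if (remaining < MIN_CREDITS) {
  console.warn(`Only $${remaining.toFixed(2)} left, skipping the batch run`);
} else {
  // ...proceed with the expensive work
}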
Benefits: see `examples/12-rate-limits.ts` for monitoring patterns.

Pre-validate requests before sending them to save costs and avoid errors:
// Check if a model supports a specific feature
const supportsVision = await client.supportsFeature(
'anthropic/claude-3.5-sonnet',
'vision'
);
if (!supportsVision) {
console.log('This model cannot process images');
}
// Validate parameters against model capabilities
const validation = await client.validateParams('openai/gpt-3.5-turbo', {
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
tools: [/* ... */],
max_tokens: 5000
});
if (!validation.valid) {
console.error('Errors:', validation.errors);
// Example: ["Model doesn't support streaming", "max_tokens exceeds limit"]
}
if (validation.warnings?.length) {
console.warn('Warnings:', validation.warnings);
// Example: ["max_tokens is high and may be expensive"]
}
// Truncate conversation to fit context window
const longConversation = [
{ role: 'system', content: 'You are helpful' },
// ... 50+ messages
];
const truncated = client.truncateMessages(longConversation, 4000);
// Keeps system message + most recent messages that fit in 4000 tokens
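Put together, a pre-flight check before an expensive request might look like this sketch:
// Sketch: validate, truncate, then send.
const params = {
  messages: longConversation,
  max_tokens: 1000
};
const check = await client.validateParams('openai/gpt-3.5-turbo', params);
if (!check.valid) {
  throw new Error(`Invalid request: ${check.errors?.join(', ')}`);
}
const safeMessages = client.truncateMessages(params.messages, 4000);
const reply = await client.createChatCompletion({
  model: 'openai/gpt-3.5-turbo',
  messages: safeMessages,
  max_tokens: params.max_tokens
});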
Benefits: see `examples/13-validation-helpers.ts` for complete workflows.

Execute multiple chat completion requests concurrently with automatic rate limiting:
// Prepare multiple requests
const requests = [
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to French' }]
},
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to Spanish' }]
},
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to German' }]
}
];
// Execute with concurrency control
const results = await client.batchChatCompletion(requests, {
maxConcurrent: 5, // Max 5 concurrent requests (default)
stopOnError: false // Continue on errors (default)
});
// Process results
results.forEach((result, idx) => {
if (result.success && result.response) {
console.log(`Request ${idx}:`, result.response.choices[0].message.content);
} else {
console.error(`Request ${idx} failed:`, result.error?.message);
}
});
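Because each result reports `success` and any `error`, failed items can be collected and retried; a sketch:
// Sketch: retry only the requests that failed in the first pass.
const failedIndices = results
  .map((result, idx) => (result.success ? -1 : idx))
  .filter((idx) => idx !== -1);
if (failedIndices.length > 0) {
  const retryResults = await client.batchChatCompletion(
    failedIndices.map((idx) => requests[idx]),
    { maxConcurrent: 2 } // be gentler on the second attempt
  );
  retryResults.forEach((result, i) => {
    console.log(`Retry of request ${failedIndices[i]}:`, result.success ? 'ok' : 'failed again');
  });
}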
Options:
- `maxConcurrent`: limit concurrent requests (default: 5)
- `stopOnError`: stop on the first error (default: false)

Benefits: see `examples/14-batch-requests.ts` for advanced patterns.

import { OpenRouterError } from 'openrouter-client';
try {
const response = await client.createChatCompletion({ /* ... */ });
} catch (error) {
if (error instanceof OpenRouterError) {
console.error('OpenRouter Error:', {
message: error.message,
status: error.status,
code: error.code,
requestId: error.requestId
});
if (error.status === 429) {
// Handle rate limit
} else if (error.status && error.status >= 500) {
// Handle server error
}
}
}
import { formattedLogger, createLogger, silentLogger } from 'openrouter-client';
// Formatted logger with timestamps and colors
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: formattedLogger,
logLevel: 'info'
});
// Custom prefixed logger
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: createLogger('MyApp'),
logLevel: 'debug'
});
// Silent logger (no output)
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: silentLogger
});
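You can also plug in your own logger. The sketch below assumes the `Logger` interface exposes the four level methods matching the log levels above; check the TypeDoc for the exact contract.
// Sketch of a custom JSON logger.
// Assumption: Logger requires error/warn/info/debug methods taking a message
// plus optional extra arguments.
const jsonLogger = {
  error: (msg: string, ...args: unknown[]) => console.error(JSON.stringify({ level: 'error', msg, args })),
  warn: (msg: string, ...args: unknown[]) => console.warn(JSON.stringify({ level: 'warn', msg, args })),
  info: (msg: string, ...args: unknown[]) => console.info(JSON.stringify({ level: 'info', msg, args })),
  debug: (msg: string, ...args: unknown[]) => console.debug(JSON.stringify({ level: 'debug', msg, args }))
};

const jsonClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: jsonLogger,
  logLevel: 'info'
});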
📚 Complete API Documentation (TypeDoc)
See docs/INDEX.md for architectural decisions and contribution guidelines.
Chat Completions:
- `createChatCompletion(params)` - Standard chat completion
- `streamChatCompletion(params)` - Streaming with AsyncIterable (recommended)
- `createChatCompletionStream(params)` - Streaming with ReadableStream
- `batchChatCompletion(requests, options?)` - Execute multiple requests concurrently

Models & Providers:
- `listModels()` - Get available models
- `getModel(id)` - Get model details
- `getModelEndpoints(id)` - Get model endpoints
- `getModelCapabilities(id)` - Get detailed model capabilities
- `listProviders()` - Get available providers

Account & Usage:
- `getCredits()` - Get account credits (with rate limits)
- `getKeyInfo()` - Get API key information and limits
- `getActivity()` - Get activity analytics
- `getGeneration(id)` - Get generation statistics

Validation & Helpers:
- `supportsFeature(modelId, feature)` - Check if a model supports a feature
- `validateParams(modelId, params)` - Validate parameters against a model
- `truncateMessages(messages, maxTokens)` - Truncate messages to fit the context window
- `countTokens(text)` - Estimate tokens in text
- `countMessagesTokens(messages)` - Estimate tokens in messages
- `validateApiKey()` - Validate the API key
- `OpenRouterClient.parseToolCalls(toolCalls)` - Parse tool calls
- `OpenRouterClient.createToolResponseMessage(id, content, name?)` - Create a tool response (requires string content)
- `OpenRouterClient.createToolResponseFromResult(id, result, name?)` - Create a tool response from any object (auto-serializes)
- `OpenRouterClient.executeToolCalls(toolCalls, functions)` - Execute tool calls
- `OpenRouterClient.markMessageAsCacheable(message)` - Mark a message for caching

# Install dependencies
pnpm install
# Run tests
pnpm test
# Run tests in watch mode
pnpm test:watch
# Build
pnpm build
# Lint
pnpm lint
# Format
pnpm format
MIT
See docs/INDEX.md for contribution guidelines and architecture decisions.