    OpenRouter TypeScript SDK - v1.0.6

    A complete, type-safe TypeScript SDK for the OpenRouter API. Node.js only (ESM), with full API coverage, streaming support, and comprehensive error handling.

    • Full API Coverage: Chat completions, streaming, models, providers, credits, analytics
    • Type Safety: Complete TypeScript types for all endpoints and responses
    • Streaming: Two approaches - ReadableStream (low-level) or AsyncIterable (recommended)
    • Advanced Features: Tool calling, structured outputs, multimodal (vision), provider preferences
    • Batch Requests: Execute multiple requests concurrently with rate limiting
    • Validation Helpers: Pre-validate parameters, check model capabilities, truncate messages
    • Reliability: Automatic retry with exponential backoff, timeouts, proper error handling
    • Security: Automatic redaction of sensitive data in logs
    • Logging: Multiple logger implementations (default, silent, formatted)
    • 100% Test Coverage: 92 tests covering all features

    Installation:

    npm install @pierreraby/openrouter-client
    # or
    pnpm add @pierreraby/openrouter-client
    # or
    yarn add @pierreraby/openrouter-client

    Quick start:

    import OpenRouterClient from 'openrouter-client';

    const client = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY
    });

    // Simple chat completion
    const response = await client.createChatCompletion({
      model: 'openai/gpt-3.5-turbo',
      messages: [
        { role: 'user', content: 'Hello!' }
      ]
    });

    console.log(response.choices[0].message.content);
    Streaming:

    // Using AsyncIterable (cleanest approach)
    for await (const chunk of client.streamChatCompletion({
      model: 'openai/gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'Tell me a story' }]
    })) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        process.stdout.write(content);
      }
    }
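    The low-level ReadableStream route goes through createChatCompletionStream(); a minimal sketch, assuming the method resolves to a ReadableStream whose values are the same parsed chunk objects as above:

    // Using ReadableStream (low-level); chunk shape assumed to match the AsyncIterable chunks
    const stream = await client.createChatCompletionStream({
      model: 'openai/gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'Tell me a story' }]
    });

    const reader = stream.getReader();
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        const content = value?.choices?.[0]?.delta?.content;
        if (content) process.stdout.write(content);
      }
    } finally {
      reader.releaseLock();
    }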

    The examples/ directory contains comprehensive examples for all features:

    • 01-basic-usage.ts: Client initialization, simple chat completion
    • 02-streaming.ts: ReadableStream vs AsyncIterable streaming
    • 03-tool-calls.ts: Function calling with helpers
    • 04-structured-outputs.ts: JSON mode and json_schema
    • 05-multimodal.ts: Vision with images (URL, base64, multiple)
    • 06-provider-preferences.ts: Provider routing, fallbacks, quantization
    • 07-cost-tracking.ts: Cost monitoring with getGeneration(), getCredits()
    • 08-error-handling.ts: Robust error handling strategies
    • 09-retry-backoff.ts: Retry configuration and best practices
    • 10-prompt-caching.ts: Anthropic caching for 90% cost reduction
    • 11-model-capabilities.ts: Discover model features and validate compatibility
    • 12-rate-limits.ts: Monitor rate limits, budgets, and usage
    • 13-validation-helpers.ts: Parameter validation, feature checking, message truncation
    • 14-batch-requests.ts: Concurrent batch processing with rate limiting
    • 15-tool-message-validation.ts: Tool message formatting and common validation errors
    • 16-immediate-cost-tracking.ts: Immediate cost tracking via response.usage (recommended)

    Run examples with:

    tsx examples/01-basic-usage.ts
    
    Configuration (option types and defaults):

    const client = new OpenRouterClient({
      apiKey: string;                   // Required: Your OpenRouter API key
      baseURL?: string;                 // Default: 'https://openrouter.ai/api/v1'
      timeout?: number;                 // Default: 30000 (30s)
      maxRetries?: number;              // Default: 3
      retryDelay?: number;              // Default: 1000 (1s initial delay)
      headers?: Record<string, string>; // Additional headers
      logger?: Logger;                  // Custom logger
      logLevel?: LogLevel;              // 'error' | 'warn' | 'info' | 'debug'
    });

    Development:

    const client = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      maxRetries: 1,
      logLevel: 'debug'
    });

    Production:

    const client = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      timeout: 60000,
      maxRetries: 5,
      retryDelay: 2000,
      logLevel: 'error'
    });

    Tool calling:

    const tools = [
      {
        type: 'function' as const,
        function: {
          name: 'get_weather',
          description: 'Get current weather',
          parameters: {
            type: 'object',
            properties: {
              location: { type: 'string' }
            },
            required: ['location']
          }
        }
      }
    ];

    const messages = [
      { role: 'user' as const, content: "What's the weather in Paris?" }
    ];

    const response = await client.createChatCompletion({
      model: 'openai/gpt-4o-mini',
      messages,
      tools,
      tool_choice: 'auto'
    });

    // Parse and execute tool calls
    if (response.choices[0].message.tool_calls) {
      // Keep the assistant turn that requested the tools in the conversation history
      messages.push(response.choices[0].message);

      const parsedCalls = OpenRouterClient.parseToolCalls(
        response.choices[0].message.tool_calls
      );

      for (const call of parsedCalls) {
        const result = yourFunctions[call.function.name](call.function.arguments);

        const toolMessage = OpenRouterClient.createToolResponseMessage(
          call.id,
          result,
          call.function.name
        );
        messages.push(toolMessage);
      }
    }
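    With the tool results appended, a follow-up request lets the model turn them into a final answer; a minimal sketch continuing the variables above:

    // Send the updated conversation back so the model can use the tool results
    const followUp = await client.createChatCompletion({
      model: 'openai/gpt-4o-mini',
      messages,
      tools
    });

    console.log(followUp.choices[0].message.content);
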
    Structured outputs:

    const response = await client.createChatCompletion({
      model: 'openai/gpt-4o-mini',
      messages: [{ role: 'user', content: 'Generate a person profile' }],
      response_format: {
        type: 'json_schema',
        json_schema: {
          name: 'person_profile',
          strict: true,
          schema: {
            type: 'object',
            properties: {
              name: { type: 'string' },
              age: { type: 'number' },
              occupation: { type: 'string' }
            },
            required: ['name', 'age', 'occupation']
          }
        }
      }
    });

    const person = JSON.parse(response.choices[0].message.content!);
    Multimodal (vision):

    const response = await client.createChatCompletion({
      model: 'openai/gpt-4o-mini',
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: 'What is in this image?' },
            {
              type: 'image_url',
              image_url: {
                url: 'https://example.com/image.jpg',
                detail: 'high'
              }
            }
          ]
        }
      ]
    });

    Cost tracking:

    // Get account credits
    const credits = await client.getCredits();
    console.log(`Remaining: $${credits.total_credits - credits.total_usage}`);

    // Track a specific generation (⚠️ NOT immediately available - see note below)
    const response = await client.createChatCompletion({ /* ... */ });
    const stats = await client.getGeneration(response.id);
    console.log(`Cost: $${stats.total_cost}`);

    // ⚠️ RECOMMENDED: use response.usage for immediate cost tracking
    if (response.usage) {
      console.log(`Prompt tokens: ${response.usage.prompt_tokens}`);
      console.log(`Completion tokens: ${response.usage.completion_tokens}`);
      console.log(`Total tokens: ${response.usage.total_tokens}`);
      // Calculate approximate cost based on the model's pricing
    }

    // Estimate tokens before sending a request
    const messages = [/* ... */];
    const estimatedTokens = client.countMessagesTokens(messages);
    console.log(`Estimated tokens: ${estimatedTokens}`);

    Note: getGeneration() statistics are not immediately available after a request completes. OpenRouter needs time to process them. For real-time cost tracking, use response.usage instead (see example 16).
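    When generation-level accounting (such as total_cost) is still needed, one option is to wait briefly and retry; a minimal sketch in which the fetchGenerationStats helper and its delays are illustrative, not part of the SDK:

    // Illustrative helper: retry getGeneration() until OpenRouter has processed the stats
    async function fetchGenerationStats(id: string, attempts = 5) {
      for (let i = 0; i < attempts; i++) {
        try {
          return await client.getGeneration(id);
        } catch {
          // Stats not ready yet (or a transient error): back off and try again
          await new Promise((resolve) => setTimeout(resolve, 1000 * (i + 1)));
        }
      }
      throw new Error(`Generation stats for ${id} not available yet`);
    }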

    Prompt caching: Reduce costs by up to 90% by caching stable portions of your prompts with Anthropic's Claude models:

    // Mark the system prompt as cacheable (must be >1024 tokens for Claude 3.5 Sonnet)
    const systemPrompt = OpenRouterClient.markMessageAsCacheable({
      role: 'system',
      content: 'Long instructions, examples, or context that will be reused...' // >1024 tokens
    });

    // First call: cache creation (10% surcharge)
    const response1 = await client.createChatCompletion({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [
        systemPrompt,
        { role: 'user', content: 'First question' }
      ],
      usage: { include: true } // ✅ Get detailed cache metrics
    });

    // Second call: cache hit (90% discount)
    const response2 = await client.createChatCompletion({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [
        systemPrompt,
        { role: 'user', content: 'Second question' }
      ],
      usage: { include: true }
    });

    // Track cache performance (real-time)
    console.log('Cached tokens:', response2.usage?.prompt_tokens_details?.cached_tokens);
    // Output: 1668 (90% discount on these tokens!)

    // Or track via generation ID (async, more accurate)
    const stats = await client.getGeneration(response2.id);
    console.log('Cache discount:', stats.cache_discount); // e.g., 0.0045036 ($)
    console.log('Native cached tokens:', stats.native_tokens_cached); // e.g., 1668

    Two methods to track cache metrics:

    1. Real-time with usage: { include: true } (recommended for development)

      • Returns prompt_tokens_details.cached_tokens in response
      • Adds ~200ms latency to final response
      • Best for debugging and real-time monitoring
    2. Async with getGeneration(id) (recommended for production)

      • Returns cache_discount (actual $ savings) and native_tokens_cached
      • No latency impact on responses
      • Best for cost analytics and reporting

    Requirements:

    • Minimum 1024 tokens for Claude 3.7/3.5 Sonnet and 3 Opus (a token-count check is sketched after this list)
    • Minimum 2048 tokens for Claude 3.5/3 Haiku
    • Cache expires after 5 minutes of inactivity
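    Content below these minimums simply is not cached, so it can be worth checking the estimated size before marking a message; a minimal sketch using the countTokens() helper, with the 1024 threshold matching the Sonnet/Opus minimum above:

    // Only mark the system prompt as cacheable if it clears the model's minimum token count
    const systemContent = 'Long instructions, examples, or context that will be reused...';
    const systemMessage = { role: 'system' as const, content: systemContent };

    const systemPrompt = client.countTokens(systemContent) >= 1024
      ? OpenRouterClient.markMessageAsCacheable(systemMessage)
      : systemMessage;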

    Best practices:

    • Cache stable content (system prompts, reference docs, examples)
    • Don't cache dynamic content (user messages, real-time data)
    • Use provider-specific models (e.g., anthropic/claude-3.5-sonnet)
    • See examples/10-prompt-caching.ts for complete examples with both tracking methods

    Automatically discover what features a model supports before using it:

    const caps = await client.getModelCapabilities('anthropic/claude-3.5-sonnet');

    // Check capabilities
    if (caps.supportsVision) {
      // Can send images
    }
    if (caps.supportsTools) {
      // Can use function calling
    }
    if (caps.supportsJSON) {
      // Can use response_format
    }

    // Access detailed info
    console.log('Context length:', caps.maxContextLength);
    console.log('Input modalities:', caps.inputModalities); // ['text', 'image']
    console.log('Supported params:', caps.supportedParameters);
    console.log('Pricing:', caps.pricing); // { prompt: 0.003, completion: 0.015 }

    Use cases:

    • Validate model compatibility before requests
    • Build dynamic UIs that adapt to model capabilities
    • Auto-select the best model for your needs (see the sketch after this list)
    • See examples/11-model-capabilities.ts for advanced patterns
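    For example, auto-selection can be as simple as probing a shortlist of candidates and keeping the first one whose capabilities fit; a minimal sketch (the candidate list is illustrative):

    // Pick the first candidate model that supports both vision and tool calling
    const candidates = ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o-mini'];

    let selected: string | undefined;
    for (const id of candidates) {
      const caps = await client.getModelCapabilities(id);
      if (caps.supportsVision && caps.supportsTools) {
        selected = id;
        break;
      }
    }

    if (!selected) throw new Error('No candidate model supports vision + tools');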

    Track your API usage, budgets, and rate limits in real-time:

    // Get detailed key information
    const keyInfo = await client.getKeyInfo();
    console.log('Usage:', keyInfo.usage);
    console.log('Limit:', keyInfo.limit || 'Unlimited');
    console.log('Free tier:', keyInfo.is_free_tier);
    if (keyInfo.rate_limit) {
      console.log(`${keyInfo.rate_limit.requests} requests per ${keyInfo.rate_limit.interval}`);
    }

    // Get credits with current rate limit status
    const credits = await client.getCredits();
    console.log('Credits remaining:', credits.total_credits - credits.total_usage);
    if (credits.rate_limit) {
      console.log('Requests remaining:', credits.rate_limit.remaining);
      console.log('Resets at:', new Date(credits.rate_limit.reset * 1000));
    }

    Benefits:

    • Prevent 429 errors with proactive throttling (see the sketch after this list)
    • Monitor budget usage in real-time
    • Set up alerts before hitting limits
    • See examples/12-rate-limits.ts for monitoring patterns
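    As a rough illustration of proactive throttling, a check like this could run before launching a burst of requests; the hasHeadroom helper and the 10-request margin are illustrative, not part of the SDK:

    // Illustrative pre-flight check: skip the burst if the key is close to its rate limit
    async function hasHeadroom(margin = 10): Promise<boolean> {
      const credits = await client.getCredits();
      if (!credits.rate_limit) return true; // no rate limit info, assume it is fine
      return credits.rate_limit.remaining > margin;
    }

    if (await hasHeadroom()) {
      // Safe to launch the batch
    } else {
      console.warn('Approaching the rate limit, deferring the batch');
    }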

    Pre-validate requests before sending them to save costs and avoid errors:

    // Check if a model supports a specific feature
    const supportsVision = await client.supportsFeature(
      'anthropic/claude-3.5-sonnet',
      'vision'
    );

    if (!supportsVision) {
      console.log('This model cannot process images');
    }

    // Validate parameters against model capabilities
    const validation = await client.validateParams('openai/gpt-3.5-turbo', {
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true,
      tools: [/* ... */],
      max_tokens: 5000
    });

    if (!validation.valid) {
      console.error('Errors:', validation.errors);
      // Example: ["Model doesn't support streaming", "max_tokens exceeds limit"]
    }

    if (validation.warnings?.length) {
      console.warn('Warnings:', validation.warnings);
      // Example: ["max_tokens is high and may be expensive"]
    }

    // Truncate conversation to fit context window
    const longConversation = [
      { role: 'system', content: 'You are helpful' },
      // ... 50+ messages
    ];

    const truncated = client.truncateMessages(longConversation, 4000);
    // Keeps the system message + the most recent messages that fit in 4000 tokens

    Benefits:

    • Validate before spending credits on invalid requests
    • Prevent errors for unsupported features
    • Auto-truncate long conversations (FIFO, preserves system message)
    • See examples/13-validation-helpers.ts for complete workflows

    Execute multiple chat completion requests concurrently with automatic rate limiting:

    // Prepare multiple requests
    const requests = [
      {
        model: 'openai/gpt-3.5-turbo',
        messages: [{ role: 'user', content: 'Translate "hello" to French' }]
      },
      {
        model: 'openai/gpt-3.5-turbo',
        messages: [{ role: 'user', content: 'Translate "hello" to Spanish' }]
      },
      {
        model: 'openai/gpt-3.5-turbo',
        messages: [{ role: 'user', content: 'Translate "hello" to German' }]
      }
    ];

    // Execute with concurrency control
    const results = await client.batchChatCompletion(requests, {
      maxConcurrent: 5,   // Max 5 concurrent requests (default)
      stopOnError: false  // Continue on errors (default)
    });

    // Process results
    results.forEach((result, idx) => {
      if (result.success && result.response) {
        console.log(`Request ${idx}:`, result.response.choices[0].message.content);
      } else {
        console.error(`Request ${idx} failed:`, result.error?.message);
      }
    });

    Options:

    • maxConcurrent: Limit concurrent requests (default: 5)
    • stopOnError: Stop on first error (default: false)

    Benefits:

    • 2-5x faster than sequential requests
    • Automatic concurrency control
    • Individual error handling per request
    • See examples/14-batch-requests.ts for advanced patterns
    Error handling:

    import { OpenRouterError } from 'openrouter-client';

    try {
      const response = await client.createChatCompletion({ /* ... */ });
    } catch (error) {
      if (error instanceof OpenRouterError) {
        console.error('OpenRouter Error:', {
          message: error.message,
          status: error.status,
          code: error.code,
          requestId: error.requestId
        });

        if (error.status === 429) {
          // Handle rate limit
        } else if (error.status && error.status >= 500) {
          // Handle server error
        }
      }
    }
    Logging:

    import { formattedLogger, createLogger, silentLogger } from 'openrouter-client';

    // Formatted logger with timestamps and colors
    const formattedClient = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      logger: formattedLogger,
      logLevel: 'info'
    });

    // Custom prefixed logger
    const prefixedClient = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      logger: createLogger('MyApp'),
      logLevel: 'debug'
    });

    // Silent logger (no output)
    const silentClient = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      logger: silentLogger
    });
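    A fully custom logger can also be supplied; a minimal sketch, assuming the Logger type is exported and expects error/warn/info/debug methods mirroring the LogLevel values above:

    // Illustrative custom logger that emits structured JSON lines (interface shape assumed)
    import type { Logger } from 'openrouter-client';

    const jsonLogger: Logger = {
      error: (message, ...args) => console.error(JSON.stringify({ level: 'error', message, args })),
      warn: (message, ...args) => console.warn(JSON.stringify({ level: 'warn', message, args })),
      info: (message, ...args) => console.info(JSON.stringify({ level: 'info', message, args })),
      debug: (message, ...args) => console.debug(JSON.stringify({ level: 'debug', message, args }))
    };

    const customClient = new OpenRouterClient({
      apiKey: process.env.OPENROUTER_API_KEY!,
      logger: jsonLogger
    });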

    📚 Complete API Documentation (TypeDoc)

    See docs/INDEX.md for architectural decisions and contribution guidelines.

    Chat Completions:

    • createChatCompletion(params) - Standard chat completion
    • streamChatCompletion(params) - Streaming with AsyncIterable (recommended)
    • createChatCompletionStream(params) - Streaming with ReadableStream
    • batchChatCompletion(requests, options?) - Execute multiple requests concurrently

    Models & Providers:

    • listModels() - Get available models
    • getModel(id) - Get model details
    • getModelEndpoints(id) - Get model endpoints
    • getModelCapabilities(id) - Get detailed model capabilities
    • listProviders() - Get available providers

    Account & Usage:

    • getCredits() - Get account credits (with rate limits)
    • getKeyInfo() - Get API key information and limits
    • getActivity() - Get activity analytics
    • getGeneration(id) - Get generation statistics

    Validation & Helpers:

    • supportsFeature(modelId, feature) - Check if model supports a feature
    • validateParams(modelId, params) - Validate parameters against model
    • truncateMessages(messages, maxTokens) - Truncate messages to fit context
    • countTokens(text) - Estimate tokens in text
    • countMessagesTokens(messages) - Estimate tokens in messages
    • validateApiKey() - Validate API key
    • OpenRouterClient.parseToolCalls(toolCalls) - Parse tool calls
    • OpenRouterClient.createToolResponseMessage(id, content, name?) - Create tool response (requires string content)
    • OpenRouterClient.createToolResponseFromResult(id, result, name?) - Create tool response from any object (auto-serializes)
    • OpenRouterClient.executeToolCalls(toolCalls, functions) - Execute tool calls
    • OpenRouterClient.markMessageAsCacheable(message) - Mark message for caching
    Development commands:

    # Install dependencies
    pnpm install

    # Run tests
    pnpm test

    # Run tests in watch mode
    pnpm test:watch

    # Build
    pnpm build

    # Lint
    pnpm lint

    # Format
    pnpm format
    Requirements:

    • Node.js 22.x LTS or later (native fetch support)
    • TypeScript 5.9.x or later
    • ESM only (no CommonJS)

    License: MIT

    See docs/INDEX.md for contribution guidelines and architecture decisions.