A complete, type-safe TypeScript SDK for the OpenRouter API. Node.js only (ESM), with full API coverage, streaming support, and comprehensive error handling.
✅ Full API Coverage: Chat completions, streaming, models, providers, credits, analytics
✅ Type Safety: Complete TypeScript types for all endpoints and responses
✅ Streaming: Two approaches - ReadableStream (low-level) or AsyncIterable (recommended)
✅ Advanced Features: Tool calling, structured outputs, multimodal (vision), provider preferences
✅ Batch Requests: Execute multiple requests concurrently with rate limiting
✅ Validation Helpers: Pre-validate parameters, check model capabilities, truncate messages
✅ Reliability: Automatic retry with exponential backoff, timeouts, proper error handling
✅ Security: Automatic redaction of sensitive data in logs
✅ Logging: Multiple logger implementations (default, silent, formatted)
✅ 100% Test Coverage: 92 tests covering all features
npm install @pierreraby/openrouter-client
# or
pnpm add @pierreraby/openrouter-client
# or
yarn add @pierreraby/openrouter-client
import OpenRouterClient from 'openrouter-client';
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY
});
// Simple chat completion
const response = await client.createChatCompletion({
model: 'openai/gpt-3.5-turbo',
messages: [
{ role: 'user', content: 'Hello!' }
]
});
console.log(response.choices[0].message.content);
// Using AsyncIterable (cleanest approach)
for await (const chunk of client.streamChatCompletion({
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Tell me a story' }]
})) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
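For the lower-level approach, `createChatCompletionStream` returns a ReadableStream. The sketch below assumes the stream yields already-parsed chunk objects shaped like the AsyncIterable chunks above; check the TypeDoc reference for the exact element type.
// Using ReadableStream (low-level). Sketch only: the element type of the
// stream is an assumption here (verify against the TypeDoc).
const stream = await client.createChatCompletionStream({
  model: 'openai/gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Tell me a story' }]
});
const reader = stream.getReader();
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const content = value?.choices?.[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} finally {
  reader.releaseLock();
}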
The `examples/` directory contains comprehensive examples for all features.
Run examples with:
tsx examples/01-basic-usage.ts
const client = new OpenRouterClient({
apiKey: string; // Required: Your OpenRouter API key
baseURL?: string; // Default: 'https://openrouter.ai/api/v1'
timeout?: number; // Default: 30000 (30s)
maxRetries?: number; // Default: 3
retryDelay?: number; // Default: 1000 (1s initial delay)
headers?: Record<string, string>; // Additional headers
logger?: Logger; // Custom logger
logLevel?: LogLevel; // 'error' | 'warn' | 'info' | 'debug'
});
Development:
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
maxRetries: 1,
logLevel: 'debug'
});
Production:
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
timeout: 60000,
maxRetries: 5,
retryDelay: 2000,
logLevel: 'error'
});
const tools = [
{
type: 'function' as const,
function: {
name: 'get_weather',
description: 'Get current weather',
parameters: {
type: 'object',
properties: {
location: { type: 'string' }
},
required: ['location']
}
}
}
];
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [{ role: 'user', content: "What's the weather in Paris?" }],
tools,
tool_choice: 'auto'
});
// Parse and execute tool calls
if (response.choices[0].message.tool_calls) {
const parsedCalls = OpenRouterClient.parseToolCalls(
response.choices[0].message.tool_calls
);
for (const call of parsedCalls) {
// `yourFunctions` is a placeholder: a map from tool name to your own implementation
const result = yourFunctions[call.function.name](call.function.arguments);
const toolMessage = OpenRouterClient.createToolResponseMessage(
call.id,
result,
call.function.name
);
messages.push(toolMessage); // `messages` is the conversation array from the original request
}
}
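To let the model use the tool results, send them back in a follow-up request. The sketch below uses `OpenRouterClient.executeToolCalls` from the API reference; the assumption that it returns tool-role messages ready to append is mine, so verify the return shape in the TypeDoc.
// Sketch: run the tools and send the results back for a final answer.
// Assumption: executeToolCalls(toolCalls, functions) returns tool messages
// that can be appended directly to the conversation.
const toolMessages = await OpenRouterClient.executeToolCalls(
  response.choices[0].message.tool_calls!,
  {
    get_weather: ({ location }: { location: string }) => ({ location, tempC: 21 })
  }
);
const followUp = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    ...messages,                 // the original conversation
    response.choices[0].message, // assistant message containing the tool calls
    ...toolMessages              // tool results
  ]
});
console.log(followUp.choices[0].message.content);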
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [{ role: 'user', content: 'Generate a person profile' }],
response_format: {
type: 'json_schema',
json_schema: {
name: 'person_profile',
strict: true,
schema: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
occupation: { type: 'string' }
},
required: ['name', 'age', 'occupation']
}
}
}
});
const person = JSON.parse(response.choices[0].message.content!);
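Because `strict: true` constrains the output to the schema, the parsed result can be typed. A small sketch (the `PersonProfile` interface is illustrative):
// Illustrative interface mirroring the json_schema above.
interface PersonProfile {
  name: string;
  age: number;
  occupation: string;
}
const profile = JSON.parse(response.choices[0].message.content!) as PersonProfile;
// A light runtime check is still worthwhile in case the output deviates.
if (typeof profile.name !== 'string' || typeof profile.age !== 'number') {
  throw new Error('Unexpected structured output shape');
}
console.log(`${profile.name}, ${profile.age}, ${profile.occupation}`);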
const response = await client.createChatCompletion({
model: 'openai/gpt-4o-mini',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{
type: 'image_url',
image_url: {
url: 'https://example.com/image.jpg',
detail: 'high'
}
}
]
}
]
});
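Local images can be embedded as base64 data URLs using the same OpenAI-compatible message format; a minimal sketch (the file path is illustrative):
import { readFile } from 'node:fs/promises';

// Read a local image and embed it as a data URL (path is illustrative).
const imageBase64 = (await readFile('./photo.jpg')).toString('base64');
const visionResponse = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this photo' },
        { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageBase64}` } }
      ]
    }
  ]
});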
// Get account credits
const credits = await client.getCredits();
console.log(`Remaining: $${credits.total_credits - credits.total_usage}`);
// Track specific generation (⚠️ NOT immediately available - see note below)
const response = await client.createChatCompletion({ /* ... */ });
const stats = await client.getGeneration(response.id);
console.log(`Cost: $${stats.total_cost}`);
// ⚠️ RECOMMENDED: Use response.usage for immediate cost tracking
const response = await client.createChatCompletion({ /* ... */ });
if (response.usage) {
console.log(`Prompt tokens: ${response.usage.prompt_tokens}`);
console.log(`Completion tokens: ${response.usage.completion_tokens}`);
console.log(`Total tokens: ${response.usage.total_tokens}`);
// Calculate approximate cost based on model pricing
}
// Estimate before request
const messages = [/* ... */];
const estimatedTokens = client.countMessagesTokens(messages);
console.log(`Estimated tokens: ${estimatedTokens}`);
Note: `getGeneration()` statistics are not immediately available after a request completes; OpenRouter needs time to process them. For real-time cost tracking, use `response.usage` instead (see example 16).
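To turn `response.usage` into an approximate dollar figure, combine it with the model's pricing. A sketch, assuming pricing is expressed in USD per 1K tokens as in the `getModelCapabilities` example further down (verify the units for your models):
// Sketch: estimate cost from usage plus model pricing.
// Assumption: pricing values are USD per 1K tokens.
const modelCaps = await client.getModelCapabilities('openai/gpt-3.5-turbo');
if (response.usage && modelCaps.pricing) {
  const approxCost =
    (response.usage.prompt_tokens / 1000) * modelCaps.pricing.prompt +
    (response.usage.completion_tokens / 1000) * modelCaps.pricing.completion;
  console.log(`Approximate cost: $${approxCost.toFixed(6)}`);
}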
Reduce costs by up to 90% by caching portions of your prompts with Anthropic's Claude models:
// Mark system prompt as cacheable (must be >1024 tokens for Claude 3.5 Sonnet)
const systemPrompt = OpenRouterClient.markMessageAsCacheable({
role: 'system',
content: 'Long instructions, examples, or context that will be reused...' // >1024 tokens
});
// First call: cache creation (10% surcharge)
const response1 = await client.createChatCompletion({
model: 'anthropic/claude-3.5-sonnet',
messages: [
systemPrompt,
{ role: 'user', content: 'First question' }
],
usage: { include: true } // ✅ Get detailed cache metrics
});
// Second call: cache hit (90% discount)
const response2 = await client.createChatCompletion({
model: 'anthropic/claude-3.5-sonnet',
messages: [
systemPrompt,
{ role: 'user', content: 'Second question' }
],
usage: { include: true }
});
// Track cache performance (real-time)
console.log('Cached tokens:', response2.usage?.prompt_tokens_details?.cached_tokens);
// Output: 1668 (90% discount on these tokens!)
// Or track via generation ID (async, more accurate)
const stats = await client.getGeneration(response2.id);
console.log('Cache discount:', stats.cache_discount); // e.g., 0.0045036 ($)
console.log('Native cached tokens:', stats.native_tokens_cached); // e.g., 1668
Two methods to track cache metrics:
- Real-time with `usage: { include: true }` (recommended for development): read `prompt_tokens_details.cached_tokens` in the response
- Async with `getGeneration(id)` (recommended for production): read `cache_discount` (actual $ savings) and `native_tokens_cached`

Requirements: Anthropic Claude models (e.g. `anthropic/claude-3.5-sonnet`); the cacheable content must be long enough (>1024 tokens for Claude 3.5 Sonnet).

Best practices: see `examples/10-prompt-caching.ts` for complete examples with both tracking methods.

Automatically discover what features a model supports before using it:
const caps = await client.getModelCapabilities('anthropic/claude-3.5-sonnet');
// Check capabilities
if (caps.supportsVision) {
// Can send images
}
if (caps.supportsTools) {
// Can use function calling
}
if (caps.supportsJSON) {
// Can use response_format
}
// Access detailed info
console.log('Context length:', caps.maxContextLength);
console.log('Input modalities:', caps.inputModalities); // ['text', 'image']
console.log('Supported params:', caps.supportedParameters);
console.log('Pricing:', caps.pricing); // { prompt: 0.003, completion: 0.015 }
Use cases: see `examples/11-model-capabilities.ts` for advanced patterns.

Track your API usage, budgets, and rate limits in real-time:
// Get detailed key information
const keyInfo = await client.getKeyInfo();
console.log('Usage:', keyInfo.usage);
console.log('Limit:', keyInfo.limit || 'Unlimited');
console.log('Free tier:', keyInfo.is_free_tier);
if (keyInfo.rate_limit) {
console.log(`${keyInfo.rate_limit.requests} requests per ${keyInfo.rate_limit.interval}`);
}
// Get credits with current rate limit status
const credits = await client.getCredits();
console.log('Credits remaining:', credits.total_credits - credits.total_usage);
if (credits.rate_limit) {
console.log('Requests remaining:', credits.rate_limit.remaining);
console.log('Resets at:', new Date(credits.rate_limit.reset * 1000));
}
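These endpoints make it easy to guard expensive jobs behind a simple budget check; a small sketch (the threshold is arbitrary):
// Sketch: skip an expensive job when remaining credits fall below a threshold.
const MIN_CREDITS = 1.0; // arbitrary threshold in USD
const { total_credits, total_usage } = await client.getCredits();
const remaining = total_credits - total_usage;
if (remaining < MIN_CREDITS) {
  console.warn(`Only $${remaining.toFixed(2)} left, skipping the batch run`);
} else {
  // ...proceed with the expensive work
}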
Benefits: see `examples/12-rate-limits.ts` for monitoring patterns.

Pre-validate requests before sending them to save costs and avoid errors:
// Check if a model supports a specific feature
const supportsVision = await client.supportsFeature(
'anthropic/claude-3.5-sonnet',
'vision'
);
if (!supportsVision) {
console.log('This model cannot process images');
}
// Validate parameters against model capabilities
const validation = await client.validateParams('openai/gpt-3.5-turbo', {
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
tools: [/* ... */],
max_tokens: 5000
});
if (!validation.valid) {
console.error('Errors:', validation.errors);
// Example: ["Model doesn't support streaming", "max_tokens exceeds limit"]
}
if (validation.warnings?.length) {
console.warn('Warnings:', validation.warnings);
// Example: ["max_tokens is high and may be expensive"]
}
// Truncate conversation to fit context window
const longConversation = [
{ role: 'system', content: 'You are helpful' },
// ... 50+ messages
];
const truncated = client.truncateMessages(longConversation, 4000);
// Keeps system message + most recent messages that fit in 4000 tokens
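Put together, a pre-flight check before an expensive request might look like this sketch:
// Sketch: validate, truncate, then send.
const params = {
  messages: longConversation,
  max_tokens: 1000
};
const check = await client.validateParams('openai/gpt-3.5-turbo', params);
if (!check.valid) {
  throw new Error(`Invalid request: ${check.errors?.join(', ')}`);
}
const safeMessages = client.truncateMessages(params.messages, 4000);
const reply = await client.createChatCompletion({
  model: 'openai/gpt-3.5-turbo',
  messages: safeMessages,
  max_tokens: params.max_tokens
});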
Benefits: see `examples/13-validation-helpers.ts` for complete workflows.

Execute multiple chat completion requests concurrently with automatic rate limiting:
// Prepare multiple requests
const requests = [
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to French' }]
},
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to Spanish' }]
},
{
model: 'openai/gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Translate "hello" to German' }]
}
];
// Execute with concurrency control
const results = await client.batchChatCompletion(requests, {
maxConcurrent: 5, // Max 5 concurrent requests (default)
stopOnError: false // Continue on errors (default)
});
// Process results
results.forEach((result, idx) => {
if (result.success && result.response) {
console.log(`Request ${idx}:`, result.response.choices[0].message.content);
} else {
console.error(`Request ${idx} failed:`, result.error?.message);
}
});
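Because each result reports `success` and any `error`, failed items can be collected and retried; a sketch:
// Sketch: retry only the requests that failed in the first pass.
const failedIndices = results
  .map((result, idx) => (result.success ? -1 : idx))
  .filter((idx) => idx !== -1);
if (failedIndices.length > 0) {
  const retryResults = await client.batchChatCompletion(
    failedIndices.map((idx) => requests[idx]),
    { maxConcurrent: 2 } // be gentler on the second attempt
  );
  retryResults.forEach((result, i) => {
    console.log(`Retry of request ${failedIndices[i]}:`, result.success ? 'ok' : 'failed again');
  });
}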
Options:
- `maxConcurrent`: limit concurrent requests (default: 5)
- `stopOnError`: stop on the first error (default: false)

Benefits: see `examples/14-batch-requests.ts` for advanced patterns.

import { OpenRouterError } from 'openrouter-client';
try {
const response = await client.createChatCompletion({ /* ... */ });
} catch (error) {
if (error instanceof OpenRouterError) {
console.error('OpenRouter Error:', {
message: error.message,
status: error.status,
code: error.code,
requestId: error.requestId
});
if (error.status === 429) {
// Handle rate limit
} else if (error.status && error.status >= 500) {
// Handle server error
}
}
}
import { formattedLogger, createLogger, silentLogger } from 'openrouter-client';
// Formatted logger with timestamps and colors
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: formattedLogger,
logLevel: 'info'
});
// Custom prefixed logger
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: createLogger('MyApp'),
logLevel: 'debug'
});
// Silent logger (no output)
const client = new OpenRouterClient({
apiKey: process.env.OPENROUTER_API_KEY!,
logger: silentLogger
});
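You can also plug in your own logger. The sketch below assumes the `Logger` interface exposes the four level methods matching the log levels above; check the TypeDoc for the exact contract.
// Sketch of a custom JSON logger.
// Assumption: Logger requires error/warn/info/debug methods taking a message
// plus optional extra arguments.
const jsonLogger = {
  error: (msg: string, ...args: unknown[]) => console.error(JSON.stringify({ level: 'error', msg, args })),
  warn: (msg: string, ...args: unknown[]) => console.warn(JSON.stringify({ level: 'warn', msg, args })),
  info: (msg: string, ...args: unknown[]) => console.info(JSON.stringify({ level: 'info', msg, args })),
  debug: (msg: string, ...args: unknown[]) => console.debug(JSON.stringify({ level: 'debug', msg, args }))
};

const jsonClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: jsonLogger,
  logLevel: 'info'
});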
📚 Complete API Documentation (TypeDoc)
See docs/INDEX.md for architectural decisions and contribution guidelines.
Chat Completions:
- `createChatCompletion(params)` - Standard chat completion
- `streamChatCompletion(params)` - Streaming with AsyncIterable (recommended)
- `createChatCompletionStream(params)` - Streaming with ReadableStream
- `batchChatCompletion(requests, options?)` - Execute multiple requests concurrently

Models & Providers:
- `listModels()` - Get available models
- `getModel(id)` - Get model details
- `getModelEndpoints(id)` - Get model endpoints
- `getModelCapabilities(id)` - Get detailed model capabilities
- `listProviders()` - Get available providers

Account & Usage:
- `getCredits()` - Get account credits (with rate limits)
- `getKeyInfo()` - Get API key information and limits
- `getActivity()` - Get activity analytics
- `getGeneration(id)` - Get generation statistics

Validation & Helpers:
- `supportsFeature(modelId, feature)` - Check if a model supports a feature
- `validateParams(modelId, params)` - Validate parameters against a model
- `truncateMessages(messages, maxTokens)` - Truncate messages to fit the context window
- `countTokens(text)` - Estimate tokens in text
- `countMessagesTokens(messages)` - Estimate tokens in messages
- `validateApiKey()` - Validate the API key
- `OpenRouterClient.parseToolCalls(toolCalls)` - Parse tool calls
- `OpenRouterClient.createToolResponseMessage(id, content, name?)` - Create a tool response (requires string content)
- `OpenRouterClient.createToolResponseFromResult(id, result, name?)` - Create a tool response from any object (auto-serializes)
- `OpenRouterClient.executeToolCalls(toolCalls, functions)` - Execute tool calls
- `OpenRouterClient.markMessageAsCacheable(message)` - Mark a message for caching

# Install dependencies
pnpm install
# Run tests
pnpm test
# Run tests in watch mode
pnpm test:watch
# Build
pnpm build
# Lint
pnpm lint
# Format
pnpm format
MIT
See docs/INDEX.md for contribution guidelines and architecture decisions.