
OpenAI Compatible API

NovelAI provides OpenAI-compatible API endpoints, allowing you to interact with NovelAI's text generation services using the standard OpenAI API format. This lowers the cost of migrating from OpenAI and improves interoperability with existing tooling.

Note

The OpenAI-compatible API uses different models than the native text generation API:

  • OpenAI Compatible API: Uses GLM series models (glm-4-5, glm-4-6)
  • Native Text Generation API: Uses NovelAI proprietary models (llama-3-erato-v1, kayra-v1, clio-v1)

GLM models are general-purpose chat models, while Erato/Kayra/Clio are specifically optimized for story writing.
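
Because the endpoints follow the OpenAI wire format, existing OpenAI clients can often be pointed at them directly. A minimal sketch using the official openai npm package, assuming the base URL from the endpoint table below and that the NovelAI API key is accepted as the bearer token:

typescript
import OpenAI from 'openai';

// Assumption: the NovelAI API key works as a standard bearer token here.
const oai = new OpenAI({
  apiKey: process.env.NOVELAI_API_KEY,
  baseURL: 'https://text.novelai.net/oa/v1',
});

const completion = await oai.chat.completions.create({
  model: 'glm-4-6',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);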

Available Models

Get currently available models via listModels():

| Model | Description |
|---------|------------------------------|
| glm-4-6 | GLM-4 Version 6 (Recommended) |
| glm-4-5 | GLM-4 Version 5 |
typescript
const models = await client.openai.listModels();
// Returns: [{ id: 'glm-4-5', owned_by: 'novelai' }, { id: 'glm-4-6', owned_by: 'novelai' }]

Endpoint Overview

| Endpoint | Method | Full URL | Description |
|----------|--------|----------|-------------|
| /oa/v1/completions | POST | https://text.novelai.net/oa/v1/completions | Text completion |
| /oa/v1/chat/completions | POST | https://text.novelai.net/oa/v1/chat/completions | Chat completion |
| /oa/v1/models | GET | https://text.novelai.net/oa/v1/models | List available models |
| /oa/v1/internal/token-count | POST | https://text.novelai.net/oa/v1/internal/token-count | Token count |
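
These are plain HTTP endpoints, so they can also be called directly. A sketch with fetch (the Bearer auth header is an assumption; the snake_case body fields follow the OpenAI wire format noted in the comparison table further below):

typescript
const res = await fetch('https://text.novelai.net/oa/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Assumption: the API key is sent as a standard bearer token.
    Authorization: `Bearer ${process.env.NOVELAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'glm-4-6',
    messages: [{ role: 'user', content: 'Hello!' }],
    max_tokens: 100, // the wire format is snake_case
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);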

Basic Usage

Text Completion

typescript
import { NovelAI } from 'novelai-sdk-unofficial';

const client = new NovelAI({ apiKey: 'your-api-key' });

// First get available models
const models = await client.openai.listModels();
const model = models[0].id; // e.g. 'glm-4-6'

const response = await client.openai.completion({
  prompt: 'Once upon a time, in a kingdom far away,',
  model,
  maxTokens: 100,
  temperature: 0.7,
});

console.log(response.choices[0].text);

Chat Completion

typescript
const response = await client.openai.chatCompletion({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello! How are you?' },
  ],
  model: 'glm-4-6',
  maxTokens: 100,
});

console.log(response.choices[0].message.content);

Streaming Responses

Text Completion Streaming

typescript
const stream = client.openai.completionStream({
  prompt: 'The quick brown fox',
  model: 'glm-4-6',
  maxTokens: 100,
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.text;
  if (text) process.stdout.write(text);
}

Chat Completion Streaming

typescript
const stream = client.openai.chatCompletionStream({
  messages: [{ role: 'user', content: 'Tell me a story' }],
  model: 'glm-4-6',
  maxTokens: 200,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Cancelling Streaming Requests

typescript
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  for await (const chunk of client.openai.chatCompletionStream({
    messages: [{ role: 'user', content: 'Write a long story' }],
    model: 'glm-4-6', // model is required (see parameters below)
  }, controller.signal)) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('\nGeneration cancelled');
  }
}

List Models

typescript
const models = await client.openai.listModels();

for (const model of models) {
  console.log(`${model.id} (owned by ${model.owned_by})`);
}

Token Count

typescript
const count = await client.openai.tokenCount({
  prompt: 'Hello, world! This is a test.',
  model: 'llama-3-erato-v1',
});

console.log(`Token count: ${count}`);

Parameters

Completion Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Input prompt text |
| model | string | required | Model to use (e.g. glm-4-6) |
| maxTokens | number | - | Maximum tokens to generate (1-2048) |
| temperature | number | - | Sampling temperature (0-2) |
| topP | number | - | Nucleus sampling threshold (0-1) |
| topK | number | - | Top-K sampling |
| minP | number | - | Min-P sampling threshold (0-1) |
| stop | string \| string[] | - | Stop sequences |
| stream | boolean | false | Enable streaming response |
| n | number | - | Number of completions to generate |
| frequencyPenalty | number | - | Frequency penalty (-2.0 to 2.0) |
| presencePenalty | number | - | Presence penalty (-2.0 to 2.0) |
| seed | number | - | Random seed |
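
Combining several of these in one call (values are illustrative, not recommended defaults):

typescript
const response = await client.openai.completion({
  prompt: 'List three unusual uses for a paperclip:',
  model: 'glm-4-6',
  maxTokens: 150,
  temperature: 0.9,
  topP: 0.95,
  stop: ['\n\n'], // stop at the first blank line
  n: 2,           // generate two independent completions
  seed: 42,       // fix the seed for reproducible sampling
});

for (const choice of response.choices) {
  console.log(`--- choice ${choice.index} ---`);
  console.log(choice.text);
}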

Chat Completion Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | OAIChatMessage[] | required | Array of chat messages |
| model | string | required | Model to use (e.g. glm-4-6) |
| maxTokens | number | - | Maximum tokens to generate (1-2048) |
| temperature | number | - | Sampling temperature (0-2) |
| topP | number | - | Nucleus sampling threshold (0-1) |
| topK | number | - | Top-K sampling |
| minP | number | - | Min-P sampling threshold (0-1) |
| stop | string \| string[] | - | Stop sequences |
| stream | boolean | false | Enable streaming response |
| enableThinking | boolean | - | Enable thinking/reasoning mode |
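
A sketch using enableThinking; note that how the reasoning text is surfaced in the response is not specified here, so this only shows enabling the mode:

typescript
const response = await client.openai.chatCompletion({
  messages: [
    { role: 'system', content: 'You are a careful math tutor.' },
    { role: 'user', content: 'What is 17 × 24? Explain briefly.' },
  ],
  model: 'glm-4-6',
  maxTokens: 200,
  enableThinking: true, // let the model reason before answering
});

console.log(response.choices[0].message.content);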

Message Format

typescript
interface OAIChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  name?: string;  // Optional author name
}
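
For example, a multi-turn conversation that uses the optional name field to distinguish two users (assuming the OAIChatMessage type is exported by the SDK):

typescript
import type { OAIChatMessage } from 'novelai-sdk-unofficial'; // assumed type export

const messages: OAIChatMessage[] = [
  { role: 'system', content: 'You are moderating a friendly debate.' },
  { role: 'user', content: 'Cats are the best pets.', name: 'alice' },
  { role: 'user', content: 'No, dogs are.', name: 'bob' },
];

const response = await client.openai.chatCompletion({
  messages,
  model: 'glm-4-6',
  maxTokens: 150,
});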

Unified Sampling Parameters

NovelAI supports additional unified sampling parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| unifiedLinear | number | Unified linear sampling parameter |
| unifiedQuadratic | number | Unified quadratic sampling parameter |
| unifiedCubic | number | Unified cubic sampling parameter |
| unifiedIncreaseLinearWithEntropy | number | Unified entropy-increase linear parameter |
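
Presumably these are passed alongside the standard completion parameters; a sketch with placeholder values (not tuned recommendations):

typescript
const response = await client.openai.completion({
  prompt: 'The old lighthouse keeper climbed the stairs',
  model: 'glm-4-6',
  maxTokens: 100,
  unifiedLinear: 0.5,                    // placeholder value
  unifiedQuadratic: 0.1,                 // placeholder value
  unifiedIncreaseLinearWithEntropy: 1.0, // placeholder value
});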

Comparison with Native API

| Feature | OpenAI Compatible API | Native API |
|---------|----------------------|------------|
| Interface Style | OpenAI standard format | NovelAI native format |
| Available Models | GLM series (glm-4-5, glm-4-6) | Erato/Kayra/Clio |
| Model Characteristics | General-purpose chat models | Story-writing specialized models |
| Chat Support | ✅ Built-in | ❌ Manual construction |
| Migration Cost | Low | High |
| Parameter Naming | camelCase | snake_case |
| Streaming Format | SSE (data: JSON) | Native stream |

When to Use OpenAI Compatible API

  • Migrating existing code from OpenAI
  • Integrating with OpenAI-compatible tools
  • Building general chat applications
  • Need standardized API format

When to Use Native API

  • Need NovelAI's proprietary story writing models (Erato/Kayra/Clio)
  • Need finer parameter control
  • Already have NovelAI native code
  • Need best story generation quality

Response Format

Completion Response

typescript
interface OAICompletionResponse {
  id: string;
  object: 'text_completion';
  created: number;
  model: string;
  choices: Array<{
    text: string;
    index: number;
    finish_reason: 'stop' | 'length' | null;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

Chat Completion Response

typescript
interface OAIChatCompletionResponse {
  id: string;
  object: 'chat.completion';
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: {
      role: 'assistant';
      content: string;
    };
    finish_reason: 'stop' | 'length' | null;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}
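
The usage and finish_reason fields are useful for accounting and truncation checks:

typescript
const response = await client.openai.chatCompletion({
  messages: [{ role: 'user', content: 'Summarize the plot of Hamlet.' }],
  model: 'glm-4-6',
  maxTokens: 120,
});

const choice = response.choices[0];
if (choice.finish_reason === 'length') {
  console.warn('Output was cut off by maxTokens.');
}

const { prompt_tokens, completion_tokens, total_tokens } = response.usage;
console.log(`Tokens: ${total_tokens} total (${prompt_tokens} prompt + ${completion_tokens} completion)`);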

Error Handling

typescript
import { 
  NovelAI, 
  AuthenticationError, 
  InvalidRequestError,
  RateLimitError 
} from 'novelai-sdk-unofficial';

try {
  const response = await client.openai.chatCompletion({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'glm-4-6', // model is required
  });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Invalid API Key');
  } else if (error instanceof InvalidRequestError) {
    console.error('Invalid request parameters:', error.message);
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit exceeded');
  } else {
    throw error; // unexpected error: rethrow
  }
}
