
OpenAI Compatible API

NovelAI provides OpenAI-compatible API endpoints, allowing you to interact with NovelAI's text generation services using the standard OpenAI API format. This lowers the cost of migrating from OpenAI and improves interoperability with existing tooling.

Note

The OpenAI-compatible API uses different models than the native text generation API:

  • OpenAI Compatible API: Uses GLM series models (glm-4-5, glm-4-6)
  • Native Text Generation API: Uses NovelAI proprietary models (llama-3-erato-v1, kayra-v1, clio-v1)

GLM models are general-purpose chat models, while Erato/Kayra/Clio are specifically optimized for story writing.
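
Because the endpoints follow the OpenAI wire format, existing OpenAI clients can often be pointed at them directly. A minimal sketch using the official openai npm package, assuming the base URL from the endpoint table below and that the NovelAI API key is accepted as the bearer token:

typescript
import OpenAI from 'openai';

// Assumption: the NovelAI API key works as a standard bearer token here.
const oai = new OpenAI({
  apiKey: process.env.NOVELAI_API_KEY,
  baseURL: 'https://text.novelai.net/oa/v1',
});

const completion = await oai.chat.completions.create({
  model: 'glm-4-6',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);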

Available Models

Get currently available models via listModels():

| Model | Description |
|---------|------------------------------|
| glm-4-6 | GLM-4 Version 6 (Recommended) |
| glm-4-5 | GLM-4 Version 5 |
typescript
const models = await client.openai.listModels();
// Returns: [{ id: 'glm-4-5', owned_by: 'novelai' }, { id: 'glm-4-6', owned_by: 'novelai' }]

Endpoint Overview

| Endpoint | Method | Full URL | Description |
|----------|--------|----------|-------------|
| /oa/v1/completions | POST | https://text.novelai.net/oa/v1/completions | Text completion |
| /oa/v1/chat/completions | POST | https://text.novelai.net/oa/v1/chat/completions | Chat completion |
| /oa/v1/models | GET | https://text.novelai.net/oa/v1/models | List available models |
| /oa/v1/internal/token-count | POST | https://text.novelai.net/oa/v1/internal/token-count | Token count |
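
These are plain HTTP endpoints, so they can also be called directly. A sketch with fetch (the Bearer auth header is an assumption; the snake_case body fields follow the OpenAI wire format noted in the comparison table further below):

typescript
const res = await fetch('https://text.novelai.net/oa/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Assumption: the API key is sent as a standard bearer token.
    Authorization: `Bearer ${process.env.NOVELAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'glm-4-6',
    messages: [{ role: 'user', content: 'Hello!' }],
    max_tokens: 100, // the wire format is snake_case
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);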

Basic Usage

Text Completion

typescript
import { NovelAI } from 'novelai-sdk-unofficial';

const client = new NovelAI({ apiKey: 'your-api-key' });

// First get available models
const models = await client.openai.listModels();
const model = models[0].id; // e.g. 'glm-4-6'

const response = await client.openai.completion({
  prompt: 'Once upon a time, in a kingdom far away,',
  model,
  maxTokens: 100,
  temperature: 0.7,
});

console.log(response.choices[0].text);

Chat Completion

typescript
const response = await client.openai.chatCompletion({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello! How are you?' },
  ],
  model: 'glm-4-6',
  maxTokens: 100,
});

console.log(response.choices[0].message.content);

Streaming Responses

Text Completion Streaming

typescript
const stream = client.openai.completionStream({
  prompt: 'The quick brown fox',
  model: 'glm-4-6',
  maxTokens: 100,
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.text;
  if (text) process.stdout.write(text);
}

Chat Completion Streaming

typescript
const stream = client.openai.chatCompletionStream({
  messages: [{ role: 'user', content: 'Tell me a story' }],
  model: 'glm-4-6',
  maxTokens: 200,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Cancelling Streaming Requests

typescript
const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  for await (const chunk of client.openai.chatCompletionStream({
    messages: [{ role: 'user', content: 'Write a long story' }],
    model: 'glm-4-6', // model is required (see parameters below)
  }, controller.signal)) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
} catch (error) {
  if (error instanceof Error && error.name === 'AbortError') {
    console.log('\nGeneration cancelled');
  }
}

List Models

typescript
const models = await client.openai.listModels();

for (const model of models) {
  console.log(`${model.id} (owned by ${model.owned_by})`);
}

Token Count

typescript
const count = await client.openai.tokenCount({
  prompt: 'Hello, world! This is a test.',
  model: 'llama-3-erato-v1',
});

console.log(`Token count: ${count}`);

Parameters

Completion Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Input prompt text |
| model | string | required | Model to use (e.g. glm-4-6) |
| maxTokens | number | - | Maximum tokens to generate (1-2048) |
| temperature | number | - | Sampling temperature (0-2) |
| topP | number | - | Nucleus sampling threshold (0-1) |
| topK | number | - | Top-K sampling |
| minP | number | - | Min-P sampling threshold (0-1) |
| stop | string \| string[] | - | Stop sequences |
| stream | boolean | false | Enable streaming response |
| n | number | - | Number of completions to generate |
| frequencyPenalty | number | - | Frequency penalty (-2.0 to 2.0) |
| presencePenalty | number | - | Presence penalty (-2.0 to 2.0) |
| seed | number | - | Random seed |
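
Combining several of these in one call (values are illustrative, not recommended defaults):

typescript
const response = await client.openai.completion({
  prompt: 'List three unusual uses for a paperclip:',
  model: 'glm-4-6',
  maxTokens: 150,
  temperature: 0.9,
  topP: 0.95,
  stop: ['\n\n'], // stop at the first blank line
  n: 2,           // generate two independent completions
  seed: 42,       // fix the seed for reproducible sampling
});

for (const choice of response.choices) {
  console.log(`--- choice ${choice.index} ---`);
  console.log(choice.text);
}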

Chat Completion Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | OAIChatMessage[] | required | Array of chat messages |
| model | string | required | Model to use (e.g. glm-4-6) |
| maxTokens | number | - | Maximum tokens to generate (1-2048) |
| temperature | number | - | Sampling temperature (0-2) |
| topP | number | - | Nucleus sampling threshold (0-1) |
| topK | number | - | Top-K sampling |
| minP | number | - | Min-P sampling threshold (0-1) |
| stop | string \| string[] | - | Stop sequences |
| stream | boolean | false | Enable streaming response |
| enableThinking | boolean | - | Enable thinking/reasoning mode |
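
A sketch using enableThinking; note that how the reasoning text is surfaced in the response is not specified here, so this only shows enabling the mode:

typescript
const response = await client.openai.chatCompletion({
  messages: [
    { role: 'system', content: 'You are a careful math tutor.' },
    { role: 'user', content: 'What is 17 × 24? Explain briefly.' },
  ],
  model: 'glm-4-6',
  maxTokens: 200,
  enableThinking: true, // let the model reason before answering
});

console.log(response.choices[0].message.content);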

Message Format

typescript
interface OAIChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  name?: string;  // Optional author name
}
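
For example, a multi-turn conversation that uses the optional name field to distinguish two users (assuming the OAIChatMessage type is exported by the SDK):

typescript
import type { OAIChatMessage } from 'novelai-sdk-unofficial'; // assumed type export

const messages: OAIChatMessage[] = [
  { role: 'system', content: 'You are moderating a friendly debate.' },
  { role: 'user', content: 'Cats are the best pets.', name: 'alice' },
  { role: 'user', content: 'No, dogs are.', name: 'bob' },
];

const response = await client.openai.chatCompletion({
  messages,
  model: 'glm-4-6',
  maxTokens: 150,
});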

Unified Sampling Parameters

NovelAI supports additional unified sampling parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| unifiedLinear | number | Unified linear sampling parameter |
| unifiedQuadratic | number | Unified quadratic sampling parameter |
| unifiedCubic | number | Unified cubic sampling parameter |
| unifiedIncreaseLinearWithEntropy | number | Unified entropy-increase linear parameter |
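
Presumably these are passed alongside the standard completion parameters; a sketch with placeholder values (not tuned recommendations):

typescript
const response = await client.openai.completion({
  prompt: 'The old lighthouse keeper climbed the stairs',
  model: 'glm-4-6',
  maxTokens: 100,
  unifiedLinear: 0.5,                    // placeholder value
  unifiedQuadratic: 0.1,                 // placeholder value
  unifiedIncreaseLinearWithEntropy: 1.0, // placeholder value
});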

Comparison with Native API

| Feature | OpenAI Compatible API | Native API |
|---------|----------------------|------------|
| Interface Style | OpenAI standard format | NovelAI native format |
| Available Models | GLM series (glm-4-5, glm-4-6) | Erato/Kayra/Clio |
| Model Characteristics | General-purpose chat models | Story-writing specialized models |
| Chat Support | ✅ Built-in | ❌ Manual construction |
| Migration Cost | Low | High |
| Parameter Naming | camelCase | snake_case |
| Streaming Format | SSE (data: JSON) | Native stream |

When to Use OpenAI Compatible API

  • Migrating existing code from OpenAI
  • Integrating with OpenAI-compatible tools
  • Building general chat applications
  • Need standardized API format

When to Use Native API

  • Need NovelAI's proprietary story writing models (Erato/Kayra/Clio)
  • Need finer parameter control
  • Already have NovelAI native code
  • Need best story generation quality

Response Format

Completion Response

typescript
interface OAICompletionResponse {
  id: string;
  object: 'text_completion';
  created: number;
  model: string;
  choices: Array<{
    text: string;
    index: number;
    finish_reason: 'stop' | 'length' | null;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

Chat Completion Response

typescript
interface OAIChatCompletionResponse {
  id: string;
  object: 'chat.completion';
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: {
      role: 'assistant';
      content: string;
    };
    finish_reason: 'stop' | 'length' | null;
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}
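
The usage and finish_reason fields are useful for accounting and truncation checks:

typescript
const response = await client.openai.chatCompletion({
  messages: [{ role: 'user', content: 'Summarize the plot of Hamlet.' }],
  model: 'glm-4-6',
  maxTokens: 120,
});

const choice = response.choices[0];
if (choice.finish_reason === 'length') {
  console.warn('Output was cut off by maxTokens.');
}

const { prompt_tokens, completion_tokens, total_tokens } = response.usage;
console.log(`Tokens: ${total_tokens} total (${prompt_tokens} prompt + ${completion_tokens} completion)`);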

Error Handling

typescript
import { 
  NovelAI, 
  AuthenticationError, 
  InvalidRequestError,
  RateLimitError 
} from 'novelai-sdk-unofficial';

try {
  const response = await client.openai.chatCompletion({
    messages: [{ role: 'user', content: 'Hello' }],
    model: 'glm-4-6', // model is required
  });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Invalid API Key');
  } else if (error instanceof InvalidRequestError) {
    console.error('Invalid request parameters:', error.message);
  } else if (error instanceof RateLimitError) {
    console.error('Rate limit exceeded');
  } else {
    throw error; // unexpected error: rethrow
  }
}
