API Reference

Complete reference documentation for all BrainTwin.ai API endpoints.

Base URL

https://inference.braintwin.ai/v1

All API endpoints are relative to this base URL.

All requests require authentication using your API key in the Authorization: Bearer YOUR_API_KEY header.
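
As an illustration of the base URL and header described above, here is a minimal Python sketch. The helper names `auth_headers` and `endpoint_url` are hypothetical and not part of any BrainTwin.ai client library. Sending `Content-Type: application/json` follows the cURL examples on this page.

```python
BASE_URL = "https://inference.braintwin.ai/v1"


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers sent with every request (hypothetical helper)."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def endpoint_url(path: str) -> str:
    """Join an endpoint path, given relative to the base URL, onto BASE_URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"
```

For example, `endpoint_url("/chat/completions")` produces the same URL that the cURL examples on this page call.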

POST /v1/chat/completions

Create Chat Completion

Creates a model response for the given chat conversation. This is the main endpoint for generating AI responses.

Parameters

model (string, required)

ID of the model to use. Available models: gpt-3.5-turbo, gpt-4

messages (array, required)

A list of messages comprising the conversation so far.

temperature (number)

Sampling temperature between 0 and 2. Higher values make output more random. Default: 1

max_tokens (integer)

Maximum number of tokens to generate. Default: 16

stream (boolean)

Whether to stream back partial progress. Default: false

presence_penalty (number)

Penalizes new tokens based on whether they already appear in the text so far. Range: -2.0 to 2.0

frequency_penalty (number)

Penalizes new tokens based on how often they already appear in the text so far. Range: -2.0 to 2.0
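
The parameters above can be assembled and range-checked on the client before a request is sent. The following is a minimal sketch, not an official client. The function name `build_chat_payload` is hypothetical, and the check that `max_tokens` is at least 1 is an assumption this page does not state.

```python
def build_chat_payload(model, messages, *, temperature=None, max_tokens=None,
                       stream=None, presence_penalty=None, frequency_penalty=None):
    """Build a request body, validating the ranges documented above."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    payload = {"model": model, "messages": list(messages)}

    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        # Assumption: a non-positive token budget is never useful.
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        payload["max_tokens"] = max_tokens
    if stream is not None:
        payload["stream"] = bool(stream)
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if value is not None:
            if not -2.0 <= value <= 2.0:
                raise ValueError(f"{name} must be between -2.0 and 2.0")
            payload[name] = value
    return payload
```

Optional parameters that are left as `None` are omitted from the body, so the documented server-side defaults apply.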

Examples

cURL Example

curl https://inference.braintwin.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 150,
    "stream": false
  }'
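
For readers calling the API from Python, the cURL request above might translate roughly as follows, using only the standard library. The function names are hypothetical. The sketch assumes the response body is JSON and makes no assumption about its fields, since this page does not document the response format.

```python
import json
import urllib.request

API_URL = "https://inference.braintwin.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_chat_completion(api_key: str, payload: dict, timeout: float = 30.0):
    """Send the request and return the decoded body (assumed to be JSON)."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "temperature": 0.7,
        "max_tokens": 150,
        "stream": False,
    }
    print(create_chat_completion("YOUR_API_KEY", body))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so callers would normally wrap the call in a `try` block.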

Available Models

Choose the right model for your use case:

gpt-3.5-turbo

Most Popular

Fast and efficient model optimized for chat and general tasks. Best balance of speed and capability.

Context Window: 4,096 tokens
Training Data: Up to Sep 2021
Best For: Chat, Q&A, Simple tasks
Speed: Very Fast

gpt-4

Most Capable

More capable model with superior reasoning, analysis, and complex task handling.

Context Window: 8,192 tokens
Training Data: Up to Sep 2021
Best For: Complex reasoning, Code, Analysis
Speed: Moderate
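
One way to use the context window figures above is to pick the smallest model whose window fits a request. The sketch below assumes that the context window covers both the prompt tokens and the generated tokens, which this page does not state explicitly. The function name is hypothetical, and counting the prompt's tokens is left to the caller.

```python
# Context windows as listed above, in tokens.
CONTEXT_WINDOW = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
}


def smallest_model_that_fits(prompt_tokens: int, max_tokens: int):
    """Return the model with the smallest window that fits, or None.

    Assumption: prompt and completion tokens share one context window.
    """
    needed = prompt_tokens + max_tokens
    for model, window in sorted(CONTEXT_WINDOW.items(), key=lambda item: item[1]):
        if needed <= window:
            return model
    return None
```

A caller might fall back to shortening the conversation history when the function returns `None`.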

Streaming Responses

Enable streaming to receive partial responses as they're generated:

Streaming Example

curl https://inference.braintwin.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'

When streaming is enabled, the response is delivered as Server-Sent Events (SSE). Each event carries a data: field containing a partial chunk of the completion.
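
A streamed response could be consumed along the following lines. This is a sketch under stated assumptions: this page does not document the chunk format, so the JSON shape (`choices[0].delta.content`) and the `[DONE]` end-of-stream sentinel are assumptions borrowed from similar chat APIs, and the function names are hypothetical. For simplicity, each `data:` line is treated as one complete event.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, other fields
        payload = line[len("data:"):]
        # SSE strips a single leading space after the colon.
        if payload.startswith(" "):
            payload = payload[1:]
        yield payload


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate streamed text chunks (chunk shape is an assumption)."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

In practice `lines` would be the decoded lines of the HTTP response body, read as they arrive.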

Error Codes

The API uses standard HTTP status codes to indicate success or failure:

200 OK

Request successful

400 Bad Request

Invalid request parameters or malformed JSON

401 Unauthorized

Invalid or missing API key

429 Too Many Requests

Rate limit exceeded

500 Internal Server Error

Server error - please try again later
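
Client code might map these status codes to an exception that records whether a retry is worthwhile. In the sketch below, the names `ApiError` and `raise_for_status` are hypothetical. Treating 429 and 500 as retryable follows the descriptions above, while 400 and 401 indicate a problem with the request itself.

```python
STATUS_MEANINGS = {
    200: "Request successful",
    400: "Invalid request parameters or malformed JSON",
    401: "Invalid or missing API key",
    429: "Rate limit exceeded",
    500: "Server error",
}

# Retrying only helps when the failure is temporary.
RETRYABLE_STATUSES = {429, 500}


class ApiError(Exception):
    """Raised for any non-200 response (hypothetical helper class)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.retryable = status in RETRYABLE_STATUSES


def raise_for_status(status: int) -> None:
    """Return silently on 200, otherwise raise ApiError."""
    if status == 200:
        return
    raise ApiError(status, STATUS_MEANINGS.get(status, "Unexpected status code"))
```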

Rate Limiting

API requests are rate limited based on your subscription plan. Rate limit information is included in response headers:

X-RateLimit-Limit: Total requests allowed per time window
X-RateLimit-Remaining: Remaining requests in current window
X-RateLimit-Reset: Unix timestamp when the rate limit resets

If you exceed your rate limit, the API returns a 429 Too Many Requests error. Retry after the time given in X-RateLimit-Reset, or use exponential backoff between retries.
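
A retry loop built on these headers might look like the sketch below. It waits until `X-RateLimit-Reset` when that header gives a future timestamp and otherwise falls back to exponential backoff with random jitter. The function names, the `send` callable (assumed to return a `(status, headers, body)` tuple), and the default base delay, cap, and attempt count are all illustrative assumptions.

```python
import random
import time


def retry_delay(attempt, headers, now=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            wait = float(reset) - now
        except (TypeError, ValueError):
            wait = 0.0
        if wait > 0:
            return wait
    # Exponential backoff with full jitter: random value in [0, base * 2^attempt].
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(retry_delay(attempt, headers))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.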