AI Inference

Deploy OpenAI-compatible AI APIs backed by AWS Bedrock and Claude models. This is an early-stage tool with limited functionality.

The Inference component provides an OpenAI-compatible /v1/chat/completions API that works as a drop-in replacement for OpenAI, with significant cost savings.

Key Features

🎯 OpenAI Compatible: Drop-in replacement for the OpenAI API with the same endpoints

☁️ AWS Bedrock: Deploy to your own AWS account with Bedrock foundation models

🔒 Secure by Default: API key management and HTTPS endpoints configured for you

💰 Cost Efficient: 50-55% savings vs. OpenAI with your own AWS account

Quick Start

Deploy Your First AI API

BASH
# Initialize project with AWS
onglx-deploy init --host aws

# Add chat API component
onglx-deploy add inference --component api --type openai

# Deploy to AWS
onglx-deploy deploy

Testing Your Deployment

Test Your API

BASH
# Get endpoint and API key
onglx-deploy status

# Test the API
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello from OnglX!"}],
    "max_tokens": 100
  }'
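The same test from Python, using the requests library. The endpoint and key are the placeholders reported by onglx-deploy status, and the response is parsed with the standard OpenAI chat-completions shape:

PYTHON
import requests

# Placeholders: substitute the endpoint and key from `onglx-deploy status`
ENDPOINT = "https://your-endpoint"
API_KEY = "sk-onglx-your-api-key"

response = requests.post(
    f"{ENDPOINT}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-3.5-sonnet",
        "messages": [{"role": "user", "content": "Hello from OnglX!"}],
        "max_tokens": 100,
    },
)
response.raise_for_status()
# Standard OpenAI response shape: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])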

Prerequisites

  • AWS account with programmatic access
  • AWS credentials configured (aws configure)
  • OnglX Deploy CLI installed
  • AWS Bedrock model access enabled (see Troubleshooting below; a quick check follows this list)
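One quick way to verify the Bedrock prerequisite is to list the foundation models your credentials can see. This is plain boto3, independent of OnglX Deploy, and assumes the us-east-1 region:

PYTHON
import boto3

# Requires AWS credentials (`aws configure`) and the boto3 package.
# An AccessDeniedException here usually means Bedrock access
# has not been enabled for this account/region yet.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])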

Supported Models

AWS Bedrock Models (Available)

  • Claude 3.5 Sonnet - Primary model, most capable
  • Claude 3 Haiku - Fast and cost-effective option
  • Amazon Titan Text - AWS native model, usually pre-enabled

🚧 Note: GCP support and additional models coming soon.
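Because the API is OpenAI-compatible, switching between these models is a per-request choice via the model field. A minimal sketch: "claude-3.5-sonnet" is the identifier used elsewhere on this page, while the Haiku identifier below is an assumption, so confirm the exact names for your deployment:

PYTHON
import openai

client = openai.OpenAI(api_key="sk-onglx-your-key", base_url="https://your-endpoint")

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
    )
    return response.choices[0].message.content

# Most capable (identifier used in this page's examples)
print(ask("claude-3.5-sonnet", "Summarize AWS Bedrock in one sentence."))
# Fast and cost-effective ("claude-3-haiku" is an assumed identifier)
print(ask("claude-3-haiku", "Summarize AWS Bedrock in one sentence."))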

Component Types

OnglX Deploy supports two inference deployment types:

🔌 API Component

OpenAI-compatible REST API for programmatic access

  • Type: openai
  • Endpoint: /v1/chat/completions
  • Use Case: Applications & SDKs

🌐 Web UI Component

OpenWebUI interface for interactive chat sessions

  • Type: openwebui
  • Interface: Web Browser
  • Use Case: Human Interaction

SDK Integration

Python

PYTHON
import openai

# Works with your deployed AWS endpoint
client = openai.OpenAI(
    api_key="sk-onglx-your-key",
    base_url="https://your-endpoint"
)

# Use AWS Bedrock models
response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello world"}]
)
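If the deployment also passes through OpenAI's streaming protocol (an assumption; only non-streaming calls are shown on this page), the standard stream=True flag would work with the same client:

PYTHON
# Streaming sketch; assumes the endpoint supports OpenAI's
# server-sent-events streaming (not confirmed above).
stream = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about AWS."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)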

JavaScript

JAVASCRIPT
import OpenAI from 'openai';

// Basic AWS Bedrock usage
const openai = new OpenAI({
  apiKey: 'sk-onglx-your-key',
  baseURL: 'https://your-endpoint'
});

const completion = await openai.chat.completions.create({
  model: 'claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello world' }]
});

Migration from OpenAI

DIFF
import OpenAI from 'openai';

const openai = new OpenAI({
-  apiKey: process.env.OPENAI_API_KEY,
+  apiKey: process.env.ONGLX_API_KEY,
+  baseURL: process.env.ONGLX_API_BASE_URL,
});

const completion = await openai.chat.completions.create({
-  model: 'gpt-4',
+  model: 'claude-3.5-sonnet',  // AWS Bedrock
  messages: messages,
});
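The equivalent migration in Python mirrors the JavaScript diff above, using the same ONGLX_API_KEY and ONGLX_API_BASE_URL environment variables:

PYTHON
import os
import openai

# Point the existing OpenAI client at the OnglX deployment
client = openai.OpenAI(
    api_key=os.environ["ONGLX_API_KEY"],        # was OPENAI_API_KEY
    base_url=os.environ["ONGLX_API_BASE_URL"],  # new: your deployed endpoint
)

# Swap the model name; the rest of the call is unchanged
response = client.chat.completions.create(
    model="claude-3.5-sonnet",  # was "gpt-4"
    messages=[{"role": "user", "content": "Hello"}],
)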

Troubleshooting

Model Access Issues

  1. Go to AWS Console → Amazon Bedrock → Model access
  2. Click "Request model access"
  3. Enable desired models (approval usually instant)
  4. Start with models that have broader access (a quick invocation check follows this list):
    • amazon.titan-text-express-v1 - Usually pre-enabled
    • amazon.titan-text-lite-v1 - Cost-effective option
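To confirm a model is actually invokable once access is granted, you can call Bedrock directly, outside OnglX Deploy. The request body below follows AWS's documented Titan Text format, but treat it as a sketch and verify against current AWS docs:

PYTHON
import json
import boto3

# Direct Bedrock invocation to verify model access (not an OnglX command)
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": "Hello from a model-access check.",
        "textGenerationConfig": {"maxTokenCount": 50},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])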

Authentication Issues

  • Use the same API key for both required headers:
BASH
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'X-API-Key: sk-onglx-your-api-key' \
  -H 'Content-Type: application/json'
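With the official OpenAI Python SDK, the extra X-API-Key header can be attached once at client construction through its default_headers option, so every request carries both headers:

PYTHON
import openai

API_KEY = "sk-onglx-your-api-key"

# Send the same key in both headers on every request
client = openai.OpenAI(
    api_key=API_KEY,  # sent as the Authorization: Bearer header
    base_url="https://your-endpoint",
    default_headers={"X-API-Key": API_KEY},
)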

OpenWebUI Configuration

Instance Sizes

The Web UI component supports different instance sizes for resource allocation:

  • Small: 512 CPU units (0.5 vCPU), 1024 MB memory. Best for personal use.
  • Medium: 1024 CPU units (1 vCPU), 2048 MB memory. Best for team use.
  • Large: 2048 CPU units (2 vCPU), 4096 MB memory. Best for heavy workloads.

Deployment Details

When you deploy the OpenWebUI component, OnglX Deploy creates:

  • AWS ECS Fargate service running the Open WebUI container
  • Application Load Balancer for HTTP/HTTPS access
  • AWS EFS filesystem for persistent conversation storage
  • VPC with security groups for network isolation
BASH
# Deploy with specific size
onglx-deploy add inference --component ui --type openwebui --size medium

# Check deployment status and get endpoint
onglx-deploy status

Next Steps