AI Inference
Deploy OpenAI-compatible AI APIs to AWS Bedrock with Claude models. Note: this is an early-stage tool with limited functionality.
The Inference component provides an OpenAI-compatible `/v1/chat/completions` API that works as a drop-in replacement for the OpenAI API, with significant cost savings.
Key Features
OpenAI Compatible
Drop-in replacement for OpenAI API with the same endpoints
AWS Bedrock
Deploy to your AWS account with Bedrock foundation models
Secure by Default
API key management and HTTPS endpoints configured
Cost Efficient
50-55% savings vs OpenAI with your own AWS account
Quick Start
Deploy Your First AI API
```shell
# Initialize project with AWS
onglx-deploy init --host aws

# Add chat API component
onglx-deploy add inference --component api --type openai

# Deploy to AWS
onglx-deploy deploy
```
Testing Your Deployment
Test Your API
```shell
# Get endpoint and API key
onglx-deploy status

# Test the API
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello from OnglX!"}],
    "max_tokens": 100
  }'
```
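The same request can be made from Python's standard library; the payload below mirrors the curl call above. The endpoint and API key are placeholders to be replaced with the values reported by `onglx-deploy status`:

```python
import json
import urllib.request


def build_chat_payload(model, messages, max_tokens=100):
    """Build the JSON body for a /v1/chat/completions request."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


payload = build_chat_payload(
    "claude-3.5-sonnet",
    [{"role": "user", "content": "Hello from OnglX!"}],
)

# Placeholder endpoint and key -- substitute your own deployment's values.
req = urllib.request.Request(
    "https://your-endpoint/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-onglx-your-api-key",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; it is left out here
# because the placeholder endpoint is not reachable.
```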
Prerequisites
- AWS account with programmatic access
- AWS credentials configured (`aws configure`)
- OnglX Deploy CLI tool installed
- AWS Bedrock model access (see troubleshooting below)
Supported Models
AWS Bedrock Models (Available)
- Claude 3.5 Sonnet - Primary model, most capable
- Claude 3 Haiku - Fast and cost-effective option
- Amazon Titan Text - AWS native model, usually pre-enabled
🚧 Note: GCP support and additional models coming soon.
Component Types
OnglX Deploy supports two inference deployment types:
API Component
OpenAI-compatible REST API for programmatic access
Web UI Component
OpenWebUI interface for interactive chat sessions
SDK Integration
Python
```python
import openai

# Works with your deployed AWS endpoint
client = openai.OpenAI(
    api_key="sk-onglx-your-key",
    base_url="https://your-endpoint"
)

# Use AWS Bedrock models
response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello world"}]
)
```
JavaScript
```javascript
import OpenAI from 'openai';

// Basic AWS Bedrock usage
const openai = new OpenAI({
  apiKey: 'sk-onglx-your-key',
  baseURL: 'https://your-endpoint'
});

const completion = await openai.chat.completions.create({
  model: 'claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello world' }]
});
```
Migration from OpenAI
```diff
import OpenAI from 'openai';

const openai = new OpenAI({
-  apiKey: process.env.OPENAI_API_KEY,
+  apiKey: process.env.ONGLX_API_KEY,
+  baseURL: process.env.ONGLX_API_BASE_URL,
});

const completion = await openai.chat.completions.create({
-  model: 'gpt-4',
+  model: 'claude-3.5-sonnet', // AWS Bedrock
   messages: messages,
});
```
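If you migrate gradually, a small model-name shim lets existing code keep passing OpenAI model names. Only the `gpt-4` → `claude-3.5-sonnet` pairing comes from the diff above; the other entry is an illustrative guess and should be adjusted to the models enabled in your account:

```python
# Hypothetical model-name mapping for gradual migration. Only the gpt-4 ->
# claude-3.5-sonnet pairing is documented; the rest are assumptions.
OPENAI_TO_BEDROCK = {
    "gpt-4": "claude-3.5-sonnet",
    "gpt-3.5-turbo": "claude-3-haiku",  # assumption: fast/cheap tier
}


def to_bedrock_model(name: str) -> str:
    """Translate an OpenAI model name, passing unknown names through."""
    return OPENAI_TO_BEDROCK.get(name, name)
```

Unknown names pass through unchanged, so code that already uses Bedrock model names is unaffected.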
Troubleshooting
Model Access Issues
- Go to AWS Console → Amazon Bedrock → Model access
- Click "Request model access"
- Enable desired models (approval is usually instant)
- Start with models that have broader access:
  - `amazon.titan-text-express-v1` - usually pre-enabled
  - `amazon.titan-text-lite-v1` - cost-effective option
Authentication Issues
- Use the same API key for both required headers:
```shell
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'X-API-Key: sk-onglx-your-api-key' \
  -H 'Content-Type: application/json'
```
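In client code, a small helper keeps the two headers in sync; it reproduces the header set from the curl example above (the key shown is a placeholder):

```python
def auth_headers(api_key: str) -> dict:
    """Both auth headers carry the same OnglX API key, per the curl example."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }


headers = auth_headers("sk-onglx-your-api-key")
```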
OpenWebUI Configuration
Instance Sizes
The Web UI component supports three instance sizes for resource allocation:
- Small
- Medium
- Large
Deployment Details
When you deploy the OpenWebUI component, OnglX Deploy creates:
- AWS ECS Fargate service running Open WebUI container
- Application Load Balancer for HTTP/HTTPS access
- AWS EFS filesystem for persistent conversation storage
- VPC with security groups for network isolation
```shell
# Deploy with specific size
onglx-deploy add inference --component ui --type openwebui --size medium

# Check deployment status and get endpoint
onglx-deploy status
```
Next Steps
- Chat Interface Guide - Detailed setup for both API and Web UI
- AWS Setup Guide - Complete AWS setup walkthrough
- CLI Reference - Complete command reference
- Configuration - Configure your deployment settings
- Security - Best practices for production