AI Inference

Deploy OpenAI-compatible AI APIs backed by AWS Bedrock and Claude models. This is an early-stage tool with limited functionality.

The Inference component provides an OpenAI-compatible /v1/chat/completions API that works as a drop-in replacement for OpenAI, with significant cost savings.

Key Features

🎯 OpenAI Compatible: Drop-in replacement for the OpenAI API with the same endpoints

☁️ AWS Bedrock: Deploy to your own AWS account with Bedrock foundation models

🔒 Secure by Default: API key management and HTTPS endpoints configured for you

💰 Cost Efficient: 50-55% savings vs. OpenAI with your own AWS account

Quick Start

Deploy Your First AI API

BASH
# Initialize project with AWS
onglx-deploy init --host aws

# Add chat API component
onglx-deploy add inference --component api --type openai

# Deploy to AWS
onglx-deploy deploy

Testing Your Deployment

Test Your API

BASH
# Get endpoint and API key
onglx-deploy status

# Test the API
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello from OnglX!"}],
    "max_tokens": 100
  }'
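The same test from Python, using the requests library. The endpoint and key are the placeholders reported by onglx-deploy status, and the response is parsed with the standard OpenAI chat-completions shape:

PYTHON
import requests

# Placeholders: substitute the endpoint and key from `onglx-deploy status`
ENDPOINT = "https://your-endpoint"
API_KEY = "sk-onglx-your-api-key"

response = requests.post(
    f"{ENDPOINT}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "claude-3.5-sonnet",
        "messages": [{"role": "user", "content": "Hello from OnglX!"}],
        "max_tokens": 100,
    },
)
response.raise_for_status()
# Standard OpenAI response shape: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])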

Prerequisites

  • AWS account with programmatic access
  • AWS credentials configured (aws configure)
  • OnglX Deploy CLI installed
  • AWS Bedrock model access enabled (see Troubleshooting below; a quick check follows this list)
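One quick way to verify the Bedrock prerequisite is to list the foundation models your credentials can see. This is plain boto3, independent of OnglX Deploy, and assumes the us-east-1 region:

PYTHON
import boto3

# Requires AWS credentials (`aws configure`) and the boto3 package.
# An AccessDeniedException here usually means Bedrock access
# has not been enabled for this account/region yet.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])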

Supported Models

AWS Bedrock Models (Available)

  • Claude 3.5 Sonnet - Primary model, most capable
  • Claude 3 Haiku - Fast and cost-effective option
  • Amazon Titan Text - AWS native model, usually pre-enabled

🚧 Note: GCP support and additional models coming soon.
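Because the API is OpenAI-compatible, switching between these models is a per-request choice via the model field. A minimal sketch: "claude-3.5-sonnet" is the identifier used elsewhere on this page, while the Haiku identifier below is an assumption, so confirm the exact names for your deployment:

PYTHON
import openai

client = openai.OpenAI(api_key="sk-onglx-your-key", base_url="https://your-endpoint")

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
    )
    return response.choices[0].message.content

# Most capable (identifier used in this page's examples)
print(ask("claude-3.5-sonnet", "Summarize AWS Bedrock in one sentence."))
# Fast and cost-effective ("claude-3-haiku" is an assumed identifier)
print(ask("claude-3-haiku", "Summarize AWS Bedrock in one sentence."))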

Component Types

OnglX Deploy supports two inference deployment types:

🔌 API Component

OpenAI-compatible REST API for programmatic access

  • Type: openai
  • Endpoint: /v1/chat/completions
  • Use Case: Applications & SDKs

🌐 Web UI Component

OpenWebUI interface for interactive chat sessions

  • Type: openwebui
  • Interface: Web Browser
  • Use Case: Human Interaction

SDK Integration

Python

PYTHON
import openai

# Works with your deployed AWS endpoint
client = openai.OpenAI(
    api_key="sk-onglx-your-key",
    base_url="https://your-endpoint"
)

# Use AWS Bedrock models
response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello world"}]
)
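If the deployment also passes through OpenAI's streaming protocol (an assumption; only non-streaming calls are shown on this page), the standard stream=True flag would work with the same client:

PYTHON
# Streaming sketch; assumes the endpoint supports OpenAI's
# server-sent-events streaming (not confirmed above).
stream = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Write a haiku about AWS."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)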

JavaScript

JAVASCRIPT
import OpenAI from 'openai';

// Basic AWS Bedrock usage
const openai = new OpenAI({
  apiKey: 'sk-onglx-your-key',
  baseURL: 'https://your-endpoint'
});

const completion = await openai.chat.completions.create({
  model: 'claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello world' }]
});

Migration from OpenAI

DIFF
import OpenAI from 'openai';

const openai = new OpenAI({
-  apiKey: process.env.OPENAI_API_KEY,
+  apiKey: process.env.ONGLX_API_KEY,
+  baseURL: process.env.ONGLX_API_BASE_URL,
});

const completion = await openai.chat.completions.create({
-  model: 'gpt-4',
+  model: 'claude-3.5-sonnet',  // AWS Bedrock
  messages: messages,
});
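The equivalent migration in Python mirrors the JavaScript diff above, using the same ONGLX_API_KEY and ONGLX_API_BASE_URL environment variables:

PYTHON
import os
import openai

# Point the existing OpenAI client at the OnglX deployment
client = openai.OpenAI(
    api_key=os.environ["ONGLX_API_KEY"],        # was OPENAI_API_KEY
    base_url=os.environ["ONGLX_API_BASE_URL"],  # new: your deployed endpoint
)

# Swap the model name; the rest of the call is unchanged
response = client.chat.completions.create(
    model="claude-3.5-sonnet",  # was "gpt-4"
    messages=[{"role": "user", "content": "Hello"}],
)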

Troubleshooting

Model Access Issues

  1. Go to AWS Console → Amazon Bedrock → Model access
  2. Click "Request model access"
  3. Enable desired models (approval usually instant)
  4. Start with models that have broader access (a quick invocation check follows this list):
    • amazon.titan-text-express-v1 - Usually pre-enabled
    • amazon.titan-text-lite-v1 - Cost-effective option
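To confirm a model is actually invokable once access is granted, you can call Bedrock directly, outside OnglX Deploy. The request body below follows AWS's documented Titan Text format, but treat it as a sketch and verify against current AWS docs:

PYTHON
import json
import boto3

# Direct Bedrock invocation to verify model access (not an OnglX command)
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": "Hello from a model-access check.",
        "textGenerationConfig": {"maxTokenCount": 50},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])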

Authentication Issues

  • Use the same API key for both required headers:
BASH
curl -X POST https://your-endpoint/v1/chat/completions \
  -H 'Authorization: Bearer sk-onglx-your-api-key' \
  -H 'X-API-Key: sk-onglx-your-api-key' \
  -H 'Content-Type: application/json'
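With the official OpenAI Python SDK, the extra X-API-Key header can be attached once at client construction through its default_headers option, so every request carries both headers:

PYTHON
import openai

API_KEY = "sk-onglx-your-api-key"

# Send the same key in both headers on every request
client = openai.OpenAI(
    api_key=API_KEY,  # sent as the Authorization: Bearer header
    base_url="https://your-endpoint",
    default_headers={"X-API-Key": API_KEY},
)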

OpenWebUI Configuration

Instance Sizes

The Web UI component supports different instance sizes for resource allocation:

  • Small: 512 CPU units (0.5 vCPU), 1024 MB memory. Best for personal use.
  • Medium: 1024 CPU units (1 vCPU), 2048 MB memory. Best for team use.
  • Large: 2048 CPU units (2 vCPU), 4096 MB memory. Best for heavy workloads.

Deployment Details

When you deploy the OpenWebUI component, OnglX Deploy creates:

  • AWS ECS Fargate service running the Open WebUI container
  • Application Load Balancer for HTTP/HTTPS access
  • AWS EFS filesystem for persistent conversation storage
  • VPC with security groups for network isolation
BASH
# Deploy with specific size
onglx-deploy add inference --component ui --type openwebui --size medium

# Check deployment status and get endpoint
onglx-deploy status

Next Steps