> [!IMPORTANT]
> Vibe code experiment.
A lightweight proxy for LLM interactions with basic guardrails, logging, and metrics.
- Single LLM provider support (OpenAI)
- Basic guardrail for banned words
- Logging of requests and responses
- Prometheus metrics
- Config-driven setup
- Docker deployment
- Monitoring with Prometheus and Grafana
The proxy can be configured using a YAML file or environment variables:
```yaml
server:
  port: 8080

llm:
  url: "https://api.openai.com/v1/chat/completions"
  api_key: "YOUR_OPENAI_API_KEY"

guardrails:
  banned_words:
    - "bomb"
    - "attack"
```
Alternatively, the same settings can be supplied via environment variables:

- `SERVER_PORT`: Server port (default: 8080)
- `LLM_URL`: LLM API endpoint URL
- `LLM_API_KEY`: LLM API key
- `BANNED_WORDS`: Comma-separated list of banned words
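As a rough sketch, the proxy can then be started without a YAML file by exporting these variables first. This assumes the binary falls back to environment variables when no `--config` flag is given:

```bash
# Configure entirely via the environment (assumes env vars are read when --config is omitted)
export SERVER_PORT=8080
export LLM_URL="https://api.openai.com/v1/chat/completions"
export LLM_API_KEY="YOUR_OPENAI_API_KEY"
export BANNED_WORDS="bomb,attack"

./ai-proxy
```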
POST /v1/query

Request:

```json
{
  "prompt": "Your prompt to the LLM",
  "model_params": {
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "max_tokens": 256
  }
}
```

Response:

```json
{
  "completion": "LLM response text..."
}
```
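For programmatic use, here is a minimal Go client sketch. The endpoint and JSON shapes come from this README; the host/port and the bare-bones error handling are illustrative only:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Shapes mirror the request/response JSON documented above.
type queryRequest struct {
	Prompt      string                 `json:"prompt"`
	ModelParams map[string]interface{} `json:"model_params,omitempty"`
}

type queryResponse struct {
	Completion string `json:"completion"`
}

func main() {
	body, err := json.Marshal(queryRequest{
		Prompt: "Tell me a joke",
		ModelParams: map[string]interface{}{
			"model":       "gpt-3.5-turbo",
			"temperature": 0.7,
		},
	})
	if err != nil {
		panic(err)
	}

	// Assumes the proxy is reachable on localhost:8080 (the default port).
	resp, err := http.Post("http://localhost:8080/v1/query", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Completion)
}
```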
GET /metrics
Returns Prometheus-formatted metrics including:
- `llm_requests_total`: Total number of LLM requests processed
- `llm_errors_total`: Total number of errors from LLM calls
- `llm_tokens_total`: Total number of tokens used in LLM calls
- `guardrail_blocks_total`: Total number of requests blocked by guardrails
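If you run your own Prometheus instead of the bundled stack below, a scrape config along these lines should work. The job name and target are illustrative; adjust the target to wherever the proxy is reachable:

```yaml
scrape_configs:
  - job_name: "ai-proxy"             # illustrative job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]  # adjust to the proxy's address
```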
GET /health

Health check endpoint.
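A quick liveness check, assuming the default port; the exact response body depends on the implementation:

```bash
curl http://localhost:8080/health
```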
```bash
# Build and run
go build -o ai-proxy ./cmd/server
./ai-proxy --config config/config.yaml
```
```bash
# Build Docker image
docker build -t ai-proxy:0.1 .

# Run with configuration in environment variables
docker run -p 8080:8080 \
  -e LLM_API_KEY=your_openai_api_key \
  ai-proxy:0.1

# Or mount a custom config file
docker run -p 8080:8080 \
  -v $(pwd)/config/config.yaml:/app/config/config.yaml \
  ai-proxy:0.1
```
The project includes a complete monitoring stack with Prometheus and Grafana.
```bash
# Start the entire stack (AI Proxy, Prometheus, and Grafana)
docker-compose up -d

# Access the services:
# - AI Proxy: http://localhost:8080
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3000 (login with admin/admin)

# Stop the services
docker-compose down
```
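Once metrics are flowing, the counters listed above can be charted in Grafana or queried directly in Prometheus. A couple of illustrative PromQL queries (the `sum()` wrappers keep the ratio valid regardless of whatever labels the counters carry):

```promql
# Request rate over the last 5 minutes
rate(llm_requests_total[5m])

# Share of requests blocked by guardrails
sum(rate(guardrail_blocks_total[5m])) / sum(rate(llm_requests_total[5m]))
```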
```bash
# Send a query
curl -X POST http://localhost:8080/v1/query \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me a joke", "model_params": {"temperature": 0.7}}'

# Check metrics
curl http://localhost:8080/metrics
```
This MVP focuses on core functionality. Possible future enhancements include:
- **Enhanced Guardrails**
  - More sophisticated content filtering options
  - Support for custom filtering rules
- **Additional LLM Support**
  - Integration with other LLM providers
- **Simple Authentication**
  - Basic rate limiting
- **Performance Improvements**
  - Optional caching for common queries
  - Optimizations for high-traffic scenarios