LLM Providers¶
SPADE_LLM supports multiple LLM providers through a unified interface, enabling seamless switching between different AI services.
Provider Architecture¶
graph TD
A[LLMProvider Interface] --> B[OpenAI Provider]
A --> C[Ollama Provider]
A --> D[LM Studio Provider]
A --> E[vLLM Provider]
B --> F[GPT-4o]
B --> G[GPT-4o-mini]
B --> H[GPT-3.5-turbo]
C --> I[Llama 3.1:8b]
C --> J[Mistral:7b]
C --> K[CodeLlama:7b]
D --> L[Local Models]
E --> M[High-Performance Inference]
Supported Providers¶
The unified LLMProvider interface supports:
- OpenAI - GPT models via API for production-ready solutions
- Ollama - Local open-source models for privacy-focused deployments
- LM Studio - Local models with GUI for easy experimentation
- vLLM - High-performance inference server for scalable applications
OpenAI Provider¶
Cloud-based LLM service with state-of-the-art models:
from spade_llm.providers import LLMProvider

provider = LLMProvider.create_openai(
    api_key="your-api-key",
    model="gpt-4o-mini",
    temperature=0.7
)
Popular models: gpt-4o, gpt-4o-mini, gpt-3.5-turbo
Key advantages: Excellent tool calling, consistent performance, extensive model options.
Ollama Provider¶
Local deployment for privacy and control:
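A minimal sketch using the same factory pattern; create_ollama appears later on this page, while the base_url keyword and default port shown here are assumptions taken from the OLLAMA_BASE_URL value in the configuration section below:

from spade_llm.providers import LLMProvider

# Assumes a local Ollama server exposing its OpenAI-compatible endpoint
# on the default port (the base_url keyword is an assumption; see
# OLLAMA_BASE_URL in the configuration section).
provider = LLMProvider.create_ollama(
    model="llama3.1:8b",
    base_url="http://localhost:11434/v1"
)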
Popular models: llama3.1:8b, mistral:7b, codellama:7b
Tool support: Available with llama3.1:8b, llama3.1:70b, mistral:7b
Key advantages: Complete privacy, no internet required, cost-effective for high usage.
LM Studio Provider¶
Local models with GUI for easy management:
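A minimal sketch, assuming a create_lm_studio factory analogous to create_openai and create_ollama; the factory name and base_url keyword are assumptions (check the API reference), and the URL matches the LM_STUDIO_BASE_URL default from the configuration section:

from spade_llm.providers import LLMProvider

# Hypothetical factory name, assumed to mirror the other providers;
# the model string must match the name shown in the LM Studio UI.
provider = LLMProvider.create_lm_studio(
    model="your-loaded-model",
    base_url="http://localhost:1234/v1"
)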
The model name should match exactly what's displayed in the LM Studio interface.
Key advantages: User-friendly interface, easy model switching, good for experimentation.
vLLM Provider¶
High-performance inference for production deployments:
provider = LLMProvider.create_vllm(
    model="meta-llama/Llama-2-7b-chat-hf",
    base_url="http://localhost:8000/v1"
)
Start the vLLM server before creating the provider:
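One common way to launch vLLM's OpenAI-compatible server on the port used above; exact flags may vary between vLLM versions (newer releases also provide a vllm serve command):

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000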
Key advantages: Optimized performance, batching support, scalable architecture.
Configuration Options¶
Environment Variables¶
Centralized configuration using environment variables:
# .env file
OPENAI_API_KEY=your-key
OLLAMA_BASE_URL=http://localhost:11434/v1
LM_STUDIO_BASE_URL=http://localhost:1234/v1
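As a sketch, these values can be loaded at startup with python-dotenv (an extra dependency, not part of SPADE_LLM) and read through os.getenv:

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from spade_llm.providers import LLMProvider

load_dotenv()  # copies the .env entries into the process environment

provider = LLMProvider.create_openai(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini"
)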
Dynamic Provider Selection¶
Runtime provider switching based on configuration:
import os

from spade_llm.providers import LLMProvider

def create_provider():
    provider_type = os.getenv('LLM_PROVIDER', 'openai')

    if provider_type == 'openai':
        return LLMProvider.create_openai(
            api_key=os.getenv('OPENAI_API_KEY'),
            model=os.getenv('OPENAI_MODEL', 'gpt-4o-mini')
        )
    elif provider_type == 'ollama':
        return LLMProvider.create_ollama(
            model=os.getenv('OLLAMA_MODEL', 'llama3.1:8b')
        )
    raise ValueError(f"Unknown LLM_PROVIDER: {provider_type}")
This approach enables easy deployment across different environments without code changes.
Error Handling¶
Robust error handling for production reliability:
import logging

logger = logging.getLogger(__name__)

try:
    response = await provider.get_llm_response(context)
except Exception as e:
    logger.error(f"Provider error: {e}")
    # Handle fallback or retry logic here
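For transient failures, a simple retry loop with exponential backoff is often enough. This is an illustrative sketch around the get_llm_response call above, not a built-in SPADE_LLM feature:

import asyncio
import logging

logger = logging.getLogger(__name__)

async def get_response_with_retry(provider, context, retries=3):
    # Retry transient provider errors with exponential backoff (1s, 2s, 4s).
    for attempt in range(retries):
        try:
            return await provider.get_llm_response(context)
        except Exception as e:
            logger.warning(f"Attempt {attempt + 1} failed: {e}")
            if attempt == retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)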
Provider Fallback System¶
Automatic failover for high availability:
providers = [
    LLMProvider.create_openai(api_key="key"),
    LLMProvider.create_ollama(model="llama3.1:8b")
]

async def get_response_with_fallback(context):
    for provider in providers:
        try:
            return await provider.get_llm_response(context)
        except Exception:
            continue
    raise Exception("All providers failed")
This pattern ensures service continuity even when individual providers experience issues.
Provider Selection Guide¶
Cloud vs Local¶
Choose OpenAI when:
- Need best-in-class performance
- Want consistent reliability
- Have internet connectivity
- Budget allows for API costs

Choose Local Providers when:
- Privacy is paramount
- Want complete control over infrastructure
- Have computational resources
- Need to minimize ongoing costs
Performance Considerations¶
- OpenAI: Fastest response times, excellent reasoning capabilities
- Ollama: Good performance with smaller models, privacy benefits
- LM Studio: Easy setup, good for development and testing
- vLLM: Optimized inference, best for high-throughput applications
Tool Calling Support¶
- Full tool support: OpenAI (all models)
- Limited tool support: Ollama (specific models only)
- Experimental: LM Studio and vLLM (model dependent)
Best Practices¶
- Test multiple providers during development to find the best fit
- Implement fallback systems for critical applications
- Use environment variables for easy configuration management
- Monitor provider performance and costs in production
- Choose models based on your specific use case requirements
Next Steps¶
- Tools System - Add tool capabilities to your providers
- Architecture - Understanding the provider layer
- Routing - Route responses based on provider capabilities