LLM Providers

SPADE_LLM supports multiple LLM providers through a unified interface, enabling seamless switching between different AI services.

Provider Architecture

graph TD
    A[LLMProvider Interface] --> B[OpenAI Provider]
    A --> C[Ollama Provider]
    A --> D[LM Studio Provider]
    A --> E[vLLM Provider]

    B --> F[GPT-4o]
    B --> G[GPT-4o-mini]
    B --> H[GPT-3.5-turbo]

    C --> I[Llama 3.1:8b]
    C --> J[Mistral:7b]
    C --> K[CodeLlama:7b]

    D --> L[Local Models]
    E --> M[High-Performance Inference]

Supported Providers

The unified LLMProvider interface supports:

  • OpenAI - GPT models via API for production-ready solutions
  • Ollama - Local open-source models for privacy-focused deployments
  • LM Studio - Local models with GUI for easy experimentation
  • vLLM - High-performance inference server for scalable applications
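
Because every provider is created through the same LLMProvider factory and exposes the same calls, switching backends does not change the surrounding agent code. A minimal sketch (the arguments are placeholders; get_llm_response is the call used in the error-handling examples later on this page):

from spade_llm.providers import LLMProvider

# Both objects expose the same interface, so the calling code is identical
# regardless of which backend is configured.
cloud_provider = LLMProvider.create_openai(api_key="your-api-key", model="gpt-4o-mini")
local_provider = LLMProvider.create_ollama(model="llama3.1:8b")

async def ask(provider, context):
    # Works unchanged with either provider object.
    return await provider.get_llm_response(context)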

OpenAI Provider

Cloud-based LLM service with state-of-the-art models:

from spade_llm.providers import LLMProvider

provider = LLMProvider.create_openai(
    api_key="your-api-key",
    model="gpt-4o-mini",
    temperature=0.7
)

Popular models: gpt-4o, gpt-4o-mini, gpt-3.5-turbo

Key advantages: Excellent tool calling, consistent performance, extensive model options.

Ollama Provider

Local deployment for privacy and control:

provider = LLMProvider.create_ollama(
    model="llama3.1:8b",
    base_url="http://localhost:11434/v1"
)

Popular models: llama3.1:8b, mistral:7b, codellama:7b

Tool support: Available with llama3.1:8b, llama3.1:70b, mistral:7b

Key advantages: Complete privacy, no internet required, cost-effective for high usage.

LM Studio Provider

Local models with GUI for easy management:

provider = LLMProvider.create_lm_studio(
    model="local-model",
    base_url="http://localhost:1234/v1"
)

The model name should match exactly what's displayed in the LM Studio interface.

Key advantages: User-friendly interface, easy model switching, good for experimentation.

vLLM Provider

High-performance inference for production deployments:

provider = LLMProvider.create_vllm(
    model="meta-llama/Llama-2-7b-chat-hf",
    base_url="http://localhost:8000/v1"
)

Start vLLM server:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000

Key advantages: Optimized performance, batching support, scalable architecture.

Configuration Options

Environment Variables

Centralized configuration using environment variables:

# .env file
OPENAI_API_KEY=your-key
OLLAMA_BASE_URL=http://localhost:11434/v1
LM_STUDIO_BASE_URL=http://localhost:1234/v1
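
A short sketch of wiring these variables into a provider at start-up; it assumes the python-dotenv package for loading the .env file, which is not required by SPADE_LLM itself:

import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed
from spade_llm.providers import LLMProvider

load_dotenv()  # reads the .env file into the process environment

provider = LLMProvider.create_openai(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
)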

Dynamic Provider Selection

Runtime provider switching based on configuration:

import os

def create_provider():
    provider_type = os.getenv('LLM_PROVIDER', 'openai')

    if provider_type == 'openai':
        return LLMProvider.create_openai(
            api_key=os.getenv('OPENAI_API_KEY'),
            model=os.getenv('OPENAI_MODEL', 'gpt-4o-mini')
        )
    elif provider_type == 'ollama':
        return LLMProvider.create_ollama(
            model=os.getenv('OLLAMA_MODEL', 'llama3.1:8b')
        )
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider_type}")

This approach enables easy deployment across different environments without code changes.
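
For instance, the same codebase can be pointed at a local Ollama deployment just by setting environment variables before start-up (variable names as in the function above):

import os

os.environ["LLM_PROVIDER"] = "ollama"   # normally exported in the shell or deployment config
os.environ["OLLAMA_MODEL"] = "mistral:7b"

provider = create_provider()            # returns an Ollama-backed provider, no code changes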

Error Handling

Robust error handling for production reliability:

import logging

logger = logging.getLogger(__name__)

try:
    response = await provider.get_llm_response(context)
except Exception as e:
    logger.error(f"Provider error: {e}")
    # Handle fallback or retry logic
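
One common way to fill in the retry branch is exponential backoff. The helper below is an illustrative sketch, not part of SPADE_LLM; the attempt count and delays are arbitrary:

import asyncio
import logging

logger = logging.getLogger(__name__)

async def get_response_with_retry(provider, context, attempts=3):
    # Retry transient provider failures with exponential backoff (1s, 2s, 4s, ...).
    for attempt in range(attempts):
        try:
            return await provider.get_llm_response(context)
        except Exception as e:
            logger.warning(f"Attempt {attempt + 1} failed: {e}")
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)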

Provider Fallback System

Automatic failover for high availability:

providers = [
    LLMProvider.create_openai(api_key="key"),
    LLMProvider.create_ollama(model="llama3.1:8b")
]

async def get_response_with_fallback(context):
    for provider in providers:
        try:
            return await provider.get_llm_response(context)
        except Exception as e:
            logger.warning(f"Provider failed, trying next: {e}")
    raise RuntimeError("All providers failed")

This pattern ensures service continuity even when individual providers experience issues.

Provider Selection Guide

Cloud vs Local

Choose OpenAI when:

  • You need best-in-class performance
  • You want consistent reliability
  • You have internet connectivity
  • Your budget allows for API costs

Choose Local Providers when:

  • Privacy is paramount
  • You want complete control over infrastructure
  • You have computational resources
  • You need to minimize ongoing costs

Performance Considerations

  • OpenAI: Fastest response times, excellent reasoning capabilities
  • Ollama: Good performance with smaller models, privacy benefits
  • LM Studio: Easy setup, good for development and testing
  • vLLM: Optimized inference, best for high-throughput applications

Tool Calling Support

  • Full tool support: OpenAI (all models)
  • Limited tool support: Ollama (specific models only)
  • Experimental: LM Studio and vLLM (model dependent)
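
If an agent relies on tool calling, the choice of provider or model can be gated on this support. The helper below is only a sketch that encodes the support levels listed above; the set of tool-capable Ollama models is the one stated on this page and may grow over time:

import os

from spade_llm.providers import LLMProvider

# Ollama models listed above as having tool support.
OLLAMA_TOOL_MODELS = {"llama3.1:8b", "llama3.1:70b", "mistral:7b"}

def create_tool_capable_provider(prefer_local=False, local_model="llama3.1:8b"):
    # Use a local model only if it is known to support tool calling;
    # otherwise fall back to OpenAI, where tool calling is fully supported.
    if prefer_local and local_model in OLLAMA_TOOL_MODELS:
        return LLMProvider.create_ollama(model=local_model)
    return LLMProvider.create_openai(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini",
    )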

Best Practices

  • Test multiple providers during development to find the best fit
  • Implement fallback systems for critical applications
  • Use environment variables for easy configuration management
  • Monitor provider performance and costs in production (see the latency-logging sketch after this list)
  • Choose models based on your specific use case requirements
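
As a starting point for the monitoring practice above, per-call latency can be logged with a thin wrapper. This is an illustrative sketch rather than a SPADE_LLM feature, and it can feed whichever metrics system you already use:

import logging
import time

logger = logging.getLogger(__name__)

async def timed_llm_response(provider, context):
    # Measure and log how long each provider call takes.
    start = time.perf_counter()
    try:
        return await provider.get_llm_response(context)
    finally:
        elapsed = time.perf_counter() - start
        logger.info(f"LLM call completed in {elapsed:.2f}s")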

Next Steps

  • Tools System - Add tool capabilities to your providers
  • Architecture - Understanding the provider layer
  • Routing - Route responses based on provider capabilities