LLM Providers

SPADE_LLM supports multiple LLM providers through a unified interface, enabling seamless switching between different AI services.

Provider Architecture

graph TD
    A[LLMProvider Interface] --> B[OpenAI Provider]
    A --> C[Ollama Provider]
    A --> D[LM Studio Provider]
    A --> E[vLLM Provider]

    B --> F[GPT-4o]
    B --> G[GPT-4o-mini]
    B --> H[GPT-3.5-turbo]

    C --> I[Llama 3.1:8b]
    C --> J[Mistral:7b]
    C --> K[CodeLlama:7b]

    D --> L[Local Models]
    E --> M[High-Performance Inference]

Supported Providers

The unified LLMProvider interface supports:

  • OpenAI - GPT models via API for production-ready solutions
  • Ollama - Local open-source models for privacy-focused deployments
  • LM Studio - Local models with GUI for easy experimentation
  • vLLM - High-performance inference server for scalable applications
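
Because every provider is created through the same LLMProvider factory and exposes the same calls, switching backends does not change the surrounding agent code. A minimal sketch (the arguments are placeholders; get_llm_response is the call used in the error-handling examples later on this page):

from spade_llm.providers import LLMProvider

# Both objects expose the same interface, so the calling code is identical
# regardless of which backend is configured.
cloud_provider = LLMProvider.create_openai(api_key="your-api-key", model="gpt-4o-mini")
local_provider = LLMProvider.create_ollama(model="llama3.1:8b")

async def ask(provider, context):
    # Works unchanged with either provider object.
    return await provider.get_llm_response(context)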

OpenAI Provider

Cloud-based LLM service with state-of-the-art models:

from spade_llm.providers import LLMProvider

provider = LLMProvider.create_openai(
    api_key="your-api-key",
    model="gpt-4o-mini",
    temperature=0.7
)

Popular models: gpt-4o, gpt-4o-mini, gpt-3.5-turbo

Key advantages: Excellent tool calling, consistent performance, extensive model options.

Ollama Provider

Local deployment for privacy and control:

provider = LLMProvider.create_ollama(
    model="llama3.1:8b",
    base_url="http://localhost:11434/v1"
)

Popular models: llama3.1:8b, mistral:7b, codellama:7b

Tool support: Available with llama3.1:8b, llama3.1:70b, mistral:7b

Key advantages: Complete privacy, no internet required, cost-effective for high usage.

LM Studio Provider

Local models with GUI for easy management:

provider = LLMProvider.create_lm_studio(
    model="local-model",
    base_url="http://localhost:1234/v1"
)

The model name should match exactly what's displayed in the LM Studio interface.

Key advantages: User-friendly interface, easy model switching, good for experimentation.

vLLM Provider

High-performance inference for production deployments:

provider = LLMProvider.create_vllm(
    model="meta-llama/Llama-2-7b-chat-hf",
    base_url="http://localhost:8000/v1"
)

Start vLLM server:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000

Key advantages: Optimized performance, batching support, scalable architecture.

Configuration Options

Environment Variables

Centralized configuration using environment variables:

# .env file
OPENAI_API_KEY=your-key
OLLAMA_BASE_URL=http://localhost:11434/v1
LM_STUDIO_BASE_URL=http://localhost:1234/v1
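
A short sketch of wiring these variables into a provider at start-up; it assumes the python-dotenv package for loading the .env file, which is not required by SPADE_LLM itself:

import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed
from spade_llm.providers import LLMProvider

load_dotenv()  # reads the .env file into the process environment

provider = LLMProvider.create_openai(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
)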

Dynamic Provider Selection

Runtime provider switching based on configuration:

import os

def create_provider():
    provider_type = os.getenv('LLM_PROVIDER', 'openai')

    if provider_type == 'openai':
        return LLMProvider.create_openai(
            api_key=os.getenv('OPENAI_API_KEY'),
            model=os.getenv('OPENAI_MODEL', 'gpt-4o-mini')
        )
    elif provider_type == 'ollama':
        return LLMProvider.create_ollama(
            model=os.getenv('OLLAMA_MODEL', 'llama3.1:8b')
        )
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider_type}")

This approach enables easy deployment across different environments without code changes.
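
For instance, the same codebase can be pointed at a local Ollama deployment just by setting environment variables before start-up (variable names as in the function above):

import os

os.environ["LLM_PROVIDER"] = "ollama"   # normally exported in the shell or deployment config
os.environ["OLLAMA_MODEL"] = "mistral:7b"

provider = create_provider()            # returns an Ollama-backed provider, no code changes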

Error Handling

Robust error handling for production reliability:

import logging

logger = logging.getLogger(__name__)

try:
    response = await provider.get_llm_response(context)
except Exception as e:
    logger.error(f"Provider error: {e}")
    # Handle fallback or retry logic
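
One common way to fill in the retry branch is exponential backoff. The helper below is an illustrative sketch, not part of SPADE_LLM; the attempt count and delays are arbitrary:

import asyncio
import logging

logger = logging.getLogger(__name__)

async def get_response_with_retry(provider, context, attempts=3):
    # Retry transient provider failures with exponential backoff (1s, 2s, 4s, ...).
    for attempt in range(attempts):
        try:
            return await provider.get_llm_response(context)
        except Exception as e:
            logger.warning(f"Attempt {attempt + 1} failed: {e}")
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)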

Provider Fallback System

Automatic failover for high availability:

providers = [
    LLMProvider.create_openai(api_key="key"),
    LLMProvider.create_ollama(model="llama3.1:8b")
]

async def get_response_with_fallback(context):
    for provider in providers:
        try:
            return await provider.get_llm_response(context)
        except Exception as e:
            logger.warning(f"Provider failed, trying next: {e}")
    raise RuntimeError("All providers failed")

This pattern ensures service continuity even when individual providers experience issues.

Provider Selection Guide

Cloud vs Local

Choose OpenAI when:

  • You need best-in-class performance
  • You want consistent reliability
  • You have internet connectivity
  • Your budget allows for API costs

Choose Local Providers when:

  • Privacy is paramount
  • You want complete control over infrastructure
  • You have computational resources
  • You need to minimize ongoing costs

Performance Considerations

  • OpenAI: Fastest response times, excellent reasoning capabilities
  • Ollama: Good performance with smaller models, privacy benefits
  • LM Studio: Easy setup, good for development and testing
  • vLLM: Optimized inference, best for high-throughput applications

Tool Calling Support

  • Full tool support: OpenAI (all models)
  • Limited tool support: Ollama (specific models only)
  • Experimental: LM Studio and vLLM (model dependent)
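
If an agent relies on tool calling, the choice of provider or model can be gated on this support. The helper below is only a sketch that encodes the support levels listed above; the set of tool-capable Ollama models is the one stated on this page and may grow over time:

import os

from spade_llm.providers import LLMProvider

# Ollama models listed above as having tool support.
OLLAMA_TOOL_MODELS = {"llama3.1:8b", "llama3.1:70b", "mistral:7b"}

def create_tool_capable_provider(prefer_local=False, local_model="llama3.1:8b"):
    # Use a local model only if it is known to support tool calling;
    # otherwise fall back to OpenAI, where tool calling is fully supported.
    if prefer_local and local_model in OLLAMA_TOOL_MODELS:
        return LLMProvider.create_ollama(model=local_model)
    return LLMProvider.create_openai(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini",
    )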

Best Practices

  • Test multiple providers during development to find the best fit
  • Implement fallback systems for critical applications
  • Use environment variables for easy configuration management
  • Monitor provider performance and costs in production (see the latency-logging sketch after this list)
  • Choose models based on your specific use case requirements
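
As a starting point for the monitoring practice above, per-call latency can be logged with a thin wrapper. This is an illustrative sketch rather than a SPADE_LLM feature, and it can feed whichever metrics system you already use:

import logging
import time

logger = logging.getLogger(__name__)

async def timed_llm_response(provider, context):
    # Measure and log how long each provider call takes.
    start = time.perf_counter()
    try:
        return await provider.get_llm_response(context)
    finally:
        elapsed = time.perf_counter() - start
        logger.info(f"LLM call completed in {elapsed:.2f}s")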

Next Steps

  • Tools System - Add tool capabilities to your providers
  • Architecture - Understanding the provider layer
  • Routing - Route responses based on provider capabilities