# Ollama

Local LLM inference server.

Ollama provides local AI model inference, reducing reliance on external APIs and keeping prompts and data on your own hardware.

## Overview

- Port: 11434
- Image: `ollama/ollama:latest`
- Purpose: Local AI model hosting
- Status: Disabled by default (heavy resource usage)
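
Because the service is disabled by default, a quick check of whether it is currently running and answering on port 11434 can help before going further. A minimal sketch using the container name referenced throughout this page (`/api/version` is a standard Ollama endpoint):

```bash
# Is the container up?
docker ps --filter name=seiling-ollama

# Does the API respond on the published port?
curl http://localhost:11434/api/version
```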
## Configuration

### Enable Ollama

```bash
ENABLE_OLLAMA=yes
```
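How the flag is applied depends on how the stack is launched; a minimal sketch, assuming the flag lives in the project's `.env` file and services are managed with Docker Compose (as in the commands later on this page):

```bash
# Assumes ENABLE_OLLAMA=yes has been set in the stack's .env file
docker compose up -d   # recreate services so the new setting takes effect
```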
### Requirements

- RAM: 8GB minimum (16GB+ recommended)
- Storage: 20GB+ for models
- CPU: Modern multi-core processor recommended

## Features

### Supported Models

- Llama 2: Meta's open-source model
- Code Llama: Programming assistance
- Mistral: Fast and efficient
- Phi: Microsoft's compact models
- Custom Models: Import your own

### Model Management

```bash
# List available models
curl http://localhost:11434/api/tags

# Pull a model
curl -X POST http://localhost:11434/api/pull \
  -d '{"name": "llama2"}'

# Generate text
curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Hello, world!"}'
```
## Usage

### Via OpenWebUI

- Configure Model: Add the Ollama endpoint in OpenWebUI
- Select Model: Choose from the available models
- Chat: Use locally hosted models

### Via API

```javascript
// Direct API call (non-streaming)
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama2',
    prompt: 'Explain blockchain technology',
    stream: false
  })
});

// With stream: false the whole answer arrives as a single JSON object
const data = await response.json();
console.log(data.response); // generated text
```
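For reference, here is a sketch of the same request from the shell with streaming enabled; with `stream: true` the API returns newline-delimited JSON chunks as tokens are generated, ending with an object where `"done": true`:

```bash
# Streamed generation: one JSON object per line until "done": true
curl -N -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Explain blockchain technology", "stream": true}'
```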
### Integration with Services

- OpenWebUI: Local model selection
- Flowise: Custom LLM nodes
- ElizaOS: Local inference
- n8n: API calls for text generation
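
Note that these services run in their own containers, so they cannot reach Ollama via `localhost`; they need the container's network name instead. A hedged example, assuming the default Docker Compose network and the container name used on this page:

```bash
# From inside another container on the same Docker network
curl http://seiling-ollama:11434/api/tags
```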
## Model Management

### Popular Models

```bash
# Small models (4GB RAM)
llama2:7b
mistral:7b
phi:latest

# Medium models (8GB RAM)
llama2:13b
codellama:13b

# Large models (16GB+ RAM)
llama2:70b
```
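Model weights add up quickly against the 20GB+ storage guideline. A quick way to see what is installed and how much space it takes (the model path matches the one used in Troubleshooting below):

```bash
# Installed models and their sizes
docker exec seiling-ollama ollama list

# Total disk used by the model store
docker exec seiling-ollama du -sh /root/.ollama/models
```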
### Performance Tuning

```bash
# Check GPU support on the host
nvidia-smi   # for NVIDIA GPUs

# Monitor resource usage
docker stats seiling-ollama
```
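To confirm whether a loaded model is actually running on the GPU or falling back to the CPU, `ollama ps` reports the processor for each loaded model:

```bash
# Shows loaded models and whether they run on GPU or CPU
docker exec seiling-ollama ollama ps
```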
## Troubleshooting

### High Resource Usage

```bash
# Check container memory usage
docker stats seiling-ollama

# Monitor system resources
free -h
df -h
```
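If memory pressure comes from a model that stays loaded after use, it can be unloaded immediately without restarting the container by sending a request with `keep_alive: 0`:

```bash
# Unload the model from RAM right away (no restart needed)
curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama2", "keep_alive": 0}'
```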
### Model Loading Issues

```bash
# Check Ollama logs
docker logs seiling-ollama --tail=50

# Verify the model was downloaded
curl http://localhost:11434/api/tags
```
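If a model is missing from `/api/tags` or its download was interrupted, pulling it again is usually enough; `ollama pull` resumes and verifies partial downloads:

```bash
# Re-pull the model; partial downloads are resumed and verified
docker exec seiling-ollama ollama pull llama2
```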
### Performance Issues

```bash
# Restart Ollama
docker restart seiling-ollama

# Clear all downloaded models (frees disk, but every model must be re-pulled)
docker exec seiling-ollama rm -rf /root/.ollama/models/*
```
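Wiping the whole model directory removes everything; to reclaim space more selectively, removing a single model with the CLI is usually the safer option:

```bash
# Remove one model instead of clearing the entire model store
docker exec seiling-ollama ollama rm llama2
```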
### Disable if Needed

```bash
# Disable Ollama to save resources
ENABLE_OLLAMA=no

# Restart services
docker compose restart
```
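If the Ollama container keeps running after the restart, it can also be stopped directly by name:

```bash
# Stop just the Ollama container
docker stop seiling-ollama
```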
Ollama enables fully local AI inference. Enable it only if the host has enough RAM and storage for the models you plan to run and you need the privacy or offline capability of self-hosted models; otherwise leave it disabled to conserve resources.