Ollama

Local LLM inference server

Ollama provides local AI model inference, reducing reliance on external APIs and improving privacy.

Overview

  • Port: 11434
  • Image: ollama/ollama:latest
  • Purpose: Local AI model hosting
  • Status: Disabled by default (heavy resource usage)

Configuration

Enable Ollama

ENABLE_OLLAMA=yes
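
A minimal sketch of the enable flow, assuming the flag lives in the project's .env file and the stack is managed with Docker Compose (adjust paths and names to your deployment):

# 1. Set ENABLE_OLLAMA=yes in .env (location assumed)
# 2. Recreate the stack so the Ollama container starts
docker compose up -d

# 3. Confirm the API is reachable
curl http://localhost:11434/api/version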

Requirements

  • RAM: 8GB+ minimum (16GB+ recommended)
  • Storage: 20GB+ for models
  • CPU: Modern multi-core processor; AVX2 support noticeably speeds up CPU-only inference
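
Most of the storage requirement comes from downloaded model weights. A quick way to see how much space they are using, assuming the default storage path inside the container (/root/.ollama/models, the same path referenced later on this page):

# Disk space used by downloaded models (default path assumed)
docker exec seiling-ollama du -sh /root/.ollama/models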

Features

Supported Models

  • Llama 2: Meta's open-source model
  • Code Llama: Programming assistance
  • Mistral: Fast and efficient
  • Phi: Microsoft's compact models
  • Custom Models: Import your own
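
Any of the models above can be downloaded with the Ollama CLI bundled in the container; a minimal sketch, assuming the container name seiling-ollama used elsewhere in this guide:

# Download a model (tag defaults to "latest")
docker exec seiling-ollama ollama pull mistral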

Model Management

# List available models
curl http://localhost:11434/api/tags

# Pull a model
curl -X POST http://localhost:11434/api/pull \
  -d '{"name": "llama2"}'

# Generate text
curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Hello, world!"}'

Usage

Via OpenWebUI

  1. Configure Model: Add the Ollama endpoint in OpenWebUI (see the example after this list)
  2. Select Model: Choose from available models
  3. Chat: Use locally hosted models
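
For step 1, OpenWebUI reads its Ollama endpoint from the OLLAMA_BASE_URL environment variable. A sketch, assuming both containers share a Docker network and the Ollama service is reachable under the hostname ollama (adjust to your compose service name):

# In OpenWebUI's environment (hostname assumed)
OLLAMA_BASE_URL=http://ollama:11434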

Via API

// Direct API call (non-streaming)
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama2',
    prompt: 'Explain blockchain technology',
    stream: false
  })
});

// With stream: false the API returns a single JSON object
const { response: text } = await response.json();
console.log(text);

Integration with Services

  • OpenWebUI: Local model selection
  • Flowise: Custom LLM nodes
  • ElizaOS: Local inference
  • n8n: API calls for text generation
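
Inside the Docker network, these services can call Ollama directly by service hostname rather than localhost; a sketch, assuming the service is named ollama in the compose file (e.g. as the target of an n8n HTTP Request node):

# Example call from another container on the same network
curl -X POST http://ollama:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Summarize this transaction", "stream": false}'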

Recommended Models by RAM

# Small models (~8GB RAM)
llama2:7b
mistral:7b
phi:latest

# Medium models (~16GB RAM)
llama2:13b
codellama:13b

# Large models (40GB+ RAM)
llama2:70b
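
To see what is already downloaded and how much memory loaded models consume, the bundled CLI can help (container name as used elsewhere on this page; ollama ps requires a recent Ollama release):

# Models on disk and their sizes
docker exec seiling-ollama ollama list

# Models currently loaded in memory
docker exec seiling-ollama ollama ps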

Performance Tuning

# Check GPU support
nvidia-smi # For NVIDIA GPUs

# Monitor resource usage
docker stats seiling-ollama
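
If a GPU is present, it also has to be passed through to the container; whether Ollama actually sees it can be checked from inside. The OLLAMA_KEEP_ALIVE variable controls how long models stay loaded between requests (the value below is illustrative):

# Verify the GPU is visible inside the container (requires GPU passthrough in compose)
docker exec seiling-ollama nvidia-smi

# Keep models in memory longer to avoid reload latency (set in Ollama's environment)
OLLAMA_KEEP_ALIVE=30m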

Troubleshooting

High Resource Usage

# Check memory usage
docker stats seiling-ollama

# Monitor system resources
free -h
df -h

Model Loading Issues

# Check Ollama logs
docker logs seiling-ollama --tail=50

# Verify model download
curl http://localhost:11434/api/tags
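
If the API itself seems unresponsive, a basic liveness check helps distinguish a crashed server from a slow model download:

# Should respond with "Ollama is running"
curl http://localhost:11434/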

Performance Issues

# Restart Ollama
docker restart seiling-ollama

# Clear downloaded models
docker exec seiling-ollama rm -rf /root/.ollama/models/*
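
Wiping the whole models directory forces every model to be re-downloaded; removing a single model with the CLI is usually enough:

# Remove one model instead of the entire directory
docker exec seiling-ollama ollama rm llama2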

Disable if Needed

# Disable Ollama to save resources
ENABLE_OLLAMA=no

# Restart services
docker compose restart

Ollama enables fully local AI inference. Enable it only if you have sufficient system resources and need privacy or offline capability.