Linchpin / Use cases / Local-LLM agents

Run agents on your local Ollama models.

Linchpin works directly with Ollama. No cloud API keys, no per-token billing, no data leaving your machine. The agent runtime is open source, the models are local, the loop is closed.

Who this is for

§ 01

r/LocalLLaMA readers — you already run Ollama or llama.cpp locally and want to graduate from chat to agents.
Privacy-first solo developers — the prompt, the tool calls, the data, the model weights, the runtime — all on your machine.
Researchers and tinkerers — try different models against the same agent definition without paying API fees per experiment.
Edge / air-gapped deployments — when the machine has no network egress, local models plus Linchpin gives you a complete agent stack with zero external calls.

How it fits together

§ 02

Ollama runs as a local HTTP server (default http://localhost:11434). Linchpin's ollama provider points at it. The agent's session state, event log, and sandbox containers all live in Linchpin; the model weights and inference live in Ollama. Both run on the same machine.

Fig. 01 · Local agent stack with Linchpin + Ollama

flowchart LR
    you[You] -->|chat / HTTP| linchpin[Linchpin
runtime]
    linchpin -->|/api/chat| ollama[Ollama
http://localhost:11434]
    ollama --> weights[(local model
weights)]
    linchpin --> pg[(Postgres
event log)]
    linchpin --> sbx[per-session
sandbox]
    style linchpin fill:#B83A1A,color:#F4F2EC
    style ollama fill:#1A1A1A,color:#F4F2EC

Configuration

§ 03

Point Linchpin at your local Ollama in the .env:

# .env
LINCHPIN_API_KEY=dev-key
VAULT_ENCRYPTION_KEY=$(openssl rand -base64 32)

# model provider
MODEL_PROVIDER=ollama
OLLAMA_HOST=http://host.docker.internal:11434

If Linchpin is on the same host as Ollama (typical), host.docker.internal resolves from inside the Linchpin container. If Ollama is on a different machine on the LAN, point at that host's IP. From there, define an agent whose model is whatever you have pulled — llama3.2, qwen2.5, mistral, deepseek-coder, anything Ollama can run.

Which models work as agents

§ 04

Not every chat model holds together inside an agent loop. The honest signal is "does the model reliably emit tool calls when asked, and follow multi-step instructions across many turns." A working set today:

Llama 3.1 / 3.2 (8B+) — solid tool calling, reasonable instruction following at 8B; better at 70B if you have the GPU.
Qwen 2.5 (7B+) — strong tool calling, good multi-turn coherence. Punches above its weight at 7B.
DeepSeek-Coder / DeepSeek-V2.5 — good for code-shaped agent tasks.
Mistral / Mixtral — works for chat-shaped agent tasks; tool calling is weaker than Llama 3.x or Qwen 2.5.

Smaller models (3B and below) will agent — but unreliably. Expect more "model forgot the tool format" and "model went in circles" failures. For a real workload, plan on 7B or larger.

Hardware notes

§ 05

Ollama: a Mac with 16 GB unified memory will run 7-8B models comfortably; 32 GB will fit 13-14B. For 70B models you want a 64 GB+ Mac or a real GPU (24 GB VRAM minimum).
Linchpin: the runtime itself is light — 1-2 GB RAM, negligible CPU. The Postgres event log and the per-session sandbox containers are the only consumers.
Latency: first token is dominated by model loading on first use; Ollama keeps the model warm. Steady-state token rate is whatever your hardware delivers.

Quickstart

§ 06

# 1. pull a model
ollama pull llama3.2:8b

# 2. clone linchpin
git clone https://github.com/linchpinhq/linchpin
cd linchpin
cp .env.example .env
# set MODEL_PROVIDER=ollama and OLLAMA_HOST

# 3. up
docker compose up --build

Define an agent with model: "llama3.2:8b", open a session, send an event. Streaming output comes back over SSE. See the docs for the full curl flow.

§ 07