Choosing a model
Vault Operator works with many providers and models. Not all of them are equally good at being agents.
You will need: an account at the provider of your choice and an API key (or a local model server running). The Tutorial lists the most common providers and where to grab a key.
Use this guide when: you are setting up a new provider, deciding between cloud and local, picking a cheaper helper model for background tasks, or you need to understand the trade-offs.
You will know it works when: you have at least one provider configured with a populated Main tier (and ideally a Frontier slot for the on-demand consult_flagship escalation), and the Test connection button reports success.
Provider-first, not model-first
Vault Operator's setup is provider-centric, not model-centric. You configure a provider once (API key or OAuth). The plugin discovers the available models and sorts them into three tiers automatically via the advisor pattern.
- Budget tier: cheap fast models for routine work. Also used as the fallback helper model.
- Main tier: the default for chat.
- Frontier tier: used on demand by the consult_flagship tool when the agent hits a hard synthesis step (max 3 calls per task, capped at 3000 output tokens per call).
If your active provider has no Frontier-tier model, the escalation tool is filtered out of the schema. The agent runs Main-only and never knows the escalation tool existed.
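The filtering rule and the escalation caps can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Provider` shape, the `visible_tools` helper, and the tool names other than consult_flagship are hypothetical. Only the limits (3 calls per task, 3000 output tokens per call) come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits stated in the tier description above.
MAX_FLAGSHIP_CALLS_PER_TASK = 3
MAX_FLAGSHIP_OUTPUT_TOKENS = 3000


@dataclass
class Provider:
    # Hypothetical shape: one optional model ID per tier slot.
    budget: Optional[str] = None
    main: Optional[str] = None
    frontier: Optional[str] = None


def visible_tools(provider: Provider, all_tools: List[str]) -> List[str]:
    """Drop consult_flagship when the provider has no Frontier-tier model."""
    if provider.frontier is None:
        return [t for t in all_tools if t != "consult_flagship"]
    return list(all_tools)


def worst_case_flagship_output_tokens() -> int:
    """Upper bound on Frontier output tokens for a single task."""
    return MAX_FLAGSHIP_CALLS_PER_TASK * MAX_FLAGSHIP_OUTPUT_TOKENS
```

Under these caps, a single task can spend at most 3 × 3000 = 9000 Frontier output tokens, which is what keeps the escalation cost bounded.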
What makes a good model for Vault Operator
Vault Operator is an agent, not a chat assistant. The Main-tier model needs to:
- Support tool use (function calling). The agent combines search, read, write, and plugin calls in one turn.
- Follow instructions precisely. The system prompt is dense with rules, skills, and agent definitions.
- Reason about multi-step tasks. Reading files, searching, editing, and verifying takes planning.
The Frontier tier exists precisely because some steps need the strongest available model and the rest do not. Routing Frontier-class work to one tool call instead of the whole loop keeps cost predictable.
Use the latest, most capable models
Vault Operator works best with strong frontier models that are good at tool use and reasoning. Older or smaller models may struggle with bigger tasks, skip approval steps, or call the wrong tools. Most of the testing has been done with Anthropic Claude models.
For background tasks like memory extraction, chat titling, or contextual retrieval, a cheap model is fine. Those tasks are simple and don't need tool use.
Provider categories
Vault Operator supports twelve provider types (anthropic, openai, gemini, ollama, lmstudio, openrouter, azure, custom, github-copilot, kilo-gateway, bedrock, chatgpt-oauth). They fall into three categories, each with different trade-offs.
Cloud providers (API key)
Create an account, get an API key, pay per usage. Best quality and reliability.
| Provider | How to get started | What you get |
|---|---|---|
| Anthropic | Create account at console.anthropic.com, generate API key (starts with sk-ant-...) | Claude model family. Best tool use in testing. |
| OpenAI | Create account at platform.openai.com, generate API key (starts with sk-...) | GPT model family. Fast, good structured output. |
| OpenRouter | Create account at openrouter.ai, generate API key (starts with sk-or-...) | 100+ models from many providers with a single key. Some free tiers. |
| Azure OpenAI | Enterprise deployment through Azure portal | OpenAI models with enterprise compliance and private endpoints. |
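The key prefixes in the table can serve as a quick sanity check before pasting a key into the wrong provider. The sketch below is an illustration only; `guess_provider` is a hypothetical helper, not part of Vault Operator. A match on the bare sk- prefix is only a hint, since other services may use it too.

```python
from typing import Optional

# Prefixes as listed in the table above. The more specific prefixes come
# first because "sk-" is also the start of the Anthropic and OpenRouter formats.
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("sk-or-", "openrouter"),
    ("sk-", "openai"),
]


def guess_provider(api_key: str) -> Optional[str]:
    """Return the provider type suggested by the key prefix, or None."""
    key = api_key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None
```

Azure OpenAI keys are not covered by any prefix in the table, so the helper returns None for them.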
Gateway providers (login-based)
No API key needed. You sign in with an existing account.
| Provider | How to get started | What you get |
|---|---|---|
| GitHub Copilot | Click "Sign in with GitHub" in the model config. A device code appears; enter it at github.com/login/device. Requires an active Copilot subscription. | Multiple frontier models through your existing Copilot subscription. No separate API key. Uses an unofficial API (models may change). |
| Kilo Gateway | Click "Sign in" in the model config, or paste an API token directly. | Centralized gateway to multiple frontier models. Organization context, dynamic model listing, managed access. |
Local providers (free, private)
Models run on your machine. No data leaves your device. Free, but needs capable hardware (at least 8 GB of RAM recommended).
| Provider | How to get started | What you get |
|---|---|---|
| Ollama | Install from ollama.ai. Pull a model: ollama pull llama3.2. The server starts automatically at http://localhost:11434. | Many open-source models. Best local experience. Pick a model that supports tool use. |
| LM Studio | Install from lmstudio.ai. Download a model in the app, then start the local server from the Developer tab. | Visual model browser, easy setup. Default URL: http://localhost:1234. |
| Custom | Any server with an OpenAI-compatible API. Enter the base URL (with /v1 suffix) and optional API key. | For self-hosted inference servers, corporate proxies, or any compatible endpoint. |
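Before adding a local provider, it can help to confirm that the server answers and lists at least one model. The sketch below assumes Ollama's /api/tags endpoint and the OpenAI-compatible /v1/models endpoint that LM Studio and custom servers expose; the helper names are hypothetical, and this is not how the plugin itself discovers models.

```python
import json
import urllib.request
from typing import List

# Default base URLs from the table above.
DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434",
    "lmstudio": "http://localhost:1234",
}


def models_url(provider_type: str, base_url: str) -> str:
    """Build the model-listing URL for a local or custom provider."""
    base = base_url.rstrip("/")
    if provider_type == "ollama":
        return f"{base}/api/tags"
    if provider_type == "lmstudio":
        return f"{base}/v1/models"
    # Custom: the base URL is expected to already end in /v1.
    return f"{base}/models"


def parse_model_ids(provider_type: str, payload: dict) -> List[str]:
    """Extract model identifiers from the listing response."""
    if provider_type == "ollama":
        return [m["name"] for m in payload.get("models", [])]
    return [m["id"] for m in payload.get("data", [])]


def list_models(provider_type: str, base_url: str, timeout: float = 5.0) -> List[str]:
    """Query a running server; raises URLError if nothing is listening."""
    url = models_url(provider_type, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(provider_type, json.load(resp))
```

Calling `list_models("ollama", DEFAULT_BASE_URLS["ollama"])` requires a running Ollama server; an empty list means the server is up but no model has been pulled yet.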
How to add a model in Vault Operator
- Open Settings > Vault Operator > Providers > Providers
- Click "+ Add provider"
- Select a provider type from the dropdown
- Follow the provider-specific instructions:
  - API key providers: Paste your key
  - GitHub Copilot: Click "Sign in with GitHub", complete the device flow
  - ChatGPT (OAuth): Click "Sign in with ChatGPT", complete the browser PKCE flow
  - Kilo Gateway: Click "Sign in" or paste a token
  - Local providers (Ollama, LM Studio): the Base URL pre-fills with the default port; adjust if needed
- Click "Refresh" to discover the provider's model list. Vault Operator classifies each model into one of three tiers (Budget / Main / Frontier) automatically; you can override the tier mapping per slot.
- Optionally pick a display name. The active provider radio drives chat by default; the chat-header model picker can override per-task.
- Click Add
Quick pick
For API-key providers, the "Quick pick" dropdown shows popular models with pre-filled IDs. For Ollama and LM Studio, the "Browse installed/available models" button fetches what is running on your local server.
Using different models for different tasks
You don't have to use the same model everywhere. Vault Operator splits model usage across the provider's three tier slots, plus per-agent and per-conversation overrides:
- Budget tier: the cheapest slot. It doubles as the "helper" model used for context condensing, fast-path planning, plan_presentation, and recipe promotion, and the task router sends simple prompts here. Configure it in the provider's Budget tier slot. Pick the cheapest model that still understands the prompts (Claude Haiku, GPT-4o-mini, Gemini Flash, a local Ollama or MLX model).
- Main tier (chat loop): the default for every conversational turn. The active provider's Main slot.
- Frontier tier (consult_flagship): the on-demand escalation. The active provider's Frontier slot.
- Per-agent overrides: there is one built-in agent (Default agent). Custom agents can pin their own model (for example, a read-only agent on a tiny Budget model while the Default agent stays on Main). Settings > Vault Operator > Agents > Agents.
Automatic routing to the Budget tier is controlled by Settings > Vault Operator > Agents > Loop > Task routing. Turn that toggle off if you want every turn to use the Main tier instead. (Earlier docs called the Budget tier a separate "Helper model" setting; it is now the provider's Budget slot, and the Loop section is named "Task routing".)
You can also pin a specific model for a single conversation through the chat-header model picker (shown as mode=override in the cost log). A pinned model always wins over the task router, so it will not be swapped to the Budget tier.
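The precedence described above (a pinned model beats the task router, and the router only sends simple prompts to Budget when the toggle is on) can be sketched as below. The `Slots` type, the `is_simple` flag, and the "budget" and "main" mode labels are assumptions for illustration; only the "override" label appears in the text. Falling back to Main when the Budget slot is empty is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Slots:
    # Hypothetical view of the active provider's tier slots.
    budget: Optional[str]
    main: str


def resolve_model(
    slots: Slots,
    pinned: Optional[str] = None,
    task_routing: bool = True,
    is_simple: bool = False,
) -> Tuple[str, str]:
    """Return (model_id, mode) for one turn."""
    # A model pinned in the chat header always wins over the task router.
    if pinned is not None:
        return pinned, "override"
    # With task routing on, simple prompts go to the Budget slot if one is set.
    if task_routing and is_simple and slots.budget is not None:
        return slots.budget, "budget"
    # Everything else runs on the Main tier.
    return slots.main, "main"
```

With the Task routing toggle off, `resolve_model` returns the Main slot even for simple prompts, which matches the behavior described for that setting.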
Reasoning effort and thinking
Two controls steer how hard a model "thinks" before answering.
- Reasoning-effort slider (model-native): pick xhigh, high, medium, low, or minimal. Pin-only, set per model on the provider config. Higher effort means longer responses and higher cost. The slider only renders for models that expose a native reasoning-effort parameter (for example OpenAI o-series, Claude with reasoning, DeepSeek reasoner via OpenAI-compat).
- Thinking toggle (binary): on or off, set per conversation from the chat-header picker. For Anthropic this maps to extended thinking with the configured budget tokens. For Bedrock Claude it maps to the same budget via budget_tokens. For other providers the toggle is hidden if the model has no thinking mode.
The two controls are independent. A model can be pinned to high effort with thinking off, or to minimal with thinking on.
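As a rough sketch of that independence, the two settings can be turned into request parameters separately. The `reasoning_effort` key follows the parameter name used by OpenAI's Chat Completions API, and the `thinking` object follows the shape of Anthropic's Messages API. Whether Vault Operator sends exactly these fields, the same shape for Bedrock, and the default budget of 4096 tokens are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Effort levels offered by the slider, from lowest to highest.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")

# Providers where the thinking toggle maps to a token budget, per the text above.
BUDGETED_THINKING = ("anthropic", "bedrock")


def reasoning_params(
    provider_type: str,
    effort: Optional[str] = None,
    thinking: bool = False,
    budget_tokens: int = 4096,  # hypothetical default budget
) -> Dict[str, Any]:
    """Build the reasoning-related request fields for one call."""
    params: Dict[str, Any] = {}
    # Slider: only set when the model exposes a native effort parameter.
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {effort}")
        params["reasoning_effort"] = effort
    # Toggle: independent of the slider, mapped to a thinking budget.
    if thinking and provider_type in BUDGETED_THINKING:
        params["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return params
```

Neither branch reads the other's value, so high effort with thinking off and minimal effort with thinking on are both valid combinations.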
Embedding models
Semantic search needs a separate embedding model. This is a specialized model that converts text into vectors for similarity search.
Configure it in Settings > Vault Operator > Providers > Embeddings, section "Embedding models". Common choices:
- Any OpenAI-compatible embedding endpoint
- Local embedding models via Ollama (e.g., nomic-embed-text)
- GitHub Copilot and Kilo Gateway also support embedding models
The embedding model only affects search quality. It has no effect on chat responses.
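To make "similarity search" concrete: once text is converted to vectors, notes can be ranked by how closely their vectors point in the same direction as the query vector. The sketch below uses cosine similarity on toy two-dimensional vectors. Real embedding models produce hundreds of dimensions, and the similarity measure Vault Operator actually uses is not stated here, so treat cosine as an assumption.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_notes(query_vec: Sequence[float], note_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return note names ordered from most to least similar to the query."""
    return sorted(
        note_vecs,
        key=lambda name: cosine_similarity(query_vec, note_vecs[name]),
        reverse=True,
    )
```

A better embedding model places related notes closer together in this vector space, which is why it changes search quality but not chat output.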
Cost considerations
| Approach | Monthly cost | Notes |
|---|---|---|
| Local only (Ollama/LM Studio) | Free | Requires capable hardware. Quality depends on model size. |
| Free tiers (OpenRouter, Google) | Free | Rate-limited. Good for light usage. |
| GitHub Copilot | Included in subscription | If you already pay for Copilot, no extra cost. |
| Cloud API (light usage) | $5 to $15 | A few conversations per day. |
| Cloud API (heavy usage) | $20 to $50+ | Daily power user with complex tasks. |
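For cloud APIs, a monthly figure can be estimated from token volume and the provider's per-token prices. The helper below is a generic sketch; every number in the usage example (turns per day, token counts, prices) is a placeholder chosen for illustration, not a measured value or a real price list. Check your provider's pricing page for current rates.

```python
def monthly_cost(
    turns_per_day: int,
    input_tokens_per_turn: int,
    output_tokens_per_turn: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend in dollars from per-million-token prices."""
    input_cost = input_tokens_per_turn * input_price_per_mtok / 1_000_000
    output_cost = output_tokens_per_turn * output_price_per_mtok / 1_000_000
    return (input_cost + output_cost) * turns_per_day * days


# Placeholder inputs: 20 turns a day, 8000 input and 1000 output tokens per turn,
# priced at a hypothetical $3 and $15 per million tokens.
estimate = monthly_cost(20, 8000, 1000, 3.0, 15.0)
```

Agent turns tend to carry a lot of input (system prompt, tool results, file contents), so the input side often dominates the estimate.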
Next steps
- Chat interface: How the chat experience works in detail
- Knowledge discovery: Set up semantic search (needs an embedding model)
- Providers reference: Step-by-step setup for each provider
