Providers and models

Vault Operator ships with 12 provider types. This page is the canonical reference for picking, authenticating, and tuning each one. The MCP relay and Connectors tab are covered in Connectors; model selection strategy in Choosing a model.

How to add a provider

Open Settings > Vault Operator > Providers > Providers.
Click "+ Add provider" and pick the provider type.
Authenticate (API key, OAuth, or CLI login; see the matrix below).
Click Refresh to discover the provider's model list.
Map the three tier slots (Budget, Main, Frontier) or accept the auto-classification.
Pick a display name and click Add.

If Refresh returns no models (some OpenAI-compatible endpoints do not implement /v1/models), type the model ID into the Model ID field and save. A provider works fine with a manually entered model ID.
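The fallback described above can be sketched as follows. This is a minimal illustration, not the plugin's code: it assumes the endpoint answers with the usual OpenAI-style list shape (`{ data: [{ id }] }`), and `parseModelIds` and `resolveModelIds` are hypothetical helper names.

```typescript
// Shape of an OpenAI-style GET /v1/models response (only the fields used here).
interface ModelListResponse {
  data?: Array<{ id?: unknown }>;
}

// Extract model IDs from a response body; tolerate missing or malformed fields.
function parseModelIds(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as ModelListResponse).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => entry?.id)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
}

// Hypothetical helper: prefer discovered models, fall back to the manually typed ID.
function resolveModelIds(body: unknown, manualModelId: string): string[] {
  const discovered = parseModelIds(body);
  if (discovered.length > 0) return discovered;
  const manual = manualModelId.trim();
  return manual ? [manual] : [];
}
```

A server that returns an error body instead of a list yields an empty discovery result, so the manually entered ID becomes the only model.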

Tier mapping

Vault Operator classifies every discovered model into one of three tiers:

Budget: cheap fast models for routine work
Main: the default tier for chat
Frontier: reserved for the on-demand consult_flagship escalation

You can override the auto-classification per tier slot. If the active provider has no Frontier-tier model, the consult_flagship tool is removed from the agent's schema entirely.
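A rough sketch of how the tool list could be filtered when the Frontier slot is empty. The `TierMapping` and `ToolDefinition` shapes and the `toolsForProvider` name are assumptions for illustration; only the `consult_flagship` tool name comes from this page.

```typescript
type Tier = "budget" | "main" | "frontier";

// Hypothetical shape: each tier slot holds a model ID, or is absent when unmapped.
type TierMapping = Partial<Record<Tier, string>>;

interface ToolDefinition {
  name: string;
  description: string;
}

// Drop consult_flagship from the tool list when no Frontier model is mapped.
function toolsForProvider(tools: ToolDefinition[], mapping: TierMapping): ToolDefinition[] {
  const hasFrontier = Boolean(mapping.frontier);
  return tools.filter((tool) => tool.name !== "consult_flagship" || hasFrontier);
}
```

Filtering the schema, rather than rejecting the call at runtime, means the model never sees a tool it cannot use.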

Provider matrix

Provider	Auth	Caching	Notes
Anthropic	API key	explicit (`cache_control` blocks)	Best tool-use reliability. No native embeddings.
OpenAI	API key	openai-implicit (gpt-4o, gpt-4.1, o1, o3, o4)	Native embeddings via `text-embedding-3-small`.
Google Gemini	API key	none in v2.14 (TTL context caching is deferred)	Free tier available. No native embeddings.
OpenRouter	API key	none	Searchable model marketplace. Pricing-based tier classifier.
Azure OpenAI	API key plus endpoint	openai-implicit on OpenAI-family deployments	Enterprise tenant. Native embeddings via deployed model.
Amazon Bedrock	IAM access key (optional session token)	bedrock-cachepoint on Claude family	EU residency via `eu.` cross-region inference profiles. No embeddings in phase 1.
ChatGPT (OAuth)	OAuth (PKCE loopback)	none (Codex backend)	Covered by Plus/Pro subscription. No per-token cost.
GitHub Copilot	OAuth (device flow)	none	Covered by Copilot subscription. Unofficial API, models may change.
Kilo Gateway	Device auth or manual token	none	Organization-scoped gateway. Dynamic model list.
Ollama	none (local)	none	Fully offline. Pick a model that supports tool use.
LM Studio	none (local)	none	Visual model browser. OpenAI-compatible server.
Custom	API key (optional)	depends on server	Any OpenAI-compatible endpoint. vLLM, LocalAI, self-hosted.

Caching values come from src/api/capabilities.ts. "openai-implicit" means the provider applies a prefix cache automatically when the same prefix repeats. "explicit" means Vault Operator inserts cache_control markers into the prompt. "bedrock-cachepoint" is the Bedrock equivalent of explicit caching.
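To make the difference between the caching values concrete, here is a hedged sketch. It is not the content of src/api/capabilities.ts. It assumes the Anthropic Messages API block shape (`cache_control: { type: "ephemeral" }`) for explicit caching and the Bedrock Converse API `cachePoint` block for bedrock-cachepoint; `buildSystemBlocks` is a hypothetical name, and the plain block returned for the other two values is only a placeholder.

```typescript
// The caching values used in the matrix above.
type CachingStrategy = "explicit" | "openai-implicit" | "bedrock-cachepoint" | "none";

// Content blocks are kept loose because each API has its own block shape.
type Block = Record<string, unknown>;

// Build the system-prompt blocks for a given caching strategy.
function buildSystemBlocks(strategy: CachingStrategy, systemPrompt: string): Block[] {
  switch (strategy) {
    case "explicit":
      // Anthropic Messages API: mark the block itself as a cache breakpoint.
      return [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }];
    case "bedrock-cachepoint":
      // Bedrock Converse API: a separate cachePoint block follows the cached content.
      return [{ text: systemPrompt }, { cachePoint: { type: "default" } }];
    case "openai-implicit":
    case "none":
      // Nothing to insert: the provider caches repeated prefixes on its own, or not at all.
      return [{ type: "text", text: systemPrompt }];
  }
}
```

With openai-implicit the client's only job is to keep the prompt prefix stable between requests.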

Cloud providers

Anthropic


What you need	API key from console.anthropic.com
Tier mapping (auto)	Frontier: Claude Opus 4.6/4.7. Main: Claude Sonnet 4.5. Budget: Claude Haiku 4.5.
Caching	explicit, via `cache_control` blocks
Embedding	not available natively, use OpenAI for embeddings

Setup:

Create an account at console.anthropic.com.
Go to API Keys and create a new key.
In Vault Operator, open Settings > Vault Operator > Providers > Providers, add Anthropic, paste the key, and pick a model.

Best tool use

Anthropic models are the most reliable at calling Vault Operator's tools correctly. If quality matters most, start here.

OpenAI


What you need	API key from platform.openai.com
Tier mapping (auto)	Frontier: GPT-5, GPT-5-pro. Main: GPT-5.1, GPT-4.1. Budget: GPT-4o-mini, GPT-5-mini.
Caching	openai-implicit on gpt-4o, gpt-4.1, o1, o3, o4 families
Embedding	native support, `text-embedding-3-small` recommended

Setup:

Create an account at platform.openai.com.
Go to API Keys and generate a new key.
In Vault Operator, add an OpenAI provider, paste the key, and pick a model.

Embedding models

An OpenAI key also gives you access to embedding models for semantic search. Configure in Settings > Vault Operator > Providers > Embeddings.

Google Gemini


What you need	API key from Google AI Studio
Tier mapping (auto)	Frontier: Gemini 2.5 Pro. Main: Gemini 2.5 Flash. Budget: Gemini 2.5 Flash-Lite.
Caching	none in v2.14 (TTL context caching is deferred)
Embedding	not available natively

Setup:

Go to Google AI Studio and sign in with your Google account.
Click Create API Key and copy it.
In Vault Operator, add a Google Gemini provider and paste the key.
Browse available models or pick from the pre-configured list.

Free tier

Google Gemini has a free tier with reasonable rate limits. Good starting point if you want to try Vault Operator without paying.

OpenRouter


What you need	API key from openrouter.ai
Tier mapping (auto)	Pricing-based: above $50 per million completion tokens = Frontier, $5 to $50 = Main, below $5 = Budget. Family patterns override pricing where possible.
Caching	none
Embedding	not available

Setup:

Create an account at openrouter.ai.
Go to Keys and create a new API key.
In Vault Operator, add an OpenRouter provider and paste the key.
Click Refresh. The model picker is searchable: type "opus", "gpt-5", or any pattern to find a specific model.
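The pricing-based classification from the table above could look roughly like this. The thresholds are the ones stated on this page. The family override patterns are invented for illustration, and `classifyModel` is a hypothetical name that takes the completion price already converted to USD per million tokens.

```typescript
type Tier = "budget" | "main" | "frontier";

// Hypothetical family overrides; the plugin's real pattern list is not shown on this page.
const FAMILY_OVERRIDES: Array<[RegExp, Tier]> = [
  [/opus/i, "frontier"],
  [/haiku|mini/i, "budget"],
];

// Classify a model from its ID and its completion price in USD per million tokens.
function classifyModel(modelId: string, completionUsdPerMillion: number): Tier {
  // Family patterns take precedence over pricing.
  for (const [pattern, tier] of FAMILY_OVERRIDES) {
    if (pattern.test(modelId)) return tier;
  }
  if (completionUsdPerMillion > 50) return "frontier";
  if (completionUsdPerMillion >= 5) return "main";
  return "budget";
}
```

Both boundary values ($5 and $50) fall into Main here, matching the "$5 to $50" range.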

Azure OpenAI


What you need	Azure subscription, a deployed model, API key, and endpoint URL
Recommended models	GPT-4o (deployed in your Azure region)
Caching	openai-implicit on OpenAI-family deployments
Embedding	native support via deployed embedding model

Setup:

Deploy a model in your Azure OpenAI resource.
Copy the endpoint URL, API key, and deployment name.
In Vault Operator, add an Azure OpenAI provider and fill in all three fields.
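The three fields map onto an Azure OpenAI request as sketched below. This assumes the standard Azure chat completions route and `api-key` header. The `API_VERSION` value is an assumption, so check which versions your resource supports; `buildAzureChatRequest` is a hypothetical helper.

```typescript
interface AzureConfig {
  endpoint: string;   // e.g. https://my-resource.openai.azure.com
  apiKey: string;
  deployment: string; // the deployment name, not the underlying model name
}

// Assumed API version; not taken from this page.
const API_VERSION = "2024-06-01";

// Combine endpoint, deployment name, and key into a request target.
function buildAzureChatRequest(cfg: AzureConfig): { url: string; headers: Record<string, string> } {
  const base = cfg.endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  const url =
    `${base}/openai/deployments/${encodeURIComponent(cfg.deployment)}` +
    `/chat/completions?api-version=${API_VERSION}`;
  return { url, headers: { "api-key": cfg.apiKey, "Content-Type": "application/json" } };
}
```

The deployment name appears in the URL path, which is why Azure needs it as a separate field from the model name.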

Enterprise use

Azure OpenAI fits organizations with compliance requirements. Data stays inside your Azure tenant.

Amazon Bedrock


What you need	AWS account with Bedrock enabled, IAM user with invoke permissions, access key ID plus secret access key
Tier mapping (auto)	Frontier: Claude Opus 4.x. Main: Claude Sonnet 4.x. Budget: Claude Haiku, Amazon Nova Lite.
Caching	bedrock-cachepoint on Claude family
Embedding	not supported in phase 1, use OpenAI or Ollama for embeddings
Regions	eu-central-1, eu-west-1, eu-west-2, eu-west-3, eu-north-1, us-east-1, us-east-2, us-west-2, plus Asia Pacific

Setup:

In the AWS console, open Bedrock in your preferred region. For the EU, Frankfurt (eu-central-1) is the most common choice.
Go to Model access and request access to the model families you want to use. Approval is usually instant for the major foundation models.

Create an IAM user (or role) with a policy that allows these actions:

json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}

The wildcard resource also covers the EU cross-region inference profiles (recommended), which can route a request to any EU region. For a more restricted policy, scope Resource to the specific inference profile ARNs you use.
Generate an access key ID and secret access key for the user and copy both.
In Vault Operator, add an Amazon Bedrock provider, pick your region, and paste the credentials. Use the quick pick dropdown to select a model.

Cross-region inference profiles

Model IDs prefixed with eu. or us. are cross-region inference profiles. They route requests across the regions in that geography for higher availability. In Europe, eu.anthropic.claude-sonnet-4-5-20250929-v1:0 is the recommended default. It works from any EU region and keeps data inside the EU.

Direct regional model IDs (without a prefix) only work in the specific region that hosts the model. Frankfurt supports a smaller direct model list than the EU inference profiles do.
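A small sketch of how the prefix convention can be read from a model ID. Only the `eu.` and `us.` prefixes named on this page are handled; other geographies may exist. `describeBedrockModelId` is a hypothetical helper, not part of the plugin.

```typescript
// Geography prefixes named on this page; others may exist.
const PROFILE_PREFIXES = ["eu.", "us."] as const;

interface BedrockModelRef {
  isInferenceProfile: boolean;
  geography: string | null; // "eu" or "us" for profiles, null for direct model IDs
  baseModelId: string;      // the ID with any geography prefix removed
}

// Split a Bedrock model ID into its geography prefix (if any) and the base model ID.
function describeBedrockModelId(modelId: string): BedrockModelRef {
  const prefix = PROFILE_PREFIXES.find((p) => modelId.startsWith(p));
  if (!prefix) return { isInferenceProfile: false, geography: null, baseModelId: modelId };
  return {
    isInferenceProfile: true,
    geography: prefix.slice(0, -1),
    baseModelId: modelId.slice(prefix.length),
  };
}
```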

Temporary credentials

For AWS SSO or STS-issued credentials, fill the session token field as well. Long-lived IAM user credentials don't need it.

Billing

Bedrock bills per-token directly through your AWS account. There is no free tier for most foundation models. Check the AWS Bedrock pricing page before heavy use.

Gateway providers

ChatGPT (OAuth)


What you need	An active ChatGPT Plus or Pro subscription
Available models	gpt-5.5, gpt-5.4, gpt-5.4-mini (Codex-backend lineup as of v2.14; the live `/codex/models` fetch supersedes this fallback when reachable)
Caching	none (Codex backend does not expose caching)
Embedding	not available

Setup (OAuth PKCE loopback flow, desktop only):

In Vault Operator, add a ChatGPT (OAuth) provider.
Click "Sign in with ChatGPT". The browser opens with auth.openai.com.
Sign in with the same account that holds your ChatGPT Plus or Pro subscription.
After approval the browser redirects to a localhost callback the plugin opened for the duration of the flow. The tab closes itself.
Click "Refresh" to load the Codex model lineup, then map the tiers (Budget, Main, Frontier).

Behind the scenes the plugin routes requests through chatgpt.com/backend-api/codex/responses, the same endpoint that the Codex CLI uses. Tokens are stored encrypted via your OS keychain (safeStorage). Refresh tokens auto-renew before expiry.
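For readers unfamiliar with PKCE, the verifier and challenge pair at the heart of the flow can be sketched as below. This shows the generic S256 method from RFC 7636 using Node's crypto module. It is not the plugin's implementation, and the function names are illustrative.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Random code verifier: 32 bytes become 43 base64url characters (RFC 7636 allows 43 to 128).
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 code challenge: base64url(SHA-256(verifier)), without padding.
function createCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.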

Covered by your subscription

ChatGPT-OAuth bills against your existing Plus or Pro plan, not against an OpenAI API key. There is no per-token cost, and rate limits follow the subscription tier. The plugin still tracks the equivalent API cost in the sidebar footer for transparency.

Reasoning effort

GPT-5 family models require a reasoning block on every request. The default effort is low, the lowest value that every model in the family accepts, which keeps latency and cost down. Since v2.14 you can override it per pinned model via the Reasoning effort slider (minimal, low, medium, high) in the model config modal.
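As an illustration, a request body with the reasoning block might be assembled like this. The `reasoning: { effort }` field follows the OpenAI Responses API shape. The `EffortOverrides` map and `withReasoning` helper are assumptions about how a per-model override could be stored, not the plugin's actual code.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

// Default named in the text above.
const DEFAULT_EFFORT: ReasoningEffort = "low";

// Hypothetical per-model overrides, keyed by model ID (what the slider would persist).
type EffortOverrides = Partial<Record<string, ReasoningEffort>>;

// Build the part of a Responses-style request body that carries the reasoning block.
function withReasoning(model: string, input: string, overrides: EffortOverrides = {}) {
  const effort = overrides[model] ?? DEFAULT_EFFORT;
  return { model, input, reasoning: { effort } };
}
```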

GitHub Copilot


What you need	An active GitHub Copilot subscription (Individual, Business, or Enterprise)
Tier mapping (auto)	Frontier: Claude Opus (when entitled), GPT-5. Main: Claude Sonnet, GPT-4.1. Budget: GPT-4o-mini.
Caching	none
Embedding	not available

Setup (OAuth device flow):

In Vault Operator, add a GitHub Copilot provider.
Click "Sign in with GitHub". A device code appears.
Open github.com/login/device in your browser.
Enter the code and authorize the app.
Vault Operator automatically detects your available models.
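While you enter the code in the browser, a device-flow client polls the token endpoint. The decision logic for one poll can be sketched as follows, based on the generic OAuth device authorization grant (RFC 8628). The type and function names are illustrative, and the plugin's own handling may differ.

```typescript
// Token-endpoint response during polling (RFC 8628, section 3.5); only the fields used here.
interface DevicePollResponse {
  access_token?: string;
  error?: string; // e.g. "authorization_pending", "slow_down", "expired_token", "access_denied"
}

type PollDecision =
  | { kind: "done"; accessToken: string }
  | { kind: "wait"; intervalSeconds: number }
  | { kind: "failed"; reason: string };

// Decide what to do after one poll, given the current polling interval in seconds.
function nextPollStep(res: DevicePollResponse, intervalSeconds: number): PollDecision {
  if (res.access_token) return { kind: "done", accessToken: res.access_token };
  switch (res.error) {
    case "authorization_pending":
      // The user has not finished authorizing yet; keep the same interval.
      return { kind: "wait", intervalSeconds };
    case "slow_down":
      // RFC 8628: increase the interval by 5 seconds on slow_down.
      return { kind: "wait", intervalSeconds: intervalSeconds + 5 };
    default:
      return { kind: "failed", reason: res.error ?? "unknown_error" };
  }
}
```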

No extra cost

If you already pay for GitHub Copilot, this costs nothing extra. The models come with your subscription. The API is unofficial, so models may change without notice.

Kilo Gateway


What you need	A Kilo Code account with gateway access
Recommended models	Centralized gateway to multiple frontier models, organization-scoped
Caching	none
Embedding	not available

Setup (device auth, recommended):

In Vault Operator, add a Kilo Gateway provider.
Click "Sign in". A device code and URL appear.
Open the URL in your browser, enter the code, and authorize.
Models are loaded dynamically from your organization.

Setup (manual token):

Obtain a gateway token from your Kilo Code admin.
In Vault Operator, add a Kilo Gateway provider and choose "Manual Token".
Paste the token. Models load automatically.

Local providers

Ollama


What you need	Ollama installed on your machine
Tier mapping (auto)	All models classified as Budget unless you override per slot. Pick the largest model you can run locally as your Main override.
Caching	none
Embedding	supported via `nomic-embed-text` or similar

Setup:

Install Ollama from ollama.ai.
Pull a model: ollama pull qwen2.5:7b.
In Vault Operator, add an Ollama provider. No API key needed.
The Base URL field pre-fills with http://localhost:11434. Adjust it only if you run Ollama on a non-default port.
Click "Refresh" to populate the model list from Ollama's native /api/tags endpoint.
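What Refresh does against Ollama can be approximated as below. The sketch assumes the `/api/tags` response has the form `{ models: [{ name }] }`; `tagsUrl` and `parseOllamaTags` are hypothetical helper names.

```typescript
// Build the tags URL from the Base URL field, tolerating a trailing slash.
function tagsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/tags`;
}

// Extract model names from an /api/tags response body of the form { models: [{ name }] }.
function parseOllamaTags(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const models = (body as { models?: unknown }).models;
  if (!Array.isArray(models)) return [];
  return models
    .map((m: unknown) =>
      typeof m === "object" && m !== null ? (m as { name?: unknown }).name : undefined
    )
    .filter((name): name is string => typeof name === "string");
}
```

The returned names (for example `qwen2.5:7b`) are the same strings you pass to `ollama pull`.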

Privacy

With Ollama, no data leaves your machine. Good for sensitive vaults.

LM Studio


What you need	LM Studio installed with a model loaded
Recommended models	Any GGUF model from the built-in catalog
Caching	none
Embedding	supported for compatible models

Setup:

Install LM Studio from lmstudio.ai.
Download a model from the catalog and load it.
Start the local server (LM Studio > Developer tab).
In Vault Operator, add an LM Studio provider. No API key needed.
The Base URL field pre-fills with http://localhost:1234. Adjust it only if you changed the server port.

Custom endpoint


What you need	Any OpenAI-compatible API endpoint
Recommended models	depends on the server
Caching	depends on the server
Embedding	depends on the server

Setup:

In Vault Operator, add a Custom provider.
Enter the base URL (for example, http://localhost:8080/v1).
Enter an API key if your server requires one.
Type the model name exactly as the server expects.

This works with any server that implements the OpenAI chat completions API: vLLM, text-generation-inference, LocalAI, and self-hosted endpoints.
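To show what "OpenAI-compatible" means in practice, here is a sketch of the request such a server is expected to accept. `CustomProviderConfig` and `buildChatRequest` are illustrative names. The optional bearer header mirrors the optional API key field in the setup steps.

```typescript
interface CustomProviderConfig {
  baseUrl: string; // e.g. http://localhost:8080/v1
  apiKey?: string; // optional; only sent when the server requires one
  model: string;   // exactly as the server expects it
}

// Build a chat completions request for an OpenAI-compatible server.
function buildChatRequest(cfg: CustomProviderConfig, userMessage: string) {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  return {
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST" as const,
    headers,
    body: JSON.stringify({
      model: cfg.model,
      messages: [{ role: "user", content: userMessage }],
    }),
  };
}
```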

Migrating from the old "Models" tab

Before v2.11 the plugin tracked one row per model in a flat activeModels[] list. v2.11 replaces that with providerConfigs[], one row per provider with the discovered model list and tier mapping attached. A one-shot migration on first load groups your existing models by provider type, picks the first enabled model's credentials as the provider's auth, and classifies each enabled model into a tier.
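The grouping step of that migration can be sketched as follows. The field names on `LegacyModel` and `ProviderConfig` are hypothetical, since the real schemas are not shown on this page, and the `classify` callback stands in for the tier classifier. Keeping disabled models in the list without a tier is also an assumption.

```typescript
type Tier = "budget" | "main" | "frontier";

// Hypothetical pre-v2.11 row: one per model.
interface LegacyModel {
  providerType: string;
  modelId: string;
  apiKey?: string;
  enabled: boolean;
}

// Hypothetical post-migration row: one per provider type.
interface ProviderConfig {
  providerType: string;
  apiKey?: string;
  models: string[];
  tiers: Partial<Record<Tier, string>>;
}

function migrate(activeModels: LegacyModel[], classify: (modelId: string) => Tier): ProviderConfig[] {
  const byType = new Map<string, ProviderConfig>();
  const authTaken = new Set<string>();
  for (const m of activeModels) {
    let cfg = byType.get(m.providerType);
    if (!cfg) {
      cfg = { providerType: m.providerType, models: [], tiers: {} };
      byType.set(m.providerType, cfg);
    }
    cfg.models.push(m.modelId);
    if (!m.enabled) continue; // assumption: disabled models stay listed but get no tier
    // The first enabled model's credentials become the provider's auth.
    if (!authTaken.has(m.providerType)) {
      cfg.apiKey = m.apiKey;
      authTaken.add(m.providerType);
    }
    // Fill each tier slot once; later models in the same tier do not overwrite it.
    const tier = classify(m.modelId);
    cfg.tiers[tier] ??= m.modelId;
  }
  return Array.from(byType.values());
}
```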

A one-shot notification modal summarises the result and flags anomalies (multi-auth setups, missing Frontier slot, custom endpoints that need manual tier assignment). The original list lives at legacy_active_models_backup for 30 days in case you want to roll back.

The Models tab is hidden from the navigation in v2.11. It re-appears for users who configure new OAuth providers until the inline OAuth flow lands in a later release.

Provider comparison

Provider	Auth	Caching	Cost	Privacy	Embedding	Best for
Anthropic	API key	explicit	pay-per-use	cloud	no	best quality
OpenAI	API key	openai-implicit	pay-per-use	cloud	yes	structured output, embeddings
Google Gemini	API key	none	free tier plus pay	cloud	no	free starting point
OpenRouter	API key	none	pay-per-use	cloud	no	model variety
Azure OpenAI	API key plus endpoint	openai-implicit	enterprise	enterprise tenant	yes	compliance
Amazon Bedrock	IAM access key	bedrock-cachepoint	pay-per-use via AWS	cloud (your AWS account)	no	EU data residency via `eu-central-1`
ChatGPT (OAuth)	OAuth (PKCE)	none	Plus/Pro subscription	cloud	no	existing ChatGPT subscribers, Codex-line models
GitHub Copilot	OAuth (device)	none	subscription	cloud	no	existing Copilot subscribers
Kilo Gateway	device auth or token	none	organization	cloud	no	team deployments
Ollama	none	none	free	fully local	yes	privacy, offline
LM Studio	none	none	free	fully local	yes	visual model browser
Custom	varies	depends on server	varies	varies	varies	self-hosted setups
