Providers & models
Vault Operator supports 12 AI providers. Setup instructions for each one follow.
For all providers, open Settings > Vault Operator > Providers, click "+ Add provider", and pick your provider type.
Cloud providers
Anthropic
| What you need | API key from console.anthropic.com |
| Recommended models | Claude Sonnet 4.6 (best overall), Claude Haiku (fast and cheap) |
| Embedding | Not available natively. Use OpenAI for embeddings. |
Setup:
- Create an account at console.anthropic.com
- Go to API Keys and create a new key
- In Vault Operator, select Anthropic as provider, paste the key, and pick a model
Best tool use
Anthropic models are the most reliable at calling Vault Operator's tools correctly. If quality matters most, start here.
OpenAI
| What you need | API key from platform.openai.com |
| Recommended models | GPT-4o (balanced), o3 (reasoning), GPT-4o-mini (budget) |
| Embedding | Native support. text-embedding-3-small recommended. |
Setup:
- Create an account at platform.openai.com
- Go to API Keys and generate a new key
- In Vault Operator, select OpenAI as provider, paste the key, and pick a model
Embedding models
An OpenAI key also gives you access to embedding models for semantic search. Configure in Settings > Embeddings.
Google Gemini
| What you need | API key from Google AI Studio |
| Recommended models | Gemini 2.5 Flash (fast, free tier available), Gemini 2.5 Pro (best quality) |
| Embedding | Not available natively |
Setup:
- Go to Google AI Studio and sign in with your Google account
- Click Create API Key and copy it
- In Vault Operator, select Google Gemini as provider, paste the key
- Browse available models or pick from the pre-configured list
Free tier
Google Gemini has a free tier with reasonable rate limits. Good starting point if you want to try Vault Operator without paying.
OpenRouter
| What you need | API key from openrouter.ai |
| Recommended models | Any. OpenRouter gives access to 100+ models from multiple providers. |
| Embedding | Not available |
Setup:
- Create an account at openrouter.ai
- Go to Keys and create a new API key
- In Vault Operator, select OpenRouter as provider, paste the key
- Browse or type any model ID (e.g.,
anthropic/claude-sonnet-4.6,google/gemini-2.5-pro)
Azure OpenAI
| What you need | Azure subscription, a deployed model, API key, and endpoint URL |
| Recommended models | GPT-4o (deployed in your Azure region) |
| Embedding | Native support via deployed embedding model |
Setup:
- Deploy a model in your Azure OpenAI resource
- Copy the endpoint URL, API key, and deployment name
- In Vault Operator, select Azure OpenAI as provider and fill in all three fields
Enterprise use
Azure OpenAI fits organizations with compliance requirements. Data stays inside your Azure tenant.
Amazon Bedrock
| What you need | AWS account with Bedrock enabled, IAM user with invoke permissions, access key ID + secret access key |
| Recommended models | Claude Sonnet 4.5, Claude Opus 4.5, Amazon Nova (via cross-region inference profiles) |
| Embedding | Not supported in phase 1. Use OpenAI or Ollama for embeddings |
| Regions | eu-central-1, eu-west-1, eu-west-2, eu-west-3, eu-north-1, us-east-1, us-east-2, us-west-2, plus Asia Pacific |
Setup:
- In the AWS console, open Bedrock in your preferred region. For the EU, Frankfurt (
eu-central-1) is the most common choice - Go to Model access and request access to the model families you want to use. Approval is usually instant for the major foundation models
- Create an IAM user (or role) with a policy that allows these actions:json
{ "Effect": "Allow", "Action": [ "bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream" ], "Resource": "*" } - For EU cross-region inference profiles (recommended), the resource ARN pattern covers all EU regions. For a more restricted policy, scope it to the specific inference profile ARNs you use
- Generate an access key ID and secret access key for the user and copy both
- In Vault Operator, select Amazon Bedrock as provider, pick your region, and paste the credentials. Use the quick pick dropdown to select a model
Cross-region inference profiles
Model IDs prefixed with eu. or us. are cross-region inference profiles. They route requests across the regions in that geography for higher availability. In Europe, eu.anthropic.claude-sonnet-4-5-20250929-v1:0 is the recommended default. It works from any EU region and keeps data inside the EU.
Direct regional model IDs (without a prefix) only work in the specific region that hosts the model. Frankfurt supports a smaller direct model list than the EU inference profiles do.
Temporary credentials
For AWS SSO or STS-issued credentials, fill the session token field as well. Long-lived IAM user credentials don't need it.
Billing
Bedrock bills per-token directly through your AWS account. There is no free tier for most foundation models. Check the AWS Bedrock pricing page before heavy use.
Gateway providers
ChatGPT (OAuth)
| What you need | An active ChatGPT Plus or Pro subscription |
| Available models | gpt-5, gpt-5.1, gpt-5.2, gpt-5-codex, gpt-5-codex-mini, gpt-5.1-codex variants, gpt-5.2-codex, gpt-5.3-codex (Codex-backend lineup) |
| Embedding | Not available |
Setup (OAuth PKCE loopback flow, desktop only):
- In Vault Operator, select ChatGPT (OAuth) as provider
- Click "Sign in with ChatGPT". The browser opens with
auth.openai.com. - Sign in with the same account that holds your ChatGPT Plus / Pro subscription
- After approval the browser redirects to a
localhostcallback the plugin opened for the duration of the flow. The tab closes itself. - Click "Refresh" to load the Codex model lineup, then map the tiers (Budget / Main / Frontier)
Behind the scenes the plugin routes requests through chatgpt.com/backend-api/codex/responses, the same endpoint that the Codex CLI uses. Tokens are stored encrypted via your OS keychain (safeStorage). Refresh tokens auto-renew before expiry.
Covered by your subscription
ChatGPT-OAuth bills against your existing Plus / Pro plan, not against an OpenAI API key. There is no per-token cost; rate limits follow the subscription tier. The plugin still tracks the equivalent API cost in the sidebar footer for transparency.
Reasoning effort fixed at low
GPT-5 family models require a reasoning block in every request. Vault Operator sends reasoning: { effort: 'low' }, the narrowest value accepted across the family. This minimises latency and cost. Higher reasoning effort is not currently exposed as a setting; if you need it for a specific task, use the OpenAI API provider with a gpt-5*-pro model via the standard /v1/responses endpoint.
GitHub Copilot
| What you need | An active GitHub Copilot subscription (Individual, Business, or Enterprise) |
| Recommended models | GPT-4o, Claude Sonnet (available through Copilot) |
| Embedding | Not available |
Setup (OAuth device flow):
- In Vault Operator, select GitHub Copilot as provider
- Click "Sign in with GitHub". A device code appears.
- Open github.com/login/device in your browser
- Enter the code and authorize the app
- Vault Operator automatically detects your available models
No extra cost
If you already pay for GitHub Copilot, this costs nothing extra. The models come with your subscription.
Kilo Gateway
| What you need | A Kilo Code account with gateway access |
| Recommended models | Depends on your organization's available models |
| Embedding | Not available |
Setup (device auth, recommended):
- In Vault Operator, select Kilo Gateway as provider
- Click "Sign in". A device code and URL appear.
- Open the URL in your browser, enter the code, and authorize
- Models are loaded dynamically from your organization
Setup (manual token):
- Obtain a gateway token from your Kilo Code admin
- In Vault Operator, select Kilo Gateway and choose "Manual Token"
- Paste the token. Models load automatically.
Local providers
Ollama
| What you need | Ollama installed on your machine |
| Recommended models | Qwen 2.5 7B (balanced), Llama 3.2 (general), Codestral (code) |
| Embedding | Supported via nomic-embed-text or similar |
Setup:
- Install Ollama from ollama.ai
- Pull a model:
ollama pull qwen2.5:7b - In Vault Operator, select Ollama as provider. No API key needed.
- The Base URL field pre-fills with
http://localhost:11434; adjust only if you run Ollama on a non-default port. - Click "Refresh" to populate the model list from Ollama's native
/api/tagsendpoint.
Privacy
With Ollama, no data leaves your machine. Good for sensitive vaults.
LM Studio
| What you need | LM Studio installed with a model loaded |
| Recommended models | Any GGUF model from the built-in catalog |
| Embedding | Supported for compatible models |
Setup:
- Install LM Studio from lmstudio.ai
- Download a model from the catalog and load it
- Start the local server (LM Studio > Developer tab)
- In Vault Operator, select LM Studio as provider. No API key needed.
- The Base URL field pre-fills with
http://localhost:1234; adjust only if you changed the server port.
Custom endpoint
| What you need | Any OpenAI-compatible API endpoint |
| Recommended models | Depends on the server |
| Embedding | Depends on the server |
Setup:
- In Vault Operator, select Custom as provider
- Enter the base URL (e.g.,
http://localhost:8080/v1) - Enter an API key if your server requires one
- Type the model name exactly as the server expects
This works with any server that implements the OpenAI chat completions API: vLLM, text-generation-inference, LocalAI, and self-hosted endpoints.
Provider comparison
| Provider | Auth | Cost | Privacy | Embedding | Best for |
|---|---|---|---|---|---|
| Anthropic | API key | Pay-per-use | Cloud | No | Best quality |
| OpenAI | API key | Pay-per-use | Cloud | Yes | Structured output, embeddings |
| Google Gemini | API key | Free tier + pay | Cloud | No | Free starting point |
| OpenRouter | API key | Pay-per-use | Cloud | No | Model variety |
| Azure OpenAI | API key + endpoint | Enterprise | Enterprise tenant | Yes | Compliance |
| Amazon Bedrock | IAM access key | Pay-per-use via AWS | Cloud (your AWS account) | No | EU data residency via eu-central-1 |
| ChatGPT (OAuth) | OAuth (PKCE) | Plus / Pro subscription | Cloud | No | Existing ChatGPT subscribers, Codex-line models |
| GitHub Copilot | OAuth | Subscription | Cloud | No | Existing subscribers |
| Kilo Gateway | Device auth / token | Organization | Cloud | No | Team deployments |
| Ollama | None | Free | Fully local | Yes | Privacy, offline |
| LM Studio | None | Free | Fully local | Yes | Visual model browser |
| Custom | Varies | Varies | Varies | Varies | Self-hosted setups |
