LLM Providers¶
LLM features in Knowmarks are optional. When configured, they enable:
- Conversational search — Ask questions and get answers grounded in your saved content
- Auto-summaries — 2–3 sentence distillations of saved items
- Query expansion — Broader search via LLM-rewritten queries
- Project refinement — LLM-assisted re-ranking of project associations
- Relevance explanations — Human-readable reasons for why items match
- Collection insights — Background analysis of your knowledge base
All LLM features degrade gracefully when no provider is configured — the core save/search/organize workflow works without any LLM.
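The graceful-degradation check can be sketched as a small helper. This is a hypothetical illustration, not Knowmarks' actual code; it assumes the `KM_LLM_URL` and `KM_LLM_ENABLED` settings documented on this page:

```python
import os

def llm_configured() -> bool:
    """Hypothetical helper showing one way the graceful-degradation
    check can work: LLM features require a base URL and must not be
    explicitly disabled via KM_LLM_ENABLED=0."""
    if os.environ.get("KM_LLM_ENABLED", "1") == "0":
        return False
    return bool(os.environ.get("KM_LLM_URL"))
```

Callers would branch on this once, so every LLM feature shares the same on/off logic.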
OpenRouter (default)¶
The default configuration points to OpenRouter with Gemini 2.5 Flash:
export KM_LLM_URL=https://openrouter.ai/api/v1
export KM_LLM_MODEL=google/gemini-2.5-flash
export KM_LLM_API_KEY=your-openrouter-key
OpenRouter provides access to many models through a single API key. Sign up at openrouter.ai.
LM Studio¶
For fully local LLM inference:
1. Install LM Studio
2. Download a model (e.g., Llama 3.1 8B Instruct)
3. Start the local server in LM Studio
export KM_LLM_URL=http://localhost:1234/v1
export KM_LLM_MODEL=meta-llama-3.1-8b-instruct
No API key is needed for a local LM Studio server.
Ollama¶
Use Ollama's OpenAI-compatible endpoint:
export KM_LLM_URL=http://localhost:11434/v1
export KM_LLM_MODEL=llama3.1
Make sure the model is pulled first: ollama pull llama3.1
Any OpenAI-Compatible API¶
Any service implementing the OpenAI chat completions format works:
export KM_LLM_URL=https://your-provider.com/v1
export KM_LLM_MODEL=your-model-name
export KM_LLM_API_KEY=your-api-key
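Concretely, "OpenAI chat completions format" means the provider accepts a POST to `{KM_LLM_URL}/chat/completions` with a bearer token and a JSON body. The sketch below builds such a request from the three variables above without sending it (the prompt and fallback values are placeholders, not Knowmarks internals):

```python
import json
import os
import urllib.request

# Build (but do not send) a chat-completions request from the three
# environment variables above. The path and payload follow the OpenAI
# chat completions format; "Say hello" is just a placeholder prompt.
base_url = os.environ.get("KM_LLM_URL", "https://your-provider.com/v1")
payload = {
    "model": os.environ.get("KM_LLM_MODEL", "your-model-name"),
    "messages": [{"role": "user", "content": "Say hello"}],
}
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('KM_LLM_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would perform the call; omitted here.
```

Any provider that answers this request shape should work, regardless of vendor.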
Disabling LLM Features¶
To disable all LLM features:
export KM_LLM_ENABLED=0
When disabled, conversational search falls back to standard hybrid search, and features like auto-summaries and query expansion are skipped. The core experience is unaffected.
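The fallback described above follows a simple pattern. In this sketch the two callables are hypothetical stand-ins for Knowmarks' internal search paths, used only to show the branch:

```python
import os

def answer_query(query, conversational_search, hybrid_search):
    """Illustrative degradation pattern (hypothetical helpers, not
    Knowmarks' actual API): take the LLM path only when enabled."""
    if os.environ.get("KM_LLM_ENABLED", "1") == "0":
        return hybrid_search(query)  # LLM disabled: plain hybrid search
    return conversational_search(query)
```

Because the branch happens per query, re-enabling LLM features requires no migration or re-indexing.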
Recommended Models¶
For best results with Knowmarks:
- Hosted: Gemini 2.5 Flash (fast, inexpensive, good at structured extraction)
- Local (high-end): Llama 3.1 8B Instruct or Qwen 2.5 7B
- Local (lightweight): Phi-3.5 Mini or Gemma 2 2B
Knowmarks uses structured prompts that work well with instruction-tuned models of any size.