LLM Providers

LLM features in Knowmarks are optional. When configured, they enable:

  • Conversational search — Ask questions and get answers grounded in your saved content
  • Auto-summaries — 2-3 sentence distillations of saved items
  • Query expansion — Broader search via LLM-rewritten queries
  • Project refinement — LLM-assisted re-ranking of project associations
  • Relevance explanations — Human-readable reasons for why items match
  • Collection insights — Background analysis of your knowledge base

All LLM features degrade gracefully when no provider is configured — the core save/search/organize workflow works without any LLM.

OpenRouter (default)

The default configuration points to OpenRouter with Gemini 2.5 Flash:

export KM_LLM_URL=https://openrouter.ai/api/v1
export KM_LLM_MODEL=google/gemini-2.5-flash
export KM_LLM_API_KEY=your-openrouter-key

OpenRouter provides access to many models through a single API key. Sign up at openrouter.ai.
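To confirm the key and model are set up correctly, you can send a minimal request directly to OpenRouter's OpenAI-compatible chat completions route (a sketch; it requires network access and a valid key in KM_LLM_API_KEY):

```shell
# Minimal smoke test against OpenRouter's chat completions endpoint.
# Assumes KM_LLM_API_KEY holds the OpenRouter key exported above.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $KM_LLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-2.5-flash", "messages": [{"role": "user", "content": "Say hello"}]}'
```

A JSON response with a `choices` array means the configuration is working.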

LM Studio

For fully local LLM inference:

  1. Install LM Studio
  2. Download a model (e.g., Llama 3.1 8B Instruct)
  3. Start the local server in LM Studio

export KM_LLM_URL=http://localhost:1234/v1
export KM_LLM_MODEL=meta-llama-3.1-8b-instruct

No API key needed for local LM Studio.
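To check that the LM Studio server is up before pointing Knowmarks at it, you can query its model listing (a sketch; 1234 is LM Studio's default server port, and no auth header is needed):

```shell
# List models loaded in LM Studio's local OpenAI-compatible server.
# A JSON "data" array in the response confirms the server is reachable.
curl -s http://localhost:1234/v1/models
```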

Ollama

Use Ollama's OpenAI-compatible endpoint:

export KM_LLM_URL=http://localhost:11434/v1
export KM_LLM_MODEL=llama3.1

Make sure the model is pulled first: ollama pull llama3.1
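Once pulled, the same model name works in a raw request against Ollama's OpenAI-compatible endpoint (a sketch; 11434 is Ollama's default port, and no API key is required):

```shell
# No-auth chat completion against Ollama's OpenAI-compatible API.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "ping"}]}'
```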

Any OpenAI-Compatible API

Any service implementing the OpenAI chat completions format works:

export KM_LLM_URL=https://your-provider.com/v1
export KM_LLM_MODEL=your-model-name
export KM_LLM_API_KEY=your-api-key
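Most OpenAI-compatible services also expose a model listing, which makes a quick connectivity check (a sketch; assumes the three exports above are set for your provider):

```shell
# A 200 response with a JSON "data" array indicates the endpoint
# speaks the OpenAI format and accepts your key.
curl -s "$KM_LLM_URL/models" \
  -H "Authorization: Bearer $KM_LLM_API_KEY"
```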

Disabling LLM Features

To disable all LLM features:

export KM_LLM_ENABLED=0

When disabled, conversational search falls back to standard hybrid search, and features like auto-summaries and query expansion are skipped. The core experience is unaffected.
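The fallback behavior can be sketched as a simple gate on the variable (the variable name is from the docs; the convention that any value other than `0` means enabled is an assumption for illustration):

```shell
# Sketch: how a wrapper script might gate LLM features on KM_LLM_ENABLED.
export KM_LLM_ENABLED=0

if [ "${KM_LLM_ENABLED:-1}" = "0" ]; then
  echo "LLM features disabled; falling back to hybrid search"
else
  echo "LLM features enabled"
fi
```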

Recommended Models

For best results with Knowmarks:

  • Hosted: Gemini 2.5 Flash (fast, inexpensive, good at structured extraction)
  • Local (high-end): Llama 3.1 8B Instruct or Qwen 2.5 7B
  • Local (lightweight): Phi-3.5 Mini or Gemma 2 2B

Knowmarks uses structured prompts that work well with instruction-tuned models of any size.