Embedding Providers

Knowmarks supports three embedding providers. Embeddings power the semantic half of hybrid search — they turn text into vectors that capture meaning, so "machine learning deployment" finds articles about "ML in production."
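
Semantic matching boils down to comparing vectors, usually by cosine similarity. Here is a toy sketch of the idea; the three-dimensional vectors are made up for illustration, while real models emit hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, 0.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedded text.
query = [0.9, 0.1, 0.2]    # "machine learning deployment"
ml_doc = [0.8, 0.2, 0.1]   # "ML in production"
cooking = [0.1, 0.9, 0.3]  # unrelated article

# The query lands closer to the ML article than to the unrelated one,
# even though the texts share no keywords.
print(cosine_similarity(query, ml_doc) > cosine_similarity(query, cooking))  # True
```

Keyword search supplies the other half of hybrid search; the two result lists are merged into a single ranking.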

Model locking

The embedding model is locked per collection on first use: every item in a collection must be embedded with the same model, or search results become inconsistent. If you need to change models, run the reembed governance action to re-embed all existing items with the new model.
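
The lock exists because vectors from different models are not comparable: they differ in dimensionality (and geometry), so a similarity score across them is meaningless. A hypothetical guard illustrates the simplest failure mode; `check_dimensions` is our sketch, not Knowmarks code:

```python
def check_dimensions(collection_dim, new_vector):
    # A collection locked to a 384-dim model (e.g. bge-small-en-v1.5)
    # cannot accept vectors from a model with a different output size.
    if len(new_vector) != collection_dim:
        raise ValueError(
            f"Vector has {len(new_vector)} dimensions; "
            f"collection is locked to {collection_dim}."
        )

check_dimensions(384, [0.0] * 384)        # OK: dimensions match
try:
    check_dimensions(384, [0.0] * 1536)   # mismatched model output
except ValueError as e:
    print(e)
```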

fastembed (default)

Local ONNX inference. No API calls, no internet required after the initial model download.

Install:

pip install 'knowmarks[all,embeddings]'

Configuration: None required. Uses BAAI/bge-small-en-v1.5 (384 dimensions) by default.

The model downloads automatically on first use (~130MB). Subsequent runs use the cached model.

Pros: Fully private, no API costs, fast on modern hardware. Cons: Larger install size, slower on machines without AVX2 support.

Ollama

Use any embedding model available through a local Ollama server.

Prerequisites: Ollama installed and running.

Setup:

# Pull an embedding model
ollama pull nomic-embed-text

# Configure Knowmarks
export KNOWMARKS_EMBEDDING_PROVIDER=ollama
export KNOWMARKS_EMBEDDING_ENDPOINT=http://localhost:11434
export KNOWMARKS_EMBEDDING_MODEL=nomic-embed-text

You can use any embedding model available through your Ollama server. Pull it, then set KNOWMARKS_EMBEDDING_MODEL to match:

ollama pull mxbai-embed-large
export KNOWMARKS_EMBEDDING_MODEL=mxbai-embed-large
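
For reference, this is roughly the call Knowmarks would make against that configuration. A minimal sketch against Ollama's /api/embeddings endpoint; the helper names are ours, not Knowmarks internals, and running it requires an Ollama server:

```python
import json
import os
import urllib.request

def build_ollama_request(text, model="nomic-embed-text"):
    # Body shape for Ollama's POST /api/embeddings endpoint.
    return {"model": model, "prompt": text}

def ollama_embed(text, model="nomic-embed-text",
                 endpoint="http://localhost:11434"):
    # Honor the same env vars Knowmarks reads, falling back to defaults.
    endpoint = os.environ.get("KNOWMARKS_EMBEDDING_ENDPOINT", endpoint)
    model = os.environ.get("KNOWMARKS_EMBEDDING_MODEL", model)
    req = urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=json.dumps(build_ollama_request(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama responds with {"embedding": [floats]}.
        return json.load(resp)["embedding"]
```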

Pros: Use any Ollama model, share GPU with other Ollama workloads. Cons: Requires Ollama running, slightly slower than direct ONNX.

OpenAI-Compatible API

Use any API that implements the OpenAI embeddings endpoint format — including OpenAI itself, Azure OpenAI, Together AI, and others.

Setup:

export KNOWMARKS_EMBEDDING_PROVIDER=openai
export KNOWMARKS_EMBEDDING_ENDPOINT=https://api.openai.com/v1
export KNOWMARKS_EMBEDDING_API_KEY=sk-...
export KNOWMARKS_EMBEDDING_MODEL=text-embedding-3-small
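
A sketch of the wire format such a provider speaks, assuming the endpoint and key above. The helper names are illustrative; any server that implements the OpenAI embeddings format accepts the same request:

```python
import json
import urllib.request

def build_embeddings_request(texts, model="text-embedding-3-small"):
    # OpenAI-format body: "input" may be a string or a list of strings.
    return {"model": model, "input": texts}

def parse_embeddings_response(body):
    # Response shape: {"data": [{"embedding": [...], "index": 0}, ...]}
    return [item["embedding"] for item in body["data"]]

def openai_embed(texts, endpoint, api_key, model="text-embedding-3-small"):
    req = urllib.request.Request(
        f"{endpoint}/embeddings",
        data=json.dumps(build_embeddings_request(texts, model)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings_response(json.load(resp))
```

Because only the base URL and key change, pointing KNOWMARKS_EMBEDDING_ENDPOINT at Azure OpenAI, Together AI, or a self-hosted compatible server works the same way.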

Pros: Access to larger, more capable models. Minimal local resources. Cons: Requires internet, API costs, data leaves your machine.

Choosing a Provider

| Factor            | fastembed      | Ollama          | OpenAI API        |
|-------------------|----------------|-----------------|-------------------|
| Privacy           | Full           | Full            | Partial           |
| Cost              | Free           | Free            | Per-token         |
| Setup             | pip install    | Ollama + pull   | API key           |
| Speed             | Fast           | Fast            | Network-dependent |
| Model options     | Limited        | Many            | Many              |
| Internet required | First run only | Model pull only | Always            |