Embedding Providers¶
Knowmarks supports three embedding providers. Embeddings power the semantic half of hybrid search — they turn text into vectors that capture meaning, so "machine learning deployment" finds articles about "ML in production."
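To make "capture meaning" concrete: each text becomes a fixed-length vector, and relatedness is typically scored with cosine similarity, so two texts about the same topic score close to 1.0 while unrelated texts score near 0. The toy vectors below are illustrative, not output from any real model, and this is generic vector math rather than Knowmarks' internal code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
ml_deploy = [0.9, 0.1, 0.2]
ml_prod   = [0.8, 0.2, 0.3]  # semantically close -> high similarity
cooking   = [0.1, 0.9, 0.1]  # unrelated -> low similarity

print(cosine_similarity(ml_deploy, ml_prod) > cosine_similarity(ml_deploy, cooking))
```

This is why "machine learning deployment" can match "ML in production" even with no keyword overlap: the vectors point the same way.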
!!! note "Model locking"
    The embedding model is locked on first use per collection. All items in a collection must be embedded with the same model for their vectors to be comparable in search. If you need to change models, use the `reembed` governance action to re-embed all existing items.
fastembed (default)¶
Local ONNX inference. No API calls, no internet required after the initial model download.
Install:
pip install 'knowmarks[all,embeddings]'
Configuration: None required. Uses BAAI/bge-small-en-v1.5 (384 dimensions) by default.
The model downloads automatically on first use (~130MB). Subsequent runs use the cached model.
Pros: Fully private, no API costs, fast on modern hardware. Cons: Larger install size, slower on machines without AVX2 support.
Ollama¶
Use any embedding model available through a local Ollama server.
Prerequisites: Ollama installed and running.
Setup:
# Pull an embedding model
ollama pull nomic-embed-text
# Configure Knowmarks
export KNOWMARKS_EMBEDDING_PROVIDER=ollama
export KNOWMARKS_EMBEDDING_ENDPOINT=http://localhost:11434
You can use any Ollama-supported embedding model:
export KNOWMARKS_EMBEDDING_MODEL=nomic-embed-text
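Knowmarks talks to the server for you, but if you want to sanity-check that Ollama is serving embeddings, you can hit its `/api/embeddings` endpoint directly. The helper names below are illustrative; only the endpoint path and JSON fields (`model`, `prompt`, `embedding`) come from Ollama's API:

```python
import json
import urllib.request

def build_embed_request(text: str,
                        model: str = "nomic-embed-text",
                        endpoint: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the POST for Ollama's /api/embeddings endpoint."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    return urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def embed_ollama(text: str, **kwargs) -> list[float]:
    """Send the request; requires a running Ollama server."""
    with urllib.request.urlopen(build_embed_request(text, **kwargs)) as resp:
        return json.load(resp)["embedding"]
```

If the call succeeds and returns a non-empty vector, Knowmarks' Ollama provider should work with the same endpoint and model settings.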
Pros: Use any Ollama model, share GPU with other Ollama workloads. Cons: Requires Ollama running, slightly slower than direct ONNX.
OpenAI-Compatible API¶
Use any API that implements the OpenAI embeddings endpoint format — including OpenAI itself, Azure OpenAI, Together AI, and others.
Setup:
export KNOWMARKS_EMBEDDING_PROVIDER=openai
export KNOWMARKS_EMBEDDING_ENDPOINT=https://api.openai.com/v1
export KNOWMARKS_EMBEDDING_API_KEY=sk-...
export KNOWMARKS_EMBEDDING_MODEL=text-embedding-3-small
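The request shape these variables configure is the standard OpenAI `/embeddings` endpoint: a bearer token in the `Authorization` header, and a JSON body with `model` and `input`. The sketch below shows that wire format for manual verification; the helper names are illustrative, not part of Knowmarks:

```python
import json
import urllib.request

def build_openai_embed_request(texts: list[str],
                               api_key: str,
                               model: str = "text-embedding-3-small",
                               endpoint: str = "https://api.openai.com/v1") -> urllib.request.Request:
    """Build the POST for an OpenAI-compatible /embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode()
    return urllib.request.Request(
        f"{endpoint}/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def embed_openai(texts: list[str], api_key: str, **kwargs) -> list[list[float]]:
    """Send the request; each item in the response's `data` carries one embedding."""
    with urllib.request.urlopen(build_openai_embed_request(texts, api_key, **kwargs)) as resp:
        return [item["embedding"] for item in json.load(resp)["data"]]
```

Because only the endpoint URL changes, any provider that implements this format (Azure OpenAI, Together AI, a self-hosted gateway) can be dropped in via `KNOWMARKS_EMBEDDING_ENDPOINT`.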
Pros: Access to larger, more capable models. Minimal local resources. Cons: Requires internet, API costs, data leaves your machine.
Choosing a Provider¶
| Factor | fastembed | Ollama | OpenAI API |
|---|---|---|---|
| Privacy | Full | Full | Partial |
| Cost | Free | Free | Per-token |
| Setup | pip install | Ollama + pull | API key |
| Speed | Fast | Fast | Network-dependent |
| Model options | Limited | Many | Many |
| Internet required | First run only | No | Always |