Embedding Providers

Knowmarks supports three embedding providers. Embeddings power the semantic half of hybrid search — they turn text into vectors that capture meaning, so "machine learning deployment" finds articles about "ML in production."
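
Semantic matching boils down to comparing vectors, usually by cosine similarity. Here is a toy sketch of the idea; the three-dimensional vectors are made up for illustration, while real models emit hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, 0.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedded text.
query = [0.9, 0.1, 0.2]    # "machine learning deployment"
ml_doc = [0.8, 0.2, 0.1]   # "ML in production"
cooking = [0.1, 0.9, 0.3]  # unrelated article

# The query lands closer to the ML article than to the unrelated one,
# even though the texts share no keywords.
print(cosine_similarity(query, ml_doc) > cosine_similarity(query, cooking))  # True
```

Keyword search supplies the other half of hybrid search; the two result lists are merged into a single ranking.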

Model locking

The embedding model is locked per collection on first use: every item in a collection must be embedded with the same model, or search results become inconsistent. If you need to change models, run the reembed governance action to re-embed all existing items with the new model.
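
The lock exists because vectors from different models are not comparable: they differ in dimensionality (and geometry), so a similarity score across them is meaningless. A hypothetical guard illustrates the simplest failure mode; `check_dimensions` is our sketch, not Knowmarks code:

```python
def check_dimensions(collection_dim, new_vector):
    # A collection locked to a 384-dim model (e.g. bge-small-en-v1.5)
    # cannot accept vectors from a model with a different output size.
    if len(new_vector) != collection_dim:
        raise ValueError(
            f"Vector has {len(new_vector)} dimensions; "
            f"collection is locked to {collection_dim}."
        )

check_dimensions(384, [0.0] * 384)        # OK: dimensions match
try:
    check_dimensions(384, [0.0] * 1536)   # mismatched model output
except ValueError as e:
    print(e)
```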

fastembed (default)

Local ONNX inference. No API calls, no internet required after the initial model download.

Install:

pip install 'knowmarks[all,embeddings]'

Configuration: None required. Uses BAAI/bge-small-en-v1.5 (384 dimensions) by default.

The model downloads automatically on first use (~130MB). Subsequent runs use the cached model.

Pros: Fully private, no API costs, fast on modern hardware. Cons: Larger install size, slower on machines without AVX2 support.

Ollama

Use any embedding model available through a local Ollama server.

Prerequisites: Ollama installed and running.

Setup:

# Pull an embedding model
ollama pull nomic-embed-text

# Configure Knowmarks
export KNOWMARKS_EMBEDDING_PROVIDER=ollama
export KNOWMARKS_EMBEDDING_ENDPOINT=http://localhost:11434
export KNOWMARKS_EMBEDDING_MODEL=nomic-embed-text

You can use any embedding model available through your Ollama server. Pull it, then set KNOWMARKS_EMBEDDING_MODEL to match:

ollama pull mxbai-embed-large
export KNOWMARKS_EMBEDDING_MODEL=mxbai-embed-large
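
For reference, this is roughly the call Knowmarks would make against that configuration. A minimal sketch against Ollama's /api/embeddings endpoint; the helper names are ours, not Knowmarks internals, and running it requires an Ollama server:

```python
import json
import os
import urllib.request

def build_ollama_request(text, model="nomic-embed-text"):
    # Body shape for Ollama's POST /api/embeddings endpoint.
    return {"model": model, "prompt": text}

def ollama_embed(text, model="nomic-embed-text",
                 endpoint="http://localhost:11434"):
    # Honor the same env vars Knowmarks reads, falling back to defaults.
    endpoint = os.environ.get("KNOWMARKS_EMBEDDING_ENDPOINT", endpoint)
    model = os.environ.get("KNOWMARKS_EMBEDDING_MODEL", model)
    req = urllib.request.Request(
        f"{endpoint}/api/embeddings",
        data=json.dumps(build_ollama_request(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Ollama responds with {"embedding": [floats]}.
        return json.load(resp)["embedding"]
```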

Pros: Use any Ollama model, share GPU with other Ollama workloads. Cons: Requires Ollama running, slightly slower than direct ONNX.

OpenAI-Compatible API

Use any API that implements the OpenAI embeddings endpoint format — including OpenAI itself, Azure OpenAI, Together AI, and others.

Setup:

export KNOWMARKS_EMBEDDING_PROVIDER=openai
export KNOWMARKS_EMBEDDING_ENDPOINT=https://api.openai.com/v1
export KNOWMARKS_EMBEDDING_API_KEY=sk-...
export KNOWMARKS_EMBEDDING_MODEL=text-embedding-3-small
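
A sketch of the wire format such a provider speaks, assuming the endpoint and key above. The helper names are illustrative; any server that implements the OpenAI embeddings format accepts the same request:

```python
import json
import urllib.request

def build_embeddings_request(texts, model="text-embedding-3-small"):
    # OpenAI-format body: "input" may be a string or a list of strings.
    return {"model": model, "input": texts}

def parse_embeddings_response(body):
    # Response shape: {"data": [{"embedding": [...], "index": 0}, ...]}
    return [item["embedding"] for item in body["data"]]

def openai_embed(texts, endpoint, api_key, model="text-embedding-3-small"):
    req = urllib.request.Request(
        f"{endpoint}/embeddings",
        data=json.dumps(build_embeddings_request(texts, model)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings_response(json.load(resp))
```

Because only the base URL and key change, pointing KNOWMARKS_EMBEDDING_ENDPOINT at Azure OpenAI, Together AI, or a self-hosted compatible server works the same way.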

Pros: Access to larger, more capable models. Minimal local resources. Cons: Requires internet, API costs, data leaves your machine.

Choosing a Provider

| Factor            | fastembed      | Ollama          | OpenAI API        |
|-------------------|----------------|-----------------|-------------------|
| Privacy           | Full           | Full            | Partial           |
| Cost              | Free           | Free            | Per-token         |
| Setup             | pip install    | Ollama + pull   | API key           |
| Speed             | Fast           | Fast            | Network-dependent |
| Model options     | Limited        | Many            | Many              |
| Internet required | First run only | Model pull only | Always            |