# Embeddings

How textrawl creates vector representations of your documents
Embeddings are numerical representations of text that capture semantic meaning. textrawl uses them to enable semantic search.
## What Are Embeddings?
An embedding converts text into a high-dimensional vector (array of numbers). Similar texts have similar vectors, enabling "search by meaning."
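"Search by meaning" boils down to comparing vectors, usually with cosine similarity. A minimal sketch with toy three-dimensional vectors (real models emit hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values, not model output):
dog   = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
car   = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, puppy))  # high: similar meaning
print(cosine_similarity(dog, car))    # low: unrelated
```

Texts with related meanings land close together in the vector space, so a query vector can retrieve relevant chunks even when no keywords overlap.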
## Embedding Models
textrawl supports multiple embedding providers:
| Provider | Model | Dimensions | Notes |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | Cloud, fast |
| Ollama | nomic-embed-text | 1024 | Local, private |
**Important:** Embedding providers cannot be mixed. OpenAI and Ollama produce vectors of different dimensions (1536 vs. 1024), making them incompatible. Switching providers requires re-embedding all documents in your knowledge base.
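The incompatibility is purely dimensional: a vector index built for 1536-dimensional vectors cannot answer queries embedded into 1024 dimensions. A hypothetical guard (names are illustrative, not textrawl's actual API) makes the rule concrete:

```python
# Dimensions per provider/model, as listed in the table above.
PROVIDER_DIMS = {
    "openai/text-embedding-3-small": 1536,
    "ollama/nomic-embed-text": 1024,
}

def check_compatible(stored_model: str, query_model: str) -> None:
    """Raise if the query model's dimension differs from the index's."""
    stored = PROVIDER_DIMS[stored_model]
    query = PROVIDER_DIMS[query_model]
    if stored != query:
        raise ValueError(
            f"Dimension mismatch: index holds {stored}-dim vectors but the "
            f"query was embedded into {query} dims. Re-embed the knowledge base."
        )
```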
## How It Works
1. **Chunk**: the document is split into ~512-token pieces
2. **Embed**: each chunk is sent to the embedding API
3. **Store**: vectors are stored in a pgvector column
4. **Index**: an HNSW index enables fast similarity search
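The chunk-and-embed steps can be sketched as follows. All function names here are hypothetical (not textrawl's actual API), and tokens are approximated by whitespace-split words; a real chunker counts model tokens.

```python
def chunk(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into pieces of at most max_tokens words (token proxy)."""
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

def ingest(document: str, embed, store) -> None:
    """embed: text -> vector; store: (chunk_text, vector) -> None.

    Embeds every chunk and hands the (text, vector) pair to storage,
    which in textrawl's case is a pgvector column.
    """
    for piece in chunk(document):
        store(piece, embed(piece))
```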
## Query Time
1. The query text is embedded using the same model
2. pgvector finds similar chunks by cosine distance
3. Results are combined with full-text search via Reciprocal Rank Fusion (RRF)
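The fusion step can be sketched with standard Reciprocal Rank Fusion: each result list contributes a score of 1/(k + rank) per document, and scores are summed across lists. This is a generic RRF sketch, not textrawl's implementation; `k = 60` is the constant commonly used in the literature.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Chunk IDs ranked by vector similarity vs. full-text relevance:
vector_hits = ["c3", "c1", "c7"]
fulltext_hits = ["c1", "c9", "c3"]
print(rrf([vector_hits, fulltext_hits]))  # → ['c1', 'c3', 'c9', 'c7']
```

Because `c1` ranks well in both lists, it wins overall even though neither search method ranked it first.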