# Embedding Providers

*OpenAI vs Ollama for vector embeddings*
textrawl supports two embedding providers: OpenAI (cloud) and Ollama (local).
## Provider Comparison
| Feature | OpenAI | Ollama |
|---|---|---|
| Location | Cloud | Local |
| Privacy | Data sent to API | Data stays local |
| Dimensions | 1536 | 1024 |
| Model | text-embedding-3-small | nomic-embed-text |
| Cost | Pay per token | Free |
| Speed | ~100 ms/request | ~50 ms/request (hardware-dependent) |
| Setup | API key | Docker container |
## OpenAI Setup
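Configuration is typically supplied via environment variables. The variable names below are assumptions for illustration, not textrawl's confirmed settings:

```shell
# Hypothetical .env for the OpenAI provider -- variable names are illustrative
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
EMBEDDING_MODEL=text-embedding-3-small   # produces 1536-dimensional vectors
```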
Use the database schema for 1536 dimensions:
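A minimal sketch of such a schema, assuming Postgres with the pgvector extension (table and column names are hypothetical):

```sql
-- Hypothetical pgvector schema; table and column names are illustrative
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(1536)   -- must match text-embedding-3-small's output size
);

-- Optional ANN index for cosine-similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```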
## Ollama Setup
Configure environment:
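The variable names below are assumptions for illustration, not textrawl's confirmed settings:

```shell
# Hypothetical .env for the Ollama provider -- variable names are illustrative
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434   # Ollama's default port
EMBEDDING_MODEL=nomic-embed-text
```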
Use the database schema for 1024 dimensions:
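A minimal sketch of such a schema, assuming Postgres with the pgvector extension (table and column names are hypothetical; the dimension must match your embedding model's actual output, listed here as 1024):

```sql
-- Hypothetical pgvector schema; table and column names are illustrative
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(1024)   -- must match the local model's output size
);

-- Optional ANN index for cosine-similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```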
## Switching Providers
**Warning:** Embeddings from the two providers are incompatible: the vectors have different dimensions, so you cannot mix providers, and switching requires re-embedding all documents.
To switch:

1. Export your documents (or keep the source files)
2. Drop the existing tables
3. Run the new schema (the dimensions differ)
4. Re-upload all documents
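The destructive steps can be sketched in SQL, assuming a pgvector table named `documents` (the name is illustrative):

```sql
-- Illustrative only: assumes a Postgres/pgvector table named "documents"
DROP TABLE IF EXISTS documents;   -- discard old-dimension vectors

-- Recreate with the new provider's dimensions (here: Ollama, 1024)
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(1024)
);
-- Then re-upload the documents so the new provider re-embeds them
```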
## Quality Comparison
Both providers offer high-quality embeddings suitable for semantic search:
| Benchmark | text-embedding-3-small | nomic-embed-text |
|---|---|---|
| MTEB Average | 62.3 | 59.4 |
| Retrieval | 54.9 | 52.8 |
| Clustering | 45.0 | 44.2 |
OpenAI has a slight edge, but Ollama offers privacy and no API costs.
## Recommendations
Choose OpenAI if:
- You need maximum quality
- Data privacy isn't critical
- You prefer managed infrastructure
Choose Ollama if:
- Data must stay local
- You want to avoid API costs
- You have GPU resources
- You're running fully self-hosted