Embedding Providers
OpenAI, Google AI, and Ollama for vector embeddings
textrawl supports three embedding providers: OpenAI (cloud), Google AI (cloud), and Ollama (local).
Provider Comparison
| Feature | OpenAI | Google AI | Ollama |
|---|---|---|---|
| Location | Cloud | Cloud | Local |
| Privacy | Data sent to API | Data sent to API | Data stays local |
| Dimensions | 1536 | 768 | 1024 or 768 |
| Model | text-embedding-3-small | text-embedding-004 | nomic-embed-text / v2-moe |
| Cost | Pay per token | Free tier available | Free |
| Speed | ~100ms/request | ~80ms/request | ~50ms/request |
| Setup | API key | API key | Docker container |
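The Dimensions row has the most practical impact. This sketch assumes the schema stores embeddings in a pgvector-style `vector(N)` column, which only accepts vectors of exactly N elements. Under that assumption, a startup check can compare the configured provider against the column size. The provider keys and the `schemaFits` helper are hypothetical and are not textrawl's configuration API. The dimension values are copied from the table above.

```typescript
// Hypothetical provider keys; textrawl's actual configuration values may differ.
type EmbeddingProvider = "openai" | "google" | "ollama";

// Vector sizes each provider can produce, copied from the comparison table.
const PROVIDER_DIMENSIONS: Record<EmbeddingProvider, number[]> = {
  openai: [1536],
  google: [768],
  ollama: [1024, 768], // depends on which Ollama model is pulled
};

// Returns true if a vector(columnDims) column can store this provider's output.
function schemaFits(provider: EmbeddingProvider, columnDims: number): boolean {
  return PROVIDER_DIMENSIONS[provider].indexOf(columnDims) !== -1;
}

console.log(schemaFits("openai", 1536)); // true
console.log(schemaFits("google", 1536)); // false
```

Running a check like this before the first upload reports a dimension mismatch as a configuration error. Without it, the mismatch would surface later as a failed insert.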
OpenAI Setup
Use the database schema for 1536 dimensions:
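OpenAI requires an API key and the 1536-dimension schema. The helpers below sketch what the embedding call looks like. They build a request for OpenAI's `POST /v1/embeddings` endpoint and check that each returned vector has 1536 elements before it reaches the database. The function and type names are hypothetical, and textrawl's own client code may be structured differently. The API key is passed in as an argument rather than read from any particular environment variable.

```typescript
// Plain description of an HTTP request; pass the fields to fetch() or any client.
interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a request for OpenAI's embeddings endpoint; `input` accepts a batch of strings.
function openAIEmbeddingRequest(apiKey: string, texts: string[]): HttpRequest {
  return {
    url: "https://api.openai.com/v1/embeddings",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: texts }),
  };
}

// Only the response fields this sketch reads.
interface OpenAIEmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

// Returns vectors in input order, rejecting any that would not fit vector(1536).
function parseOpenAIEmbeddings(json: OpenAIEmbeddingResponse, expectedDims = 1536): number[][] {
  const items = json.data.slice().sort((a, b) => a.index - b.index);
  return items.map((item) => {
    if (item.embedding.length !== expectedDims) {
      throw new Error(`expected ${expectedDims} dimensions, got ${item.embedding.length}`);
    }
    return item.embedding;
  });
}
```

In Node 18+ or a browser, send the request with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`, then pass the parsed JSON to `parseOpenAIEmbeddings`.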
Google AI Setup
Use the database schema for 768 dimensions:
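Google AI follows the same pattern with a different request shape. The minimal sketch below targets the Gemini API `embedContent` method, which returns a single vector under `embedding.values`. Its 768-element check mirrors the schema above. The helper names are hypothetical. The key is sent in the `x-goog-api-key` header, and the API also accepts it as a `key` query parameter.

```typescript
// Plain description of an HTTP request; pass the fields to fetch() or any client.
interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const GOOGLE_MODEL = "text-embedding-004";

// Builds a single-text request for the Gemini API's embedContent method.
function googleEmbeddingRequest(apiKey: string, text: string): HttpRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${GOOGLE_MODEL}:embedContent`,
    headers: { "x-goog-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: `models/${GOOGLE_MODEL}`,
      content: { parts: [{ text }] },
    }),
  };
}

// Only the response fields this sketch reads.
interface GoogleEmbeddingResponse {
  embedding: { values: number[] };
}

// Returns the vector, rejecting any that would not fit vector(768).
function parseGoogleEmbedding(json: GoogleEmbeddingResponse, expectedDims = 768): number[] {
  const values = json.embedding.values;
  if (values.length !== expectedDims) {
    throw new Error(`expected ${expectedDims} dimensions, got ${values.length}`);
  }
  return values;
}
```

For bulk uploads, the same API offers a `batchEmbedContents` method. This sketch sends one text per request to keep it short.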
Ollama Setup
Configure environment:
Use the database schema for 1024 dimensions:
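Ollama serves embeddings over a local HTTP API, on port 11434 by default. This minimal sketch assumes the Docker container publishes that port on `localhost`. It calls Ollama's `POST /api/embed` endpoint, which takes a `model` and an `input` batch and returns an `embeddings` array. Unlike the cloud examples, the expected dimension is a required argument. It must match the schema you ran (1024 above), and the right value depends on which model the container has pulled. The helper names are hypothetical.

```typescript
// Plain description of an HTTP request; pass the fields to fetch() or any client.
interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumes the Ollama container is reachable on its default port; no API key is needed.
function ollamaEmbeddingRequest(
  model: string,
  texts: string[],
  baseUrl = "http://localhost:11434",
): HttpRequest {
  return {
    url: `${baseUrl}/api/embed`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: texts }),
  };
}

// Only the response fields this sketch reads.
interface OllamaEmbedResponse {
  embeddings: number[][];
}

// expectedDims must equal N in the vector(N) column, e.g. 1024 for the schema above.
function parseOllamaEmbeddings(json: OllamaEmbedResponse, expectedDims: number): number[][] {
  for (const vec of json.embeddings) {
    if (vec.length !== expectedDims) {
      throw new Error(`model returned ${vec.length} dimensions, schema expects ${expectedDims}`);
    }
  }
  return json.embeddings;
}
```

If the first document triggers a mismatch error, change the schema or the model before importing anything else.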
Switching Providers
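Warning: You cannot mix providers. Each model maps text into its own vector space, so embeddings from different providers cannot be compared with each other, even when their dimensions match. Switching requires re-embedding all documents.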
To switch:
- Export your documents (or keep source files)
- Drop the existing tables
- Run the schema matching the new provider's dimensions
- Re-upload all documents
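The steps above reduce to two questions. Any change of provider or model means re-embedding. Only a change in dimensions means replacing the schema. This hedged sketch assumes you keep a record of which provider and model produced the stored vectors. The `EmbeddingConfig` record and the `planSwitch` function are hypothetical.

```typescript
// Hypothetical record of which model produced the stored vectors.
interface EmbeddingConfig {
  provider: "openai" | "google" | "ollama";
  model: string;
  dimensions: number;
}

interface SwitchPlan {
  reembed: boolean; // stored vectors must be regenerated with the new model
  newSchema: boolean; // the vector(N) column must be recreated with a new N
}

function planSwitch(current: EmbeddingConfig, next: EmbeddingConfig): SwitchPlan {
  const sameModel = current.provider === next.provider && current.model === next.model;
  const sameDims = current.dimensions === next.dimensions;
  return {
    // Vectors from different models live in unrelated spaces, so any model
    // change invalidates the stored embeddings even when the sizes match.
    reembed: !sameModel || !sameDims,
    newSchema: !sameDims,
  };
}

// Illustrative: two 768-dimension models still require re-embedding.
const plan = planSwitch(
  { provider: "google", model: "text-embedding-004", dimensions: 768 },
  { provider: "ollama", model: "nomic-embed-text", dimensions: 768 },
);
console.log(plan); // { reembed: true, newSchema: false }
```

A move such as OpenAI to Ollama sets both flags. That corresponds to the full export, drop, new schema, and re-upload sequence listed above.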
Quality Comparison
The table below compares published MTEB benchmarks for OpenAI and Ollama's nomic-embed-text. Google AI's text-embedding-004 performs comparably on standard retrieval benchmarks but is not included due to differing evaluation methodology.
| Benchmark | OpenAI | nomic-embed-text |
|---|---|---|
| MTEB Average | 62.3 | 59.4 |
| Retrieval | 54.9 | 52.8 |
| Clustering | 45.0 | 44.2 |
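OpenAI scores higher on all three benchmarks, but the margins are small: 2.9 points on the MTEB average, 2.1 on retrieval, and 0.8 on clustering. Ollama, in exchange, keeps data local and has no API costs. Google AI offers a good balance with a free tier and competitive retrieval quality.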
Recommendations
Choose OpenAI if:
- You need maximum quality
- Data privacy isn't critical
- You prefer managed infrastructure
Choose Google AI if:
- You want cloud convenience with a free tier
- 768-dimension embeddings are sufficient
- You already use Google Cloud
Choose Ollama if:
- Data must stay local
- You want to avoid API costs
- You have GPU resources
- You're running fully self-hosted