Database Requirements
Recommended compute, storage, and database sizing for textrawl on Neon, Supabase, or self-hosted Postgres
Choose the right Postgres compute for textrawl. Examples below use Neon tier names (the recommended provider); equivalent Supabase Pro compute add-ons are noted inline. The pgvector/HNSW sizing math is provider-agnostic and applies to any Postgres deployment.
Vector Dimensions by Provider
The embedding model you choose determines your vector dimensions, which directly impacts storage, index size, and RAM requirements.
| Provider | Model | Dimensions | Storage per Vector |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | ~6 KB |
| Ollama v1 | nomic-embed-text / mxbai-embed-large | 1024 | ~4 KB |
| Ollama v2 | nomic-embed-text-v2-moe | 768 | ~3 KB |
Lower dimensions = smaller indexes = better performance at every tier.
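The per-vector figures in the table can be reproduced with a short calculation. This is a minimal sketch that assumes the embeddings are stored in pgvector's `vector` type, which takes 4 bytes per dimension plus 8 bytes of overhead; the function name is illustrative.

```python
def vector_bytes(dimensions: int) -> int:
    """Approximate storage for one pgvector `vector` value, in bytes.

    Assumes single-precision floats (4 bytes per dimension) plus an
    8-byte header, as documented for pgvector's `vector` type.
    """
    return 4 * dimensions + 8


for label, dims in [("OpenAI", 1536), ("Ollama v1", 1024), ("Ollama v2", 768)]:
    size = vector_bytes(dims)
    print(f"{label}: {dims} dims -> {size} bytes (~{size / 1024:.1f} KiB)")
```

Halving the dimension count roughly halves the raw vector size, which is why the 768-dimension model is the cheapest option to index.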
Required Indexes
Textrawl uses up to 6 HNSW vector indexes (depending on which features are enabled), plus GIN indexes for full-text search and B-tree indexes for lookups.
| Table | Index Type | Purpose |
|---|---|---|
| chunks | HNSW | Core semantic search |
| memory_entities | HNSW | Entity semantic search |
| memory_observations | HNSW | Memory semantic search |
| conversation_sessions | HNSW | Conversation semantic search |
| conversation_turns | HNSW | Turn-level semantic search |
| proactive_insights | HNSW | Insight semantic search |
| documents | GIN | Full-text search (tsvector) |
| memory_observations | GIN | Full-text search (tsvector) |
| conversation_sessions | GIN | Full-text search on summaries |
| conversation_turns | GIN | Full-text search on turns |
All HNSW indexes should fit in RAM for optimal query performance. If their combined size exceeds shared_buffers, queries start reading index pages from disk and latency increases significantly.
Storage Estimates
Estimated storage per document (assuming OpenAI 1536-dimension embeddings):
- 1 document row: ~2 KB
- ~5 chunks average (512 tokens each): ~62 KB per document (including embeddings)
- HNSW index overhead: ~7 KB per vector
| Documents | Chunks (est.) | DB Size (est.) | Fits in 8 GB Included Disk? |
|---|---|---|---|
| 1,000 | ~5,000 | ~70 MB | Yes |
| 10,000 | ~50,000 | ~700 MB | Yes |
| 50,000 | ~250,000 | ~3.5 GB | Yes |
| 100,000 | ~500,000 | ~7 GB | Yes |
| 150,000+ | ~750,000+ | ~10 GB+ | No (storage overage on most providers) |
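To judge whether the vector indexes will stay in memory, the ~7 KB per vector HNSW overhead quoted above can be turned into a rough estimator. This is a sketch only: it covers a single index at 1536 dimensions, uses 1 MB = 1000 KB, and the function names are illustrative.

```python
HNSW_KB_PER_VECTOR = 7  # overhead figure quoted above (1536-dim embeddings)


def hnsw_index_mb(vectors: int, kb_per_vector: float = HNSW_KB_PER_VECTOR) -> float:
    """Rough size of one HNSW index in MB (1 MB = 1000 KB)."""
    return vectors * kb_per_vector / 1000


def fits_in_ram(vectors: int, ram_budget_mb: float) -> bool:
    """True when the estimated index size is within the given RAM budget."""
    return hnsw_index_mb(vectors) <= ram_budget_mb


print(hnsw_index_mb(30_000))       # 210.0
print(hnsw_index_mb(250_000))      # 1750.0
print(fits_in_ram(30_000, 250))    # True
```

Actual index size also depends on HNSW build parameters and on how many of the optional indexes are enabled, so treat the result as an order-of-magnitude check.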
Compute Tier Recommendations
Pricing accurate as of May 2026 — verify current rates at neon.com/pricing and supabase.com/pricing. On Neon, a Compute Unit (CU) = 1 vCPU + 4 GB RAM, and compute scales (or autoscales) by fractional CU.
Neon Free -- $0/mo
100 CU-hours per project per month, 0.5 GB storage, autoscale up to 2 CU (8 GB RAM peak).
- Suitable for prototyping and light personal use
- Handles up to ~30K vectors comfortably
- HNSW indexes must stay under ~250 MB total
- Supabase equivalent: Free plan + Micro compute (1 GB RAM, $0 with Pro credits)
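The 100 CU-hour allowance translates into different amounts of active time depending on how large the compute is while it runs. A minimal sketch of that arithmetic follows; the compute sizes in the loop are illustrative values, not recommendations.

```python
FREE_CU_HOURS_PER_MONTH = 100  # Neon Free allowance quoted above


def active_hours(compute_units: float,
                 cu_hours: float = FREE_CU_HOURS_PER_MONTH) -> float:
    """Hours of active compute available at a constant compute size."""
    if compute_units <= 0:
        raise ValueError("compute_units must be positive")
    return cu_hours / compute_units


for cu in (0.25, 1, 2):
    print(f"{cu} CU -> {active_hours(cu):.0f} active hours per month")
```

At the 2 CU autoscale ceiling the allowance lasts 50 hours, so sustained heavy use exhausts the free tier well before the month ends.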
Neon Launch -- pay-as-you-go ($5/mo minimum)
$0.106/CU-hour compute, $0.35/GB-month storage, autoscale up to 16 CU (64 GB RAM).
- Good for personal knowledge bases under 50K documents (~250K chunks)
- Room for all 6 HNSW indexes at moderate scale
- Supabase equivalent: Pro ($25/mo) + Small or Medium compute add-on (~$15-60/mo)
Neon Scale -- pay-as-you-go ($5/mo minimum, Recommended for production)
$0.222/CU-hour compute, $0.35/GB-month storage, autoscale up to 16 CU or fixed sizes up to 56 CU (224 GB RAM). Adds SLAs, advanced security, and compliance features.
- Best fit for production personal use with all features enabled (memory, conversations, insights)
- Handles 50K-500K documents with consistent performance
- Use a fixed-size compute (4+ CU) for predictable HNSW search latency
- Supabase equivalent: Pro + Large compute add-on (~$110+/mo) or Team plan ($599/mo)
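The pay-as-you-go rates for Launch and Scale can be combined into a rough monthly usage estimate. The sketch below uses only the per-CU-hour and per-GB rates quoted above; it does not model the monthly minimum, data transfer, or any other charges, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeonPlan:
    name: str
    usd_per_cu_hour: float
    usd_per_gb_month: float


# Rates as quoted in this document; verify against current pricing.
LAUNCH = NeonPlan("Launch", 0.106, 0.35)
SCALE = NeonPlan("Scale", 0.222, 0.35)


def monthly_usage_usd(plan: NeonPlan, avg_cu: float,
                      active_hours: float, storage_gb: float) -> float:
    """Compute plus storage usage charges for one month, in USD."""
    compute = avg_cu * active_hours * plan.usd_per_cu_hour
    storage = storage_gb * plan.usd_per_gb_month
    return round(compute + storage, 2)


# Example inputs (assumed): a lightly used Launch database and an
# always-on 4 CU Scale database (730 hours is about one month).
print(monthly_usage_usd(LAUNCH, avg_cu=1, active_hours=200, storage_gb=5))
print(monthly_usage_usd(SCALE, avg_cu=4, active_hours=730, storage_gb=10))
```

Because compute dominates the bill, letting an idle database scale to zero usually matters more than trimming storage.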
Neon Business -- custom pricing
Dedicated infrastructure, premium support, custom SLAs. For 500K+ documents or multi-tenant deployments. Consider pgvectorscale (DiskANN indexes) at this scale to move vector indexes to SSD instead of RAM.
- Supabase equivalent: Enterprise plan (custom pricing, HIPAA, BYO cloud)
Storage and Disk
Neon storage is $0.35/GB-month with no fixed disk allocation — pay only for what you use. Supabase includes 8 GB of disk on Pro plans, then $0.125/GB. For typical textrawl workloads (read-heavy search + occasional batch writes), either model is fine and disk type/IOPS tuning is not required.
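The two storage models can be compared with a few lines of arithmetic. This sketch uses only the rates stated above and ignores base plan fees; the function names are illustrative.

```python
NEON_USD_PER_GB = 0.35           # per GB-month, as quoted above
SUPABASE_INCLUDED_GB = 8         # disk included on Supabase Pro
SUPABASE_USD_PER_EXTRA_GB = 0.125


def neon_storage_usd(gb: float) -> float:
    """Neon bills every stored GB."""
    return gb * NEON_USD_PER_GB


def supabase_pro_storage_usd(gb: float) -> float:
    """Supabase Pro bills only the GB beyond the included allocation."""
    return max(0.0, gb - SUPABASE_INCLUDED_GB) * SUPABASE_USD_PER_EXTRA_GB


for gb in (1, 7, 10):
    print(f"{gb} GB: Neon ${neon_storage_usd(gb):.2f}, "
          f"Supabase Pro overage ${supabase_pro_storage_usd(gb):.2f}")
```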
If you self-host, General Purpose SSD (gp3 or equivalent) with ~3,000 IOPS baseline is sufficient. High-performance disk (io2) is only needed for sustained high-throughput multi-tenant workloads.
Diagnosing Your Current Setup
Run these queries via psql "$DATABASE_URL" (or your provider's SQL editor) to understand your current database.
Database Size
Table Sizes
Vector Counts
HNSW Index Sizes
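The four checks above (database size, table sizes, vector counts, HNSW index sizes) could be collected into one small script. This is a sketch rather than the project's own tooling: it assumes the chunk embeddings live in a column named `embedding` (a hypothetical name; adjust it to your schema) and that you pass in any DB-API cursor connected to the database.

```python
DIAGNOSTIC_QUERIES = {
    "database_size": """
        SELECT pg_size_pretty(pg_database_size(current_database()))
    """,
    "table_sizes": """
        SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
        FROM pg_catalog.pg_statio_user_tables
        ORDER BY pg_total_relation_size(relid) DESC
    """,
    # Assumption: the vector column on chunks is called "embedding".
    "vector_counts": """
        SELECT count(*) FROM chunks WHERE embedding IS NOT NULL
    """,
    # Lists every index whose access method is pgvector's hnsw.
    "hnsw_index_sizes": """
        SELECT c.relname, pg_size_pretty(pg_relation_size(c.oid))
        FROM pg_class c
        JOIN pg_am am ON am.oid = c.relam
        WHERE c.relkind = 'i' AND am.amname = 'hnsw'
        ORDER BY pg_relation_size(c.oid) DESC
    """,
}


def run_diagnostics(cursor) -> dict:
    """Run each query on a DB-API cursor and return rows keyed by check name."""
    results = {}
    for name, sql in DIAGNOSTIC_QUERIES.items():
        cursor.execute(sql)
        results[name] = cursor.fetchall()
    return results


# Example usage with psycopg 3 (third-party, installed separately):
#   import os, psycopg
#   with psycopg.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur:
#       for name, rows in run_diagnostics(cur).items():
#           print(name, rows)
```

The same SQL strings can be pasted directly into psql or a provider's SQL editor.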
If the chunks table is missing an HNSW index, create it:
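One way to do this is sketched below. The index name, the `embedding` column name, and the cosine operator class are assumptions for illustration; match them to your schema and to the distance function your queries use. The helper takes any DB-API cursor.

```python
# Checks the catalog for an existing hnsw index on the chunks table.
HAS_CHUNKS_HNSW = """
    SELECT 1
    FROM pg_index i
    JOIN pg_class idx ON idx.oid = i.indexrelid
    JOIN pg_class tbl ON tbl.oid = i.indrelid
    JOIN pg_am am ON am.oid = idx.relam
    WHERE tbl.relname = 'chunks' AND am.amname = 'hnsw'
    LIMIT 1
"""

# Assumptions: index name, column name "embedding", cosine distance.
CREATE_CHUNKS_HNSW = (
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw_idx "
    "ON chunks USING hnsw (embedding vector_cosine_ops)"
)


def ensure_chunks_hnsw_index(cursor) -> bool:
    """Create the index if none exists; return True when it was created."""
    cursor.execute(HAS_CHUNKS_HNSW)
    if cursor.fetchone() is not None:
        return False
    cursor.execute(CREATE_CHUNKS_HNSW)
    return True
```

Building an HNSW index on a large table can take a while and uses extra memory, so it is worth running during a quiet period.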
Current Compute Tier
| shared_buffers | max_connections | Likely Size |
|---|---|---|
| ~128 MB | ~60 | Free / very small |
| ~256 MB | ~60 | Small |
| ~512 MB | ~90 | Medium |
| ~1 GB | ~120 | Large |
| ~2 GB | ~160 | XL |
| ~4 GB+ | ~240 | XXL / dedicated |
If your total HNSW index size approaches shared_buffers, it's time to upgrade (Neon: increase autoscale ceiling or pick a larger fixed size; Supabase: bump the compute add-on).
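That comparison can be automated once you have the total HNSW index size in bytes and the `shared_buffers` setting as text (for example `128MB`, as printed by `SHOW shared_buffers`). The sketch below treats "approaches" as 80% of shared_buffers; that threshold is an assumption, not a figure from this guide.

```python
_UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_pg_size(value: str) -> int:
    """Convert a Postgres memory setting such as '128MB' to bytes."""
    value = value.strip()
    for unit, factor in _UNITS.items():
        if value.endswith(unit):
            return int(value[: -len(unit)].strip()) * factor
    raise ValueError(f"unrecognized size: {value!r}")


def should_upgrade(hnsw_bytes: int, shared_buffers: str,
                   threshold: float = 0.8) -> bool:
    """True when HNSW indexes use at least `threshold` of shared_buffers."""
    return hnsw_bytes >= threshold * parse_pg_size(shared_buffers)


print(should_upgrade(110 * 1024 ** 2, "128MB"))  # True
print(should_upgrade(50 * 1024 ** 2, "128MB"))   # False
```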
Scaling Tips
- Lower dimensions help the most. Switching from OpenAI 1536d to Ollama v2 768d cuts index size in half. This is the cheapest way to scale.
- Index bloat happens. Run `REINDEX TABLE <table>;` periodically on GIN-indexed tables (`documents`, `memory_observations`, `conversation_sessions`, `conversation_turns`) if indexes grow disproportionately large relative to data size (10:1+ ratio is a sign of bloat).
- Temporary compute boosts. For large batch imports, temporarily upgrade your compute tier (billed hourly), build HNSW indexes faster, then scale back down.
- Consider pgvectorscale for 500K+ vectors. DiskANN indexes use SSD instead of RAM, making them much cheaper to scale. pgvectorscale is not currently a native Neon or Supabase extension; it's available on self-hosted Postgres and on Timescale Cloud. For Neon/Supabase deployments you can experiment with it via the pgai CLI. For most personal-scale (sub-500K vector) deployments, plain pgvector HNSW on Neon Launch/Scale is enough.
- Keep autoscaling and spend limits conservative initially. Textrawl is single-tenant -- you won't get surprise traffic spikes, so there's no reason to leave Neon autoscaling uncapped or Supabase spend caps disabled.
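The 10:1 bloat rule of thumb from the tips above is easy to check once you have an index size and its table's data size in bytes (for example from `pg_relation_size`). A minimal sketch, with an illustrative function name:

```python
def is_bloated(index_bytes: int, table_bytes: int, max_ratio: float = 10.0) -> bool:
    """True when the index is at least `max_ratio` times larger than its data."""
    if table_bytes <= 0:
        return False  # empty table: nothing to compare against
    return index_bytes / table_bytes >= max_ratio


print(is_bloated(index_bytes=500_000_000, table_bytes=40_000_000))  # True
print(is_bloated(index_bytes=60_000_000, table_bytes=40_000_000))   # False
```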