textrawl
by Jeff Green

Database Requirements

Recommended compute, storage, and database sizing for textrawl on Neon, Supabase, or self-hosted Postgres

Choose the right Postgres compute for textrawl. Examples below use Neon tier names (the recommended provider); equivalent Supabase Pro compute add-ons are noted inline. The pgvector/HNSW sizing math is provider-agnostic and applies to any Postgres deployment.

Vector Dimensions by Provider

The embedding model you choose determines your vector dimensions, which directly impacts storage, index size, and RAM requirements.

| Provider | Model | Dimensions | Storage per Vector |
| --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | 1536 | ~6 KB |
| Ollama v1 | nomic-embed-text / mxbai-embed-large | 1024 | ~4 KB |
| Ollama v2 | nomic-embed-text-v2-moe | 768 | ~3 KB |

Lower dimensions = smaller indexes = better performance at every tier.
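The per-vector figures in the table follow from pgvector's documented storage layout of 4 bytes per dimension plus an 8-byte header. A minimal sketch of that arithmetic (the function name `vector_bytes` is illustrative, not part of textrawl):

```python
def vector_bytes(dimensions: int) -> int:
    """Approximate size of one pgvector `vector` value, in bytes."""
    # 4 bytes per dimension (single-precision float) plus an 8-byte header.
    return 4 * dimensions + 8


for dims in (1536, 1024, 768):
    print(f"{dims} dims: {vector_bytes(dims) / 1024:.1f} KB")
```

This prints 6.0 KB, 4.0 KB, and 3.0 KB, matching the rounded values in the table. Row and page overhead are not included.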

Required Indexes

Textrawl uses up to 6 HNSW vector indexes (depending on which features are enabled), plus GIN indexes for full-text search and B-tree indexes for lookups.

| Table | Index Type | Purpose |
| --- | --- | --- |
| chunks | HNSW | Core semantic search |
| memory_entities | HNSW | Entity semantic search |
| memory_observations | HNSW | Memory semantic search |
| conversation_sessions | HNSW | Conversation semantic search |
| conversation_turns | HNSW | Turn-level semantic search |
| proactive_insights | HNSW | Insight semantic search |
| documents | GIN | Full-text search (tsvector) |
| memory_observations | GIN | Full-text search (tsvector) |
| conversation_sessions | GIN | Full-text search on summaries |
| conversation_turns | GIN | Full-text search on turns |

All HNSW indexes should fit in RAM for optimal query performance. If their combined size exceeds shared_buffers, queries start reading from disk and latency increases significantly.

Storage Estimates

Estimated storage per document (assuming OpenAI 1536-dimension embeddings):

  • 1 document row: ~2 KB
  • ~5 chunks average (512 tokens each): ~62 KB per document (including embeddings)
  • HNSW index overhead: ~7 KB per vector

| Documents | Chunks (est.) | DB Size (est.) | Fits in 8 GB Free Disk? |
| --- | --- | --- | --- |
| 1,000 | ~5,000 | ~70 MB | Yes |
| 10,000 | ~50,000 | ~700 MB | Yes |
| 50,000 | ~250,000 | ~3.5 GB | Yes |
| 100,000 | ~500,000 | ~7 GB | Yes |
| 150,000+ | ~750,000+ | ~10 GB+ | No (storage overage on most providers) |
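To project a document count that is not in the table, the same estimate can be scripted. This is a rough sketch: the ~70 KB per document constant is inferred from the table (about 70 MB per 1,000 documents) rather than stated by textrawl, and the function name is illustrative.

```python
CHUNKS_PER_DOC = 5   # average chunks per document, from the estimate above
KB_PER_DOC = 70      # assumption: implied by the table (~70 MB per 1,000 documents)


def estimate(documents: int) -> tuple[int, float]:
    """Return (estimated chunk count, estimated database size in GB)."""
    chunks = documents * CHUNKS_PER_DOC
    size_gb = documents * KB_PER_DOC / 1_000_000  # decimal units, as in the table
    return chunks, size_gb


for docs in (1_000, 10_000, 50_000, 100_000, 150_000):
    chunks, size_gb = estimate(docs)
    fits = "yes" if size_gb <= 8 else "no"
    print(f"{docs:>7,} docs  {chunks:>7,} chunks  {size_gb:5.2f} GB  fits in 8 GB: {fits}")
```

Real databases vary with document length and enabled features, so treat the result as an order-of-magnitude guide and confirm it with the diagnostic queries later in this guide.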

Compute Tier Recommendations

Pricing accurate as of May 2026 — verify current rates at neon.com/pricing and supabase.com/pricing. On Neon, a Compute Unit (CU) = 1 vCPU + 4 GB RAM, and compute scales (or autoscales) by fractional CU.

Neon Free -- $0/mo

100 CU-hours per project per month, 0.5 GB storage, autoscale up to 2 CU (8 GB RAM peak).

  • Suitable for prototyping and light personal use
  • Handles up to ~30K vectors comfortably
  • HNSW indexes must stay under ~250 MB total
  • Supabase equivalent: Free plan + Micro compute (1 GB RAM, $0 with Pro credits)
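The vector and index limits above are consistent with the ~7 KB per vector HNSW overhead from the storage estimates. A small sketch of the check, assuming 1536-dimension embeddings (the helper names are illustrative):

```python
HNSW_KB_PER_VECTOR = 7  # overhead figure from the storage estimates (1536 dims)


def hnsw_size_mb(vectors: int) -> float:
    """Estimated total HNSW index size in MB for a given vector count."""
    return vectors * HNSW_KB_PER_VECTOR / 1024


def max_vectors(budget_mb: float) -> int:
    """Largest vector count whose estimated HNSW size fits the budget."""
    return int(budget_mb * 1024 // HNSW_KB_PER_VECTOR)


print(f"{hnsw_size_mb(30_000):.0f} MB")  # size of 30K vectors
print(max_vectors(250))                  # vectors that fit in 250 MB
```

Under these assumptions, 30K vectors need about 205 MB, and a 250 MB budget holds roughly 36.5K vectors, which leaves a modest margin on this tier.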

Neon Launch -- pay-as-you-go ($5/mo minimum)

$0.106/CU-hour compute, $0.35/GB-month storage, autoscale up to 16 CU (64 GB RAM).

  • Good for personal knowledge bases under 50K documents (~250K chunks)
  • Room for all 6 HNSW indexes at moderate scale
  • Supabase equivalent: Pro ($25/mo) + Small or Medium compute add-on (~$15-60/mo)

Neon Scale -- pay-as-you-go

$0.222/CU-hour compute, $0.35/GB-month storage, autoscale up to 16 CU or fixed sizes up to 56 CU (224 GB RAM). Adds SLAs, advanced security, and compliance features.

  • Best fit for production personal use with all features enabled (memory, conversations, insights)
  • Handles 50K-500K documents with consistent performance
  • Use a fixed-size compute (4+ CU) for predictable HNSW search latency
  • Supabase equivalent: Pro + Large compute add-on (~$110+/mo) or Team plan ($599/mo)
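Because compute is billed per CU-hour, a quick calculation helps when choosing between autoscaling and a fixed size. The sketch below uses the $0.106 and $0.222 rates quoted above. The usage figures and the 730-hour month are assumptions for illustration, and plan minimums and storage are not modeled.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def compute_cost(avg_cu: float, active_hours: float, rate_per_cu_hour: float) -> float:
    """Monthly compute charge in USD: average CU x active hours x rate."""
    return avg_cu * active_hours * rate_per_cu_hour


# Hypothetical light usage: 1 CU average, active 100 hours a month, $0.106 rate.
light = compute_cost(1, 100, 0.106)

# Hypothetical always-on fixed 4 CU compute at the $0.222 rate.
fixed = compute_cost(4, HOURS_PER_MONTH, 0.222)

print(f"light: ${light:.2f}/mo, fixed 4 CU: ${fixed:.2f}/mo")
```

An always-on fixed size costs far more than a compute that scales to zero between sessions, so the fixed 4+ CU option is worth it mainly when predictable latency matters more than spend.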

Neon Business -- custom pricing

Dedicated infrastructure, premium support, custom SLAs. For 500K+ documents or multi-tenant deployments. Consider pgvectorscale (DiskANN indexes) at this scale to move vector indexes to SSD instead of RAM.

  • Supabase equivalent: Enterprise plan (custom pricing, HIPAA, BYO cloud)

Storage and Disk

Neon storage is $0.35/GB-month with no fixed disk allocation — pay only for what you use. Supabase includes 8 GB of disk on Pro plans, then $0.125/GB. For typical textrawl workloads (read-heavy search + occasional batch writes), either model is fine and disk type/IOPS tuning is not required.
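The two storage models can be compared for a given database size with a few lines of arithmetic. This sketch uses only the rates quoted above and covers storage charges alone; plan fees and compute are excluded, and the function names are illustrative.

```python
NEON_RATE = 0.35               # $/GB-month, billed on actual usage
SUPABASE_INCLUDED_GB = 8       # disk included with the Pro plan
SUPABASE_OVERAGE_RATE = 0.125  # $/GB beyond the included disk


def neon_storage_cost(gb: float) -> float:
    """Monthly Neon storage charge in USD."""
    return gb * NEON_RATE


def supabase_storage_cost(gb: float) -> float:
    """Monthly Supabase disk overage in USD (zero within the included 8 GB)."""
    return max(0.0, gb - SUPABASE_INCLUDED_GB) * SUPABASE_OVERAGE_RATE


for gb in (0.7, 3.5, 7.0, 10.5):
    print(f"{gb:>5} GB  Neon ${neon_storage_cost(gb):.2f}  "
          f"Supabase overage ${supabase_storage_cost(gb):.2f}")
```

At personal scale both figures stay within a few dollars a month, which supports the point that storage pricing is rarely the deciding factor.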

If you self-host, General Purpose SSD (gp3 or equivalent) with ~3,000 IOPS baseline is sufficient. High-performance disk (io2) is only needed for sustained high-throughput multi-tenant workloads.

Diagnosing Your Current Setup

Run these queries via psql "$DATABASE_URL" (or your provider's SQL editor) to understand your current database.

Database Size

SELECT pg_size_pretty(pg_database_size(current_database())) AS total_db_size;

Table Sizes

SELECT
  schemaname || '.' || tablename AS table,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS data_size,
  pg_size_pretty(
    pg_total_relation_size(schemaname || '.' || tablename)
    - pg_relation_size(schemaname || '.' || tablename)
  ) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
  AND tablename IN (
    'documents', 'chunks',
    'memory_entities', 'memory_observations', 'memory_relations',
    'conversation_sessions', 'conversation_turns',
    'proactive_insights', 'insight_queue'
  )
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

Vector Counts

SELECT 'chunks' AS table_name, count(*) AS total_rows, count(embedding) AS vectors
FROM chunks
UNION ALL
SELECT 'memory_entities', count(*), count(embedding) FROM memory_entities
UNION ALL
SELECT 'memory_observations', count(*), count(embedding) FROM memory_observations
UNION ALL
SELECT 'conversation_sessions', count(*), count(summary_embedding) FROM conversation_sessions
UNION ALL
SELECT 'conversation_turns', count(*), count(embedding) FROM conversation_turns
UNION ALL
SELECT 'proactive_insights', count(*), count(embedding) FROM proactive_insights
ORDER BY vectors DESC;

HNSW Index Sizes

SELECT
  indexname,
  pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE schemaname = 'public'
  AND indexdef ILIKE '%hnsw%'
ORDER BY pg_relation_size(indexname::regclass) DESC;

If the chunks table is missing an HNSW index, create it:

-- Works for all providers (OpenAI 1536d, Ollama v1 1024d, Ollama v2 768d)
-- PostgreSQL infers the dimension from the column type
CREATE INDEX IF NOT EXISTS chunks_embedding_idx ON chunks
USING hnsw (embedding vector_cosine_ops);

Current Compute Tier

SHOW shared_buffers;
SHOW effective_cache_size;
SHOW max_connections;

| shared_buffers | max_connections | Likely Size |
| --- | --- | --- |
| ~128 MB | ~60 | Free / very small |
| ~256 MB | ~60 | Small |
| ~512 MB | ~90 | Medium |
| ~1 GB | ~120 | Large |
| ~2 GB | ~160 | XL |
| ~4 GB+ | ~240 | XXL / dedicated |

If your total HNSW index size approaches shared_buffers, it's time to upgrade (Neon: increase autoscale ceiling or pick a larger fixed size; Supabase: bump the compute add-on).
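One way to turn that rule into a repeatable check is to compare the summed HNSW index size against shared_buffers. In the sketch below, the 80% warning threshold is an assumption chosen for illustration (the guide only says "approaches"), and the function name is hypothetical.

```python
def hnsw_headroom(hnsw_total_mb: float, shared_buffers_mb: float,
                  warn_ratio: float = 0.8) -> str:
    """Classify HNSW index size relative to shared_buffers.

    Returns "over" when the indexes no longer fit, "close" when they
    exceed warn_ratio of shared_buffers, and "ok" otherwise.
    """
    ratio = hnsw_total_mb / shared_buffers_mb
    if ratio >= 1.0:
        return "over"
    if ratio >= warn_ratio:
        return "close"
    return "ok"


print(hnsw_headroom(210, 256))  # close
print(hnsw_headroom(210, 512))  # ok
print(hnsw_headroom(600, 512))  # over
```

The inputs come from the HNSW Index Sizes query and SHOW shared_buffers, converted to megabytes by hand or with pg_relation_size instead of pg_size_pretty.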

Scaling Tips

  1. Lower dimensions help the most. Switching from OpenAI 1536d to Ollama v2 768d cuts index size in half. This is the cheapest way to scale.

  2. Index bloat happens. Run REINDEX TABLE <table>; periodically on GIN-indexed tables (documents, memory_observations, conversation_sessions, conversation_turns) if indexes grow disproportionately large relative to data size (10:1+ ratio is a sign of bloat).

  3. Temporary compute boosts. For large batch imports, temporarily upgrade your compute tier (billed hourly), build HNSW indexes faster, then scale back down.

  4. Consider pgvectorscale for 500K+ vectors. DiskANN indexes use SSD instead of RAM, making them much cheaper to scale. pgvectorscale is not currently a native Neon or Supabase extension — it's available on self-hosted Postgres and on Timescale Cloud. For Neon/Supabase deployments you can experiment with it via the pgai CLI. For most personal-scale (sub-500K vector) deployments, plain pgvector HNSW on Neon Launch/Scale is enough.

  5. Keep autoscaling and spend limits conservative initially. Textrawl is single-tenant -- you won't get surprise traffic spikes, so there's no reason to leave Neon autoscaling uncapped or Supabase spend caps disabled.
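The bloat rule of thumb in tip 2 (index size at 10:1 or more relative to data size) is easy to apply to the raw byte counts behind the Table Sizes query. A minimal sketch, with an illustrative function name and hypothetical sample sizes:

```python
def looks_bloated(index_bytes: int, data_bytes: int, threshold: float = 10.0) -> bool:
    """Return True when index size is at least `threshold` times the data size."""
    if data_bytes == 0:
        # An empty table has no meaningful ratio; do not flag it.
        return False
    return index_bytes / data_bytes >= threshold


# Hypothetical sizes in bytes, not measurements from a real database.
print(looks_bloated(index_bytes=1_200_000_000, data_bytes=100_000_000))  # True
print(looks_bloated(index_bytes=300_000_000, data_bytes=100_000_000))    # False
```

Tables with HNSW indexes naturally carry large indexes relative to their data, so this check is most meaningful for the GIN-indexed tables named in tip 2.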