Architecture

This section covers the technical details of how textrawl works under the hood.

Architecture Overview

textrawl is built on a modern stack optimized for semantic search:

Runtime: Node.js 22+ with ES modules
Database: Neon PostgreSQL with pgvector extension (any Postgres with pgvector works)
Transport: HTTP (REST API, MCP over Streamable HTTP, WebSocket events)
Embeddings: OpenAI, Google AI, or Ollama (configurable)

Request Flow

 MCP Client       Dashboard      REST Client       CLI
     │                │               │              │
     ▼                ▼               ▼              ▼
 POST /mcp      GET/POST /api   GET/POST /api   POST /api
     └────────────────┼───────────────┴──────────────┘
                      ▼
               ┌─────────────┐
               │   Express   │
               │   Server    │
               └──────┬──────┘
                      │
                      ▼
               ┌─────────────┐
               │    Tool     │
               │  Handlers   │
               └──────┬──────┘
                      │
                ┌─────┴─────┐
                ▼           ▼
           ┌─────────┐ ┌─────────┐
           │ Postgres│ │Embedding│
           │   DB    │ │Provider │
           └─────────┘ └─────────┘

Core Components

Hybrid Search

The search system combines full-text keyword matching with semantic vector similarity using Reciprocal Rank Fusion (RRF).
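To illustrate how Reciprocal Rank Fusion merges two ranked result lists, here is a minimal sketch. It is not textrawl's actual implementation: the function name `reciprocalRankFusion`, the sample document ids, and the constant `k = 60` (a commonly used default for RRF) are assumptions made for this example.

```typescript
// Minimal RRF sketch (illustrative only, not textrawl's code).
// Each list holds document ids ordered from best to worst match.
type RankedList = string[];

interface FusedResult {
  id: string;
  score: number;
}

// score(d) = sum over lists of 1 / (k + rank of d in that list), ranks are 1-based.
// k = 60 is an assumed default; the value textrawl uses is not stated here.
function reciprocalRankFusion(lists: RankedList[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results from the keyword and vector searches.
const keywordHits: RankedList = ["doc-a", "doc-b", "doc-c"];
const vectorHits: RankedList = ["doc-b", "doc-d", "doc-a"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
```

Because RRF uses only rank positions, the full-text and vector scores never need to be normalized onto a common scale. Documents that appear in both lists (`doc-a` and `doc-b` here) accumulate two contributions and rise above documents found by only one method.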

Learn about hybrid search →

Text Chunking

Documents are split into overlapping chunks to fit within embedding model context limits while preserving semantic coherence.

512 tokens (~2048 characters) per chunk
50 token overlap for context preservation
Paragraph-aware splitting on \n\n
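The chunking parameters above can be sketched as follows. This is a simplified illustration rather than textrawl's chunker: it approximates tokens as 4 characters each (the ratio implied by "512 tokens (~2048 characters)") instead of running a real tokenizer, and the name `chunkText` and its handling of oversized paragraphs are assumptions.

```typescript
// Paragraph-aware chunking sketch (illustrative only, not textrawl's code).
// Assumption: 1 token is roughly 4 characters, so sizes are measured in characters.
const CHARS_PER_TOKEN = 4;
const MAX_CHARS = 512 * CHARS_PER_TOKEN; // 2048
const OVERLAP_CHARS = 50 * CHARS_PER_TOKEN; // 200

function chunkText(
  text: string,
  maxChars: number = MAX_CHARS,
  overlapChars: number = OVERLAP_CHARS,
): string[] {
  if (overlapChars >= maxChars) {
    throw new Error("overlapChars must be smaller than maxChars");
  }
  // Split on blank lines so paragraphs stay intact where possible.
  const paragraphs = text
    .split(/\n\n+/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? current + "\n\n" + paragraph : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph still fits in the current chunk
      continue;
    }
    if (current) {
      chunks.push(current);
      // Start the next chunk with the tail of the previous one as overlap.
      current = current.slice(-overlapChars) + "\n\n" + paragraph;
    } else {
      current = paragraph;
    }
    // A single oversized paragraph is hard-split, keeping the same overlap.
    while (current.length > maxChars) {
      chunks.push(current.slice(0, maxChars));
      current = current.slice(maxChars - overlapChars);
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The overlap means the end of one chunk reappears at the start of the next, so a sentence that straddles a chunk boundary is still embedded with some of its surrounding context.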

Learn about chunking →

Embeddings

Embeddings are vector representations of text that capture semantic meaning, which makes similarity search possible.

Provider	Model	Dimensions
OpenAI	text-embedding-3-small	1536
Google AI	gemini-embedding-2-preview	3072
Ollama	nomic-embed-text	1024
Ollama	nomic-embed-text-v2-moe	768
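As a rough illustration of what "similarity" means for these vectors, the sketch below computes cosine similarity between two embeddings. In a deployment like the one described here, the database would normally perform this comparison through pgvector. The function name `cosineSimilarity` is hypothetical, and the choice of cosine as the metric is an assumption.

```typescript
// Cosine similarity sketch (illustrative only; the metric textrawl uses is assumed).
function cosineSimilarity(a: number[], b: number[]): number {
  // Vectors from different models have different dimensions and cannot be compared.
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i]! * b[i]!;
    normA += a[i]! * a[i]!;
    normB += b[i]! * b[i]!;
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check reflects a practical consequence of the table above: switching providers or models changes the vector size, so existing embeddings generally have to be regenerated rather than mixed with new ones.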

Learn about embeddings →

Database Schema

Core Tables

Table	Purpose
`documents`	Document metadata, content, tags
`chunks`	Text chunks with embeddings and FTS vectors

Memory Tables (Optional)

Table	Purpose
`memory_entities`	Named entities (people, projects, concepts)
`memory_observations`	Facts about entities with embeddings
`memory_relations`	Directed relationships between entities

Key Indexes

Index	Type	Purpose
`chunks.search_vector`	GIN	Full-text search
`chunks.embedding`	HNSW	Vector similarity
`memory_observations.embedding`	HNSW	Memory search
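For orientation, indexes of these types could be declared with statements like the ones below, shown as TypeScript string constants so they could be passed to a migration runner. The index names and the `vector_cosine_ops` operator class are assumptions; the project's actual setup SQL may differ.

```typescript
// Hypothetical DDL for the indexes listed above (index names are invented for this sketch).
const indexStatements: string[] = [
  // GIN index over the tsvector column used for full-text search.
  "CREATE INDEX IF NOT EXISTS idx_chunks_search_vector ON chunks USING gin (search_vector);",
  // pgvector HNSW index; cosine distance is an assumed choice of operator class.
  "CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON chunks USING hnsw (embedding vector_cosine_ops);",
  "CREATE INDEX IF NOT EXISTS idx_memory_observations_embedding ON memory_observations USING hnsw (embedding vector_cosine_ops);",
];
```

GIN suits full-text search because it maps each lexeme to the rows containing it, while HNSW is an approximate nearest-neighbor index that trades a small amount of recall for much faster vector lookups.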

Transport Layer

textrawl exposes four transport interfaces: MCP (for AI assistants), REST API (for programmatic access), WebSocket (for real-time events), and the A2A protocol (for agent-to-agent discovery).
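As a sketch of what a call over the MCP transport might look like, the helper below builds a JSON-RPC 2.0 request for the `POST /mcp` endpoint shown in the request flow. The base URL `http://localhost:3000` and the helper name `buildMcpRequest` are assumptions, authentication is omitted, and a real session would normally begin with an `initialize` handshake before listing tools.

```typescript
// Sketch of an MCP request over Streamable HTTP (illustrative only).
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumed local address; substitute the address of your own deployment.
const BASE_URL = "http://localhost:3000";

function buildMcpRequest(
  rpcMethod: string,
  params?: Record<string, unknown>,
  id = 1,
): HttpRequestSpec {
  const payload: JsonRpcRequest = { jsonrpc: "2.0", id, method: rpcMethod };
  if (params !== undefined) payload.params = params;
  return {
    url: `${BASE_URL}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP servers may answer with plain JSON or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(payload),
  };
}

// "tools/list" is the standard MCP method for discovering available tools.
const listTools = buildMcpRequest("tools/list");
// Usage (not executed here):
// await fetch(listTools.url, { method: listTools.method, headers: listTools.headers, body: listTools.body });
```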