textrawl
by Jeff Green

Core Concepts

Foundational concepts behind textrawl's knowledge base

Understanding these core concepts will help you get the most out of textrawl's knowledge base capabilities.

Concepts Overview

textrawl runs two search strategies and merges their results into a single ranking:

  • Keyword search finds matching terms using PostgreSQL's full-text search
  • Semantic search finds related meaning using vector embeddings
  • Hybrid search combines both result lists using Reciprocal Rank Fusion (RRF)
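
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is an illustration rather than textrawl's own code: the function name, the document IDs, and the constant k=60 (the value commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from textrawl.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two strategies, best match first.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print([doc_id for doc_id, _ in fused])
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it needs no normalization between full-text scores and vector distances, which live on different scales.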


Embeddings

Vector embeddings are numerical representations of text that capture semantic meaning:

  • OpenAI's text-embedding-3-small (1536 dimensions)
  • Ollama's nomic-embed-text (1024 dimensions)
  • Stored in PostgreSQL with pgvector
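
As a rough illustration of how embeddings support semantic search, the sketch below compares toy 3-dimensional vectors by cosine similarity. Real embeddings have far more dimensions, and the vectors and chunk names here are made up. pgvector offers a cosine distance operator, but which distance metric textrawl uses is not stated here, so treat the choice of cosine as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not model output).
query = [1.0, 0.0, 1.0]
chunks = {
    "chunk_1": [1.0, 0.0, 0.9],  # points nearly the same way as the query
    "chunk_2": [0.0, 1.0, 0.0],  # orthogonal to the query
}

ranked = sorted(
    chunks,
    key=lambda name: cosine_similarity(query, chunks[name]),
    reverse=True,
)
print(ranked)
# → ['chunk_1', 'chunk_2']
```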


Document Processing

How textrawl processes your documents:

  • Crawling discovers and extracts content from various formats
  • Chunking splits documents into searchable pieces
  • Indexing creates embeddings and full-text search vectors
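
The chunking step can be sketched as a sliding window over tokens. This is a simplified illustration, not textrawl's implementation: whitespace splitting stands in for a real tokenizer, and the 50-token overlap and the function name are assumptions. The 512-token chunk size follows the pipeline diagram on this page.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# A 1200-"token" document, using whitespace splitting as a stand-in tokenizer.
words = ("lorem ipsum " * 600).split()
chunks = chunk_tokens(words)
print([len(chunk) for chunk in chunks])
# → [512, 512, 276]
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some tokens twice.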


How It All Fits Together

Documents (PDF, MBOX, HTML, MD)
         │
         ▼
    ┌─────────┐
    │ Crawl & │
    │ Extract │
    └────┬────┘
         │
         ▼
    ┌─────────┐
    │  Chunk  │
    │  (512t) │
    └────┬────┘
         │
         ├──────────────────┐
         ▼                  ▼
    ┌─────────┐       ┌───────────┐
    │ Embed   │       │ Full-Text │
    │ (vector)│       │ (tsvector)│
    └────┬────┘       └─────┬─────┘
         │                  │
         └────────┬─────────┘
                  ▼
            ┌───────────┐
            │ Supabase  │
            │ (pgvector)│
            └───────────┘
