textrawl
by Jeff Green

Introduction

What is textrawl and why you need it

textrawl is a Personal Knowledge MCP Server that gives Claude access to your documents, emails, notes, and other knowledge. It uses hybrid search, which combines semantic understanding with keyword matching, to find the most relevant content.

What is MCP?

The Model Context Protocol (MCP) is an open standard for connecting AI assistants to external data sources and tools. Originally developed by Anthropic and later donated to the Linux Foundation's Agentic AI Foundation, MCP enables:

  • Tool Use: Claude can call functions to search, retrieve, and create content
  • Context Sharing: Your documents become part of Claude's working knowledge
  • Privacy: Data stays on your infrastructure rather than being uploaded to the cloud
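
To make the Tool Use point concrete, here is a minimal sketch of the kind of message an MCP client sends when it calls a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `search` and its `query` and `limit` arguments are hypothetical, chosen only for illustration; they are not taken from textrawl's actual tool list.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "search", {"query": "quarterly planning", "limit": 5})
print(request)
```

In practice the MCP client builds and sends these messages for you; the sketch only shows what travels over the wire.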

Key Features

Hybrid Search

textrawl combines two search strategies:

  1. Semantic Search: Understands meaning using vector embeddings (OpenAI or Ollama)
  2. Full-Text Search: Finds exact keywords using PostgreSQL tsvector

Results are merged using Reciprocal Rank Fusion (RRF), giving you the best of both approaches.
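
As a rough illustration of how Reciprocal Rank Fusion merges two ranked lists, the sketch below scores each document as the sum of 1 / (k + rank) across the lists it appears in. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the function name and the document IDs; textrawl's own implementation and parameters may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two search strategies.
semantic_results = ["doc_a", "doc_b", "doc_c"]
keyword_results = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_results, keyword_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents that appear in only one, which is why RRF works without having to normalize the two scoring scales against each other.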

Multi-Format Import

Convert and import various document formats:

  • Email: MBOX archives, EML files
  • Documents: PDF, DOCX, TXT, Markdown
  • Web: HTML pages, saved articles
  • Archives: Google Takeout ZIP exports
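
For a sense of what converting an MBOX archive involves, the sketch below reads one with Python's standard `mailbox` module and extracts the sender, subject, and first plain-text body of each message. It is an illustration under stated assumptions, not textrawl's converter: the function name, the record fields, and the sample message are all hypothetical.

```python
import mailbox
import os
import tempfile


def mbox_to_records(path: str) -> list[dict]:
    """Read an MBOX archive and return one plain-text record per message."""
    box = mailbox.mbox(path, create=False)
    records = []
    try:
        for message in box:
            body = ""
            # Take the first text/plain part; HTML parts and attachments are skipped.
            for part in message.walk():
                if part.get_content_type() == "text/plain":
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    body = payload.decode(charset, errors="replace").strip()
                    break
            records.append({
                "from": str(message.get("From", "")),
                "subject": str(message.get("Subject", "")),
                "body": body,
            })
    finally:
        box.close()
    return records


# Write a tiny one-message archive to a temporary file to demonstrate.
sample = (
    b"From alice@example.com Mon Jan  1 00:00:00 2024\n"
    b"From: alice@example.com\n"
    b"Subject: Launch plan\n"
    b"\n"
    b"Ship on Friday.\n"
)
with tempfile.NamedTemporaryFile(suffix=".mbox", delete=False) as handle:
    handle.write(sample)
    sample_path = handle.name

records = mbox_to_records(sample_path)
os.unlink(sample_path)
print(records)
```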

Automatic Chunking

Large documents are split into searchable chunks:

  • 512 tokens (~2048 characters) per chunk
  • 50 token overlap for context preservation
  • Paragraph-aware splitting
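
The chunking parameters above can be sketched as follows. This is a simplified illustration, not textrawl's chunker: it approximates tokens as 4 characters each (the ratio implied by "512 tokens (~2048 characters)"), whereas a real implementation would likely count tokens with a tokenizer. The function name and the hard-split fallback for oversized paragraphs are assumptions.

```python
CHARS_PER_TOKEN = 4  # assumption: rough ratio implied by 512 tokens ~ 2048 characters


def chunk_text(text: str, max_tokens: int = 512, overlap_tokens: int = 50) -> list[str]:
    """Split text into chunks on paragraph boundaries, carrying a short overlap."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    max_chars = max_tokens * CHARS_PER_TOKEN
    overlap_chars = overlap_tokens * CHARS_PER_TOKEN
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
            continue
        if current:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one as overlap.
            tail = current[max(0, len(current) - overlap_chars):]
            candidate = f"{tail}\n\n{paragraph}" if tail else paragraph
        # Hard-split anything still too long (e.g. one very large paragraph).
        while len(candidate) > max_chars:
            chunks.append(candidate[:max_chars])
            candidate = candidate[max_chars - overlap_chars:]
        current = candidate
    if current:
        chunks.append(current)
    return chunks


# Three 1000-character paragraphs: the first two fit together, the third spills over.
document = "\n\n".join(["a" * 1000, "b" * 1000, "c" * 1000])
chunks = chunk_text(document)
print([len(chunk) for chunk in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary is still searchable with some surrounding context.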

Self-Hosted

Run textrawl on your infrastructure:

  • Local: npm run dev for development
  • Docker: docker-compose for production
  • Cloud: Deploy to Cloud Run, Railway, or any container platform

Use Cases

Personal Knowledge Base

Import your notes, saved articles, and documents. Ask Claude questions like:

"What did I write about quarterly planning last October?"

Email Archive

Convert MBOX archives into searchable knowledge, then ask:

"Find all emails from the marketing team about the product launch"

Research Assistant

Import research papers and documentation:

"Summarize the key findings from the papers about hybrid search algorithms"

Code Documentation

Import your project's documentation and ask Claude for help:

"How do I configure authentication based on our security docs?"

Next Steps

  1. Quick Start - Get running in 5 minutes
  2. Installation - Detailed setup guide
  3. Configuration - Environment variables
