Introduction
What is textrawl and why you need it
textrawl is a personal knowledge server — a second brain that stores your documents, remembers facts across conversations, and proactively discovers connections in your knowledge. Use it through the web dashboard, connect it to AI assistants via MCP, or integrate it into your own tools with the REST API.
What is MCP?
The Model Context Protocol (MCP) is an open standard for connecting AI assistants to external data sources and tools. Introduced by Anthropic and since donated to the Linux Foundation's Agentic AI Foundation, MCP enables:
- Tool Use: Claude can call functions to search, retrieve, and create content
- Context Sharing: Your documents become part of Claude's working knowledge
- Privacy: Data stays on your infrastructure instead of being uploaded to a third-party cloud
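To make tool use concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends when it calls a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `search_knowledge` and its arguments are hypothetical and only illustrate the shape, not textrawl's actual tool list.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, chosen for illustration only.
request = build_tool_call(
    1, "search_knowledge", {"query": "quarterly planning", "limit": 5}
)
payload = json.dumps(request)
print(payload)
```

In practice an MCP client such as Claude builds and sends these messages for you; the sketch only shows what travels over the wire.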
Key Features
Hybrid Search
textrawl combines two search strategies:
- Semantic Search: Understands meaning using vector embeddings (OpenAI or Ollama)
- Full-Text Search: Finds exact keywords using PostgreSQL tsvector
Results are merged using Reciprocal Rank Fusion (RRF), giving you the best of both approaches.
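As a rough illustration of how Reciprocal Rank Fusion merges two result lists, the sketch below scores each document by summing 1 / (k + rank) over every list it appears in. The constant k = 60 is the value commonly used for RRF; the constant textrawl uses, and the document IDs shown, are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
full_text = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([semantic, full_text])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF works on ranks rather than raw scores, it can combine vector similarity and full-text relevance without normalizing the two scales against each other.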
Multi-Format Import
Convert and import various document formats:
- Email: MBOX archives, EML files
- Documents: PDF, DOCX, TXT, Markdown
- Web: HTML pages, saved articles
- Archives: Google Takeout ZIP exports
Persistent Memory
Remember facts, track relationships, and build a knowledge graph:
- Entity Memory: Store facts about people, projects, and concepts
- Conversation Memory: Save and recall conversation context across sessions
- Memory Extraction: Automatically extract entities and facts from text using an LLM
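One way to picture entity memory is as a small graph of named entities, each carrying a list of facts, joined by labeled relationships. The sketch below is an in-memory illustration only: the `Entity` and `MemoryGraph` names, their fields, and their methods are hypothetical and do not describe textrawl's storage schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "person", "project", "concept"
    facts: list[str] = field(default_factory=list)


class MemoryGraph:
    """Hypothetical in-memory store for entities, facts, and relationships."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.relations: set[tuple[str, str, str]] = set()  # (source, label, target)

    def remember(self, name: str, kind: str, fact: str) -> Entity:
        """Attach a fact to an entity, creating the entity if needed."""
        entity = self.entities.setdefault(name, Entity(name, kind))
        if fact not in entity.facts:
            entity.facts.append(fact)
        return entity

    def relate(self, source: str, label: str, target: str) -> None:
        self.relations.add((source, label, target))

    def related_to(self, name: str) -> list[str]:
        """Return the names directly linked to `name`, in either direction."""
        linked = set()
        for source, _, target in self.relations:
            if source == name:
                linked.add(target)
            elif target == name:
                linked.add(source)
        return sorted(linked)


memory = MemoryGraph()
memory.remember("Q3 budget", "concept", "Approved at $2.4M")
memory.remember("Sarah", "person", "Leading the rollout")
memory.relate("Sarah", "leads", "rollout")
print(memory.related_to("Sarah"))  # ['rollout']
```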
Proactive Insights
Automatically discover patterns in your knowledge base:
- Cross-Source Connections: Find links between documents from different sources
- Theme Clusters: Identify recurring themes across your content
- Entity Bridges: Discover entities that connect different topics
- Outliers: Surface unusual or unique content
Automatic Chunking
Large documents are split into searchable chunks:
- 512 tokens (~2048 characters) per chunk
- 50 token overlap for context preservation
- Paragraph-aware splitting (or semantic chunking with CHUNKING_MODE=semantic)
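The chunking parameters above can be sketched as a sliding window that prefers to break on paragraph boundaries. This is an approximation under stated assumptions: it counts characters at roughly 4 per token (matching the 512 tokens, about 2048 characters, figure) instead of running a real tokenizer, the function name `chunk_text` is hypothetical, and it does not cover the semantic chunking mode.

```python
def chunk_text(text, chunk_tokens=512, overlap_tokens=50, chars_per_token=4):
    """Split text into overlapping chunks, preferring paragraph breaks."""
    size = chunk_tokens * chars_per_token        # about 2048 characters
    overlap = overlap_tokens * chars_per_token   # about 200 characters
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Prefer the last blank line inside the window, but only if the
            # chunk stays longer than the overlap so the window keeps moving.
            cut = text.rfind("\n\n", start + overlap + 1, end)
            if cut != -1:
                end = cut
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back so consecutive chunks share some context.
        start = end - overlap
    return chunks


sample = "".join(str(i % 10) for i in range(5000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])  # 3 [2048, 2048, 1304]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps both embedding quality and keyword matching.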
Web Dashboard
A full-featured command center for managing your knowledge base:
- Knowledge Explorer: Browse documents in card, table, or timeline views
- Agent Orchestration: Trigger insight scans, memory extraction, and briefings
- Applets: Build custom UIs powered by your knowledge using AI-generated code
- Upload: Drag-and-drop file upload with progress tracking and tag management
Multimodal Processing
Go beyond text documents:
- Image Descriptions: Automatically describe images using Claude vision
- Audio Transcription: Transcribe audio files using OpenAI Whisper
- Supported Formats: PNG, JPEG, WebP, GIF, MP3, WAV, M4A, OGG, WebM
REST API & WebSocket
Access your knowledge base beyond MCP clients:
- REST Endpoints: Search, list documents, upload files over HTTP
- WebSocket Events: Real-time notifications for ingestion, extraction, and insight discovery
- Agent Discovery: A2A protocol at /.well-known/agent.json for agent-to-agent interaction
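For agent discovery, a client only needs the server's origin to locate the agent card. The sketch below derives the `/.well-known/agent.json` URL from any base URL and, optionally, fetches it with the standard library. The host and port are placeholders, the helper names are hypothetical, and the shape of the returned JSON is not assumed here.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

AGENT_CARD_PATH = "/.well-known/agent.json"


def agent_card_url(base_url):
    """Return the agent card URL at the root of the server's origin."""
    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc, AGENT_CARD_PATH, "", ""))


def fetch_agent_card(base_url, timeout=5):
    """Download and decode the agent card (requires a running server)."""
    with urlopen(agent_card_url(base_url), timeout=timeout) as response:
        return json.load(response)


# Placeholder address; substitute wherever your textrawl instance is running.
print(agent_card_url("http://localhost:3000/api"))
# http://localhost:3000/.well-known/agent.json
```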
Self-Hosted
Run textrawl on your infrastructure:
- Local: pnpm dev for development
- Docker: docker-compose for production
- Cloud: Deploy to Cloud Run, Railway, or any container platform
Use Cases
Personal Knowledge Base
Import your notes, saved articles, and documents. Ask Claude questions like:
"What did I write about quarterly planning last October?"
Email Search
Convert MBOX archives to searchable knowledge:
"Find all emails from the marketing team about the product launch"
Research Assistant
Import research papers and documentation:
"Summarize the key findings from the papers about hybrid search algorithms"
Code Documentation
Import your project's documentation and ask Claude for help:
"How do I configure authentication based on our security docs?"
Persistent Memory
Tell your AI facts it should remember across conversations:
"Remember that the Q3 budget was approved at $2.4M and Sarah is leading the rollout."
Next time you ask, it already knows.
Daily Briefing
Start each day with a summary of what changed:
"Give me my daily briefing."
Get recent documents, new insights, and resurfaced knowledge in one view.
Discovering Connections
Let your AI find patterns you missed:
"Are there any connections between my research notes and last week's meeting emails?"
Next Steps
- Quick Start - Get running in 5 minutes
- Installation - Detailed setup guide
- Configuration - Environment variables