textrawl
by Jeff Green

search

Hybrid semantic + full-text search with optional cross-source fusion

Hybrid search combining semantic similarity with full-text keyword matching using Reciprocal Rank Fusion (RRF). Optionally search across entity memories and past conversations with weighted fusion.

Parameters

Parameter | Type | Required | Default | Description
query | string | Yes | - | Natural-language search query (1-10000 chars)
limit | number | No | 5 | Maximum results to return (1-50)
fullTextWeight | number | No | 1.0 | Weight for keyword matching (0-2)
semanticWeight | number | No | 1.0 | Weight for semantic similarity (0-2)
tags | string[] | No | - | Filter to documents with ALL specified tags
sourceType | enum | No | - | Filter by source: note, file, or url
contentType | enum | No | - | Filter by content type: email, youtube, calendar, contact, webpage, document
minScore | number | No | - | Minimum relevance score threshold (0-1)
includeMemories | boolean | No | false | Also search entity memories (requires ENABLE_MEMORY)
includeConversations | boolean | No | false | Also search past conversations (requires ENABLE_CONVERSATIONS)
memoryWeight | number | No | 1.0 | Weight for memory results when includeMemories=true (0-2)
conversationWeight | number | No | 0.5 | Weight for conversation results when includeConversations=true (0-2)
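To make the ranges and defaults in the table above concrete, here is a minimal client-side sketch that mirrors them as a TypeScript type plus a validator. The names `SearchParams`, `validateSearchParams`, and `withDefaults` are hypothetical helpers, not part of textrawl; the server performs its own validation.

```typescript
// Hypothetical client-side mirror of the search tool's documented parameters.
// Field names, ranges, and defaults come from the parameter table; the helpers are assumptions.
type SourceType = "note" | "file" | "url";
type ContentType = "email" | "youtube" | "calendar" | "contact" | "webpage" | "document";

interface SearchParams {
  query: string;
  limit?: number;
  fullTextWeight?: number;
  semanticWeight?: number;
  tags?: string[];
  sourceType?: SourceType;
  contentType?: ContentType;
  minScore?: number;
  includeMemories?: boolean;
  includeConversations?: boolean;
  memoryWeight?: number;
  conversationWeight?: number;
}

// Returns human-readable problems; an empty array means the params look valid.
function validateSearchParams(p: SearchParams): string[] {
  const errors: string[] = [];
  const inRange = (name: string, v: number | undefined, min: number, max: number) => {
    if (v !== undefined && (v < min || v > max)) {
      errors.push(`${name} must be between ${min} and ${max}`);
    }
  };
  if (p.query.length < 1 || p.query.length > 10000) {
    errors.push("query must be 1-10000 characters");
  }
  inRange("limit", p.limit, 1, 50);
  inRange("fullTextWeight", p.fullTextWeight, 0, 2);
  inRange("semanticWeight", p.semanticWeight, 0, 2);
  inRange("minScore", p.minScore, 0, 1);
  inRange("memoryWeight", p.memoryWeight, 0, 2);
  inRange("conversationWeight", p.conversationWeight, 0, 2);
  return errors;
}

// Fills in the documented defaults; values set explicitly in `p` win.
function withDefaults(p: SearchParams): SearchParams {
  return {
    limit: 5,
    fullTextWeight: 1.0,
    semanticWeight: 1.0,
    includeMemories: false,
    includeConversations: false,
    memoryWeight: 1.0,
    conversationWeight: 0.5,
    ...p,
  };
}
```

Note that spreading `p` last means an explicit `undefined` in `p` overrides a default; omit unset keys rather than setting them to `undefined`.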

Example Requests

Document-Only Search (Default)

{
  "query": "quarterly planning meeting notes",
  "limit": 5,
  "fullTextWeight": 1.0,
  "semanticWeight": 1.5,
  "tags": ["work", "planning"],
  "sourceType": "note",
  "minScore": 0.5
}

Cross-Source Search

{
  "query": "database architecture decisions",
  "includeMemories": true,
  "includeConversations": true,
  "memoryWeight": 1.5,
  "limit": 10
}
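Because textrawl exposes search as an MCP tool, a client sends these arguments inside a JSON-RPC `tools/call` request. The sketch below wraps the cross-source example in that envelope; it assumes the tool is registered under the name `search` (the title of this page), and `buildSearchCall` and the request id are illustrative.

```typescript
// Sketch: wrapping search arguments in an MCP "tools/call" JSON-RPC 2.0 request.
// The tool name "search" is assumed from this page's title; the helper is hypothetical.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildSearchCall(id: number, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search", arguments: args },
  };
}

// The cross-source request from above, ready to send over the MCP transport.
const request = buildSearchCall(1, {
  query: "database architecture decisions",
  includeMemories: true,
  includeConversations: true,
  memoryWeight: 1.5,
  limit: 10,
});

console.log(JSON.stringify(request));
```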

Response

Document-Only Response

{
  "query": "quarterly planning meeting notes",
  "totalResults": 3,
  "results": [
    {
      "documentId": "550e8400-e29b-41d4-a716-446655440000",
      "documentTitle": "Q4 Planning Notes",
      "sourceType": "note",
      "tags": ["work", "planning", "q4"],
      "chunkId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
      "content": "In the Q4 planning meeting, we discussed the roadmap for...",
      "score": 0.89
    }
  ]
}

Cross-Source Response

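When includeMemories or includeConversations is true, results from each enabled source are weighted (by memoryWeight and conversationWeight) and merged into a single list ordered by score. Each result carries a type, a score, and a data payload: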

{
  "query": "database architecture decisions",
  "counts": { "documents": 3, "memories": 2, "conversations": 1 },
  "results": [
    { "type": "document", "score": 0.91, "data": { "id": "...", "title": "DB Design Doc", "content": "..." } },
    { "type": "memory", "score": 0.85, "data": { "entityName": "Database", "content": "Uses PostgreSQL with pgvector" } },
    { "type": "conversation", "score": 0.42, "data": { "sessionKey": "db-planning", "title": "DB Planning Session" } }
  ]
}
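The page does not spell out how per-source scores are combined. One plausible reading, assumed here purely for illustration, is that each source's scores are multiplied by that source's weight (documents at an implicit 1.0) and the merged list is sorted by descending score. `fuseSources` and `ScoredHit` are hypothetical names; textrawl's actual fusion may differ.

```typescript
// Hypothetical sketch of cross-source fusion: scale each hit by its source's
// weight, merge all sources, sort by descending score, then apply the limit.
type SourceKind = "document" | "memory" | "conversation";

interface ScoredHit {
  type: SourceKind;
  score: number;
  id: string;
}

function fuseSources(
  hits: Record<SourceKind, ScoredHit[]>,
  weights: Record<SourceKind, number>,
  limit: number,
): ScoredHit[] {
  const merged: ScoredHit[] = [];
  for (const kind of Object.keys(hits) as SourceKind[]) {
    for (const hit of hits[kind]) {
      merged.push({ ...hit, score: hit.score * weights[kind] });
    }
  }
  merged.sort((a, b) => b.score - a.score);
  return merged.slice(0, limit);
}

// Usage: conversations are down-weighted with the default conversationWeight of 0.5.
const fused = fuseSources(
  {
    document: [{ type: "document", score: 0.9, id: "d1" }],
    memory: [{ type: "memory", score: 0.5, id: "m1" }],
    conversation: [{ type: "conversation", score: 0.8, id: "c1" }],
  },
  { document: 1.0, memory: 1.5, conversation: 0.5 },
  10,
);
console.log(fused.map((h) => `${h.type}:${h.score}`));
```

Under this reading, a strong conversation match can still rank below a weaker memory match, which is the point of giving conversations a lower default weight.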

Output Schema

This tool MUST return structuredContent alongside the text response. The structuredContent object MUST use canonical verbose keys regardless of the COMPACT_RESPONSES setting.

Field | Type | Description
query | string | The search query
totalResults | integer | Number of results returned
results | array | Array of result objects
results[].type | enum | document, memory, or conversation
results[].score | number | Relevance score
results[].documentId | string | Document UUID (document results)
results[].documentTitle | string | Document title (document results)
results[].sourceType | string | Source type (document results)
results[].tags | string[] | Document tags (document results)
results[].chunkId | string | Chunk UUID (document results)
results[].content | string | Content snippet (truncated to 500 chars)
results[].entityName | string | Entity name (memory results)
results[].entityType | string | Entity type (memory results)
results[].sessionId | string | Session ID (conversation results)
results[].sessionKey | string? | Session key (conversation results)
results[].title | string? | Session title (conversation results)
results[].summary | string? | Session summary (conversation results)
counts | object | Source counts (cross-source only)
counts.documents | integer | Number of document results
counts.memories | integer | Number of memory results
counts.conversations | integer | Number of conversation results
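A client can model the structuredContent schema above as TypeScript types and check payloads against it. In this sketch, fields the table attributes to one result type are modeled as optional, `string?` is read as "optional or null", and `type` is optional because the document-only response example omits it; all of these are assumptions, and `schemaProblems` is a hypothetical helper.

```typescript
// Hypothetical TypeScript view of the search tool's structuredContent schema.
type ResultType = "document" | "memory" | "conversation";

interface SearchResultItem {
  type?: ResultType;
  score: number;
  documentId?: string;
  documentTitle?: string;
  sourceType?: string;
  tags?: string[];
  chunkId?: string;
  content?: string;
  entityName?: string;
  entityType?: string;
  sessionId?: string;
  sessionKey?: string | null;
  title?: string | null;
  summary?: string | null;
}

interface SearchStructuredContent {
  query: string;
  totalResults: number;
  results: SearchResultItem[];
  counts?: { documents: number; memories: number; conversations: number };
}

const SNIPPET_LIMIT = 500;

// Lists schema violations found in a payload; an empty array means none were found.
function schemaProblems(sc: SearchStructuredContent): string[] {
  const problems: string[] = [];
  sc.results.forEach((r, i) => {
    if (r.content !== undefined && r.content.length > SNIPPET_LIMIT) {
      problems.push(`results[${i}].content is longer than ${SNIPPET_LIMIT} chars`);
    }
    if (r.type === "document" && (!r.documentId || !r.chunkId)) {
      problems.push(`results[${i}] is a document result without documentId/chunkId`);
    }
    if (r.type === "memory" && !r.entityName) {
      problems.push(`results[${i}] is a memory result without entityName`);
    }
    if (r.type === "conversation" && !r.sessionId) {
      problems.push(`results[${i}] is a conversation result without sessionId`);
    }
  });
  return problems;
}
```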

How It Works

Reciprocal Rank Fusion (RRF)

textrawl runs two parallel searches:

  1. Full-Text Search: PostgreSQL tsvector matching for exact keywords
  2. Semantic Search: pgvector cosine similarity for meaning

Results are combined using RRF:

RRF_score = (fullTextWeight / (k + fts_rank)) + (semanticWeight / (k + semantic_rank))

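Where k = 60 (the standard RRF constant), and fts_rank and semantic_rank are the chunk's rank positions in the full-text and semantic result lists.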
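The formula above can be sketched directly in TypeScript. This assumes 1-based ranks, as in the standard RRF formulation, and assumes that a chunk missing from one list simply receives no contribution from that list; `rrfFuse` is a hypothetical name, and textrawl computes RRF inside its own search path.

```typescript
// Reciprocal Rank Fusion over two ranked lists of chunk IDs.
// Assumptions: ranks are 1-based; a chunk absent from a list gets no term for it.
const RRF_K = 60;

function rrfFuse(
  ftsRanking: string[], // chunk IDs, best full-text match first
  semanticRanking: string[], // chunk IDs, closest embedding first
  fullTextWeight = 1.0,
  semanticWeight = 1.0,
  k = RRF_K,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  const addList = (ranking: string[], weight: number) => {
    ranking.forEach((id, index) => {
      const rank = index + 1; // convert 0-based index to 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };
  addList(ftsRanking, fullTextWeight);
  addList(semanticRanking, semanticWeight);
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Chunk "b" is 2nd in full-text and 1st in semantic, so it ends up on top.
console.log(rrfFuse(["a", "b", "c"], ["b", "c", "d"]));
```

Appearing in both lists usually beats a single top rank in one list, which is why RRF rewards chunks that match on both keywords and meaning.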

Weight Tuning

Use Case | fullTextWeight | semanticWeight
Exact phrases | 1.5-2.0 | 0.5-1.0
Conceptual search | 0.5-1.0 | 1.5-2.0
Balanced (default) | 1.0 | 1.0
Keyword-only | 2.0 | 0
Semantic-only | 0 | 2.0

Tip: Start with default weights and adjust based on results. Higher semantic weight helps with paraphrased queries.
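To see how the weights shift the ranking, consider two chunks: A is the top keyword hit but ranks 20th semantically, and B is the reverse. The sketch below scores both with the RRF formula under presets picked from within the ranges in the Weight Tuning table; the preset names and exact values are illustrative assumptions, not textrawl configuration.

```typescript
// Illustrative weight presets chosen from within the Weight Tuning ranges (assumed values).
const WEIGHT_PRESETS = {
  exactPhrases: { fullTextWeight: 1.75, semanticWeight: 0.75 },
  conceptual: { fullTextWeight: 0.75, semanticWeight: 1.75 },
  balanced: { fullTextWeight: 1.0, semanticWeight: 1.0 },
  keywordOnly: { fullTextWeight: 2.0, semanticWeight: 0 },
  semanticOnly: { fullTextWeight: 0, semanticWeight: 2.0 },
};

type Weights = { fullTextWeight: number; semanticWeight: number };

// RRF score for one chunk given its 1-based rank in each list (k = 60).
function rrfScore(ftsRank: number, semanticRank: number, w: Weights, k = 60): number {
  return w.fullTextWeight / (k + ftsRank) + w.semanticWeight / (k + semanticRank);
}

// Chunk A: 1st by keywords, 20th by meaning. Chunk B: 20th by keywords, 1st by meaning.
for (const [name, w] of Object.entries(WEIGHT_PRESETS)) {
  const a = rrfScore(1, 20, w);
  const b = rrfScore(20, 1, w);
  const winner = a > b ? "A" : a < b ? "B" : "tie";
  console.log(`${name}: A=${a.toFixed(4)} B=${b.toFixed(4)} -> ${winner}`);
}
```

With balanced weights the two chunks tie, so any nonzero tilt toward one signal decides the order.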

Filtering

By Tags

Filter to documents containing ALL specified tags:

{
  "query": "project update",
  "tags": ["work", "project-x"]
}

Only returns documents tagged with both work AND project-x.
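The ALL-tags rule behaves like the client-side check below: a document matches only if it carries every requested tag. `hasAllTags` is a hypothetical helper; it compares tags exactly as written (case-sensitive), and whether textrawl normalizes tag case is not stated on this page.

```typescript
// Illustration of the ALL-tags rule: every requested tag must be present on the document.
// Comparison is exact and case-sensitive (an assumption about textrawl's matching).
function hasAllTags(documentTags: string[], requiredTags: string[]): boolean {
  const tagSet = new Set(documentTags);
  return requiredTags.every((tag) => tagSet.has(tag));
}

console.log(hasAllTags(["work", "project-x", "q4"], ["work", "project-x"])); // true
console.log(hasAllTags(["work"], ["work", "project-x"])); // false
```

Because `every` returns true for an empty array, an empty tag list places no restriction on results.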

By Source Type

sourceType | Description
note | Created via the add_note tool
file | Uploaded via the CLI or Web UI
url | Web content (future feature)

By Content Type

contentType | Description
email | Email messages
youtube | YouTube watch history
calendar | Calendar events
contact | Contacts
webpage | Saved web pages
document | General documents
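The two filters can be pictured as an in-memory filter over document metadata. This sketch assumes that sourceType and contentType combine with AND when both are set, which the page implies but does not state; `DocMeta` and `filterDocs` are hypothetical names.

```typescript
// Hypothetical in-memory version of the sourceType/contentType filters.
// Union members mirror the two tables; AND-combination of filters is an assumption.
type SourceType = "note" | "file" | "url";
type ContentType = "email" | "youtube" | "calendar" | "contact" | "webpage" | "document";

interface DocMeta {
  title: string;
  sourceType: SourceType;
  contentType?: ContentType;
}

function filterDocs(docs: DocMeta[], sourceType?: SourceType, contentType?: ContentType): DocMeta[] {
  return docs.filter(
    (d) =>
      (sourceType === undefined || d.sourceType === sourceType) &&
      (contentType === undefined || d.contentType === contentType),
  );
}

const docs: DocMeta[] = [
  { title: "Inbox export", sourceType: "file", contentType: "email" },
  { title: "Meeting note", sourceType: "note" },
  { title: "Design doc", sourceType: "file", contentType: "document" },
];

// Uploaded files that are email messages.
console.log(filterDocs(docs, "file", "email").map((d) => d.title)); // [ 'Inbox export' ]
```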

By Score Threshold

Filter out low-relevance results:

{
  "query": "specific topic",
  "minScore": 0.7
}

Caution: Scores are relative. A 0.7 threshold may filter out relevant results for broad queries.
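The caution is easiest to see with numbers. The sketch contrasts a fixed cut-off like minScore with a hypothetical client-side alternative that keeps results within a fraction of the top score, so the threshold adapts to the query. Treating minScore as inclusive (`>=`) is an assumption, and both helper names are illustrative.

```typescript
interface Hit {
  id: string;
  score: number;
}

// Fixed cut-off in the spirit of minScore (inclusive comparison is an assumption).
function applyMinScore(hits: Hit[], minScore: number): Hit[] {
  return hits.filter((h) => h.score >= minScore);
}

// Hypothetical alternative: keep hits scoring at least `fraction` of the best hit.
function applyRelativeCutoff(hits: Hit[], fraction: number): Hit[] {
  if (hits.length === 0) return [];
  const top = Math.max(...hits.map((h) => h.score));
  return hits.filter((h) => h.score >= top * fraction);
}

// A broad query whose best match only scores 0.62.
const broad: Hit[] = [
  { id: "a", score: 0.62 },
  { id: "b", score: 0.58 },
  { id: "c", score: 0.41 },
];
console.log(applyMinScore(broad, 0.7).length); // 0: every result is dropped
console.log(applyRelativeCutoff(broad, 0.8).map((h) => h.id)); // [ 'a', 'b' ]
```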

Error Responses

Error | Cause | Fix
Database not configured | Missing Supabase credentials | Set SUPABASE_URL and SUPABASE_SERVICE_KEY
Embedding provider not configured | Missing API key | Set OPENAI_API_KEY or configure Ollama
Search failed | Database or embedding error | Check connectivity and logs
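The first two errors are configuration problems, so they can be caught before calling the tool. This hypothetical pre-flight check uses only the variable names listed in the table; how Ollama is configured is not documented here, so that case is passed in as a boolean flag rather than guessed.

```typescript
// Hypothetical pre-flight check mirroring the configuration errors in the table.
// Only SUPABASE_URL, SUPABASE_SERVICE_KEY, and OPENAI_API_KEY come from this page;
// Ollama setup is represented by a flag because its settings are not documented here.
function configProblems(
  env: Record<string, string | undefined>,
  ollamaConfigured = false,
): string[] {
  const problems: string[] = [];
  if (!env.SUPABASE_URL || !env.SUPABASE_SERVICE_KEY) {
    problems.push("Database not configured: set SUPABASE_URL and SUPABASE_SERVICE_KEY");
  }
  if (!env.OPENAI_API_KEY && !ollamaConfigured) {
    problems.push("Embedding provider not configured: set OPENAI_API_KEY or configure Ollama");
  }
  return problems;
}

// Missing service key and no embedding provider: reports both problems.
console.log(configProblems({ SUPABASE_URL: "https://example.supabase.co" }));
```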

Best Practices

  1. Start broad, then narrow: Begin without filters and add them only if you get too many results
  2. Use semantic weight for questions: Increase semanticWeight for natural language queries
  3. Use full-text weight for keywords: Increase fullTextWeight for specific terms
  4. Combine with get_document: Search returns chunks; use get_document to retrieve the full document for context
  5. Use cross-source sparingly: Enable includeMemories/includeConversations only when you need context from those sources
  6. Check score distribution: If all scores are low, try rephrasing the query
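Practice 4 often means collapsing several chunk hits from the same document before fetching it. This hypothetical helper keeps each document's best chunk score and returns the top document IDs in score order, ready to pass to get_document one at a time; the helper name and `maxDocs` cap are illustrative assumptions.

```typescript
// Hypothetical helper: collapse chunk-level hits into distinct document IDs,
// ordered by each document's best chunk score.
interface ChunkHit {
  documentId: string;
  chunkId: string;
  score: number;
}

function documentsToFetch(hits: ChunkHit[], maxDocs = 3): string[] {
  const best = new Map<string, number>();
  for (const h of hits) {
    best.set(h.documentId, Math.max(best.get(h.documentId) ?? -Infinity, h.score));
  }
  return Array.from(best.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, maxDocs)
    .map(([id]) => id);
}

const chunkHits: ChunkHit[] = [
  { documentId: "doc1", chunkId: "c1", score: 0.7 },
  { documentId: "doc2", chunkId: "c2", score: 0.9 },
  { documentId: "doc1", chunkId: "c3", score: 0.95 },
  { documentId: "doc3", chunkId: "c4", score: 0.5 },
];
console.log(documentsToFetch(chunkHits, 2)); // [ 'doc1', 'doc2' ]
```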
