textrawl
by Jeff Green

search_knowledge

Hybrid semantic + full-text search tool

Hybrid search combining semantic similarity with full-text keyword matching using Reciprocal Rank Fusion (RRF).

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | - | Natural language search query (1-10000 chars) |
| limit | number | No | 10 | Maximum results to return (1-50) |
| fullTextWeight | number | No | 1.0 | Weight for keyword matching (0-2) |
| semanticWeight | number | No | 1.0 | Weight for semantic similarity (0-2) |
| tags | string[] | No | - | Filter to documents with ALL specified tags |
| sourceType | enum | No | - | Filter by source: note, file, or url |
| minScore | number | No | - | Minimum relevance score threshold (0-1) |
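
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not textrawl's own validation: the helper name `validate_search_params` is hypothetical, and it assumes every range bound is inclusive.

```python
from typing import Any

# Ranges copied from the parameter table; treating them as inclusive is an assumption.
NUMERIC_RANGES = {
    "limit": (1, 50),
    "fullTextWeight": (0, 2),
    "semanticWeight": (0, 2),
    "minScore": (0, 1),
}
SOURCE_TYPES = ("note", "file", "url")


def validate_search_params(params: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means the params look valid)."""
    problems = []

    query = params.get("query")
    if not isinstance(query, str) or not 1 <= len(query) <= 10000:
        problems.append("query must be a string of 1-10000 characters")

    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue  # optional parameter left at its default
        value = params[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name} must be a number in [{low}, {high}]")

    if "sourceType" in params and params["sourceType"] not in SOURCE_TYPES:
        problems.append("sourceType must be one of: note, file, url")

    tags = params.get("tags")
    if tags is not None:
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            problems.append("tags must be a list of strings")

    return problems
```

An empty list means the request matches the documented constraints; each entry otherwise names one parameter that is out of range.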

Example Request

{
  "query": "quarterly planning meeting notes",
  "limit": 5,
  "fullTextWeight": 1.0,
  "semanticWeight": 1.5,
  "tags": ["work", "planning"],
  "sourceType": "note",
  "minScore": 0.5
}

Response

{
  "query": "quarterly planning meeting notes",
  "filters": {
    "tags": ["work", "planning"],
    "sourceType": "note",
    "minScore": 0.5
  },
  "totalResults": 2,
  "results": [
    {
      "documentId": "550e8400-e29b-41d4-a716-446655440000",
      "documentTitle": "Q4 Planning Notes",
      "sourceType": "note",
      "tags": ["work", "planning", "q4"],
      "chunkId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
      "content": "In the Q4 planning meeting, we discussed the roadmap for...",
      "score": 0.89
    },
    {
      "documentId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
      "documentTitle": "Annual Planning Overview",
      "sourceType": "note",
      "tags": ["work", "planning"],
      "chunkId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
      "content": "Quarterly milestones should align with company objectives...",
      "score": 0.76
    }
  ]
}
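
Each result is a chunk, so several results may belong to the same document. The sketch below keeps only the best-scoring chunk per documentId. It assumes the response shape shown above; the function name `best_chunk_per_document` is hypothetical.

```python
def best_chunk_per_document(response: dict) -> list[dict]:
    """Keep the highest-scoring chunk for each documentId, best first."""
    best: dict[str, dict] = {}
    for result in response.get("results", []):
        doc_id = result["documentId"]
        # Replace the stored chunk only if this one scores strictly higher.
        if doc_id not in best or result["score"] > best[doc_id]["score"]:
            best[doc_id] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```

The surviving documentId values are then natural candidates to pass to get_document when full context is needed.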

How It Works

Reciprocal Rank Fusion (RRF)

textrawl runs two parallel searches:

  1. Full-Text Search: PostgreSQL tsvector matching for exact keywords
  2. Semantic Search: pgvector cosine similarity for meaning

Results are combined using RRF:

RRF_score = (fullTextWeight / (k + fts_rank)) + (semanticWeight / (k + semantic_rank))

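Here fts_rank and semantic_rank are the result's rank positions in the full-text and semantic result lists, and k = 60 (the constant commonly used for RRF).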
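
The formula can be sketched in a few lines of Python. This is an illustration, not textrawl's implementation. It assumes ranks are 1-based and that a result missing from one list contributes nothing for that term, which is a common RRF convention the text above does not confirm. The names `rrf_score` and `fuse` are hypothetical.

```python
from typing import Optional

K = 60  # RRF constant stated above


def rrf_score(fts_rank: Optional[int], semantic_rank: Optional[int],
              full_text_weight: float = 1.0, semantic_weight: float = 1.0,
              k: int = K) -> float:
    """Weighted RRF score for one result; a rank of None means 'not in that list'."""
    score = 0.0
    if fts_rank is not None:
        score += full_text_weight / (k + fts_rank)
    if semantic_rank is not None:
        score += semantic_weight / (k + semantic_rank)
    return score


def fuse(fts_ids: list[str], semantic_ids: list[str],
         full_text_weight: float = 1.0, semantic_weight: float = 1.0) -> list[tuple[str, float]]:
    """Merge two ranked ID lists into one list sorted by fused score, best first."""
    fts_rank = {doc_id: i for i, doc_id in enumerate(fts_ids, start=1)}
    sem_rank = {doc_id: i for i, doc_id in enumerate(semantic_ids, start=1)}
    all_ids = list(dict.fromkeys(fts_ids + semantic_ids))  # de-duplicate, keep order
    scored = [
        (doc_id, rrf_score(fts_rank.get(doc_id), sem_rank.get(doc_id),
                           full_text_weight, semantic_weight))
        for doc_id in all_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With default weights, a result ranked first in both lists gets a raw value of 2/61, roughly 0.033. The 0-1 scores in the example response therefore presumably involve a rescaling step that is not described here.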

Weight Tuning

| Use Case | fullTextWeight | semanticWeight |
| --- | --- | --- |
| Exact phrases | 1.5-2.0 | 0.5-1.0 |
| Conceptual search | 0.5-1.0 | 1.5-2.0 |
| Balanced (default) | 1.0 | 1.0 |
| Keyword-only | 2.0 | 0 |
| Semantic-only | 0 | 2.0 |

Tip: Start with default weights and adjust based on results. Higher semantic weight helps with paraphrased queries.
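
To see why the weights matter, consider two hypothetical results: one ranked 1st by full-text and 10th by semantic search, the other the reverse. The sketch below applies the RRF formula from this page to both. The ranks and result names are made up for illustration.

```python
def rrf(fts_rank: int, semantic_rank: int,
        full_text_weight: float, semantic_weight: float, k: int = 60) -> float:
    """Weighted RRF score as defined on this page."""
    return full_text_weight / (k + fts_rank) + semantic_weight / (k + semantic_rank)


# Hypothetical (fts_rank, semantic_rank) pairs for two competing results.
candidates = {
    "keyword_hit": (1, 10),     # strong keyword match, weaker semantic match
    "paraphrase_hit": (10, 1),  # weak keyword match, strong semantic match
}


def winner(full_text_weight: float, semantic_weight: float) -> str:
    """Name of the candidate with the higher fused score under the given weights."""
    return max(
        candidates,
        key=lambda name: rrf(*candidates[name], full_text_weight, semantic_weight),
    )


print(winner(2.0, 0.5))  # weights from the "Exact phrases" range
print(winner(0.5, 2.0))  # weights from the "Conceptual search" range
```

Under the exact-phrase weights the keyword match comes out on top, and under the conceptual-search weights the paraphrase match does. With equal weights the two candidates tie.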

Filtering

By Tags

Filter to documents containing ALL specified tags:

{
  "query": "project update",
  "tags": ["work", "project-x"]
}

This returns only documents tagged with both work AND project-x.
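
The ALL-tags rule is a subset check: every requested tag must appear on the document. A minimal sketch, assuming exact, case-sensitive tag comparison (the text above does not specify this), with a hypothetical helper name:

```python
def matches_all_tags(document_tags: list[str], required_tags: list[str]) -> bool:
    """True when every required tag is present on the document."""
    return set(required_tags).issubset(document_tags)


print(matches_all_tags(["work", "project-x", "q4"], ["work", "project-x"]))  # True
print(matches_all_tags(["work"], ["work", "project-x"]))                     # False
```

Extra tags on the document do not affect the match; only missing ones do.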

By Source Type

Filter by how the document was created:

| sourceType | Description |
| --- | --- |
| note | Created via add_note tool |
| file | Uploaded via CLI or Web UI |
| url | Web content (future feature) |

By Score Threshold

Filter out low-relevance results:

{
  "query": "specific topic",
  "minScore": 0.7
}

Caution: Scores are relative. A 0.7 threshold may filter out relevant results for broad queries.
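
One way to work around a fixed cutoff is to filter on the client relative to the best score in each response. This is an alternative technique, not a textrawl feature. The helper name `filter_relative` and the default fraction of 0.8 are assumptions chosen for illustration.

```python
def filter_relative(results: list[dict], fraction: float = 0.8) -> list[dict]:
    """Keep results scoring at least `fraction` of the top score in this response."""
    if not results:
        return []
    top = max(r["score"] for r in results)
    return [r for r in results if r["score"] >= fraction * top]


# Hypothetical scores: the cutoff adapts to the best result (0.8 * 0.89 = 0.712).
sample = [{"score": 0.89}, {"score": 0.76}, {"score": 0.40}]
print(len(filter_relative(sample)))  # 2
```

Because the cutoff scales with the top score, a broad query whose best match scores 0.5 still keeps its strongest results instead of returning nothing.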

Error Responses

Database not configured

{
  "error": "Database not configured",
  "message": "Set SUPABASE_URL and SUPABASE_SERVICE_KEY to enable search."
}

OpenAI not configured

{
  "error": "OpenAI not configured",
  "message": "Set OPENAI_API_KEY to enable semantic search."
}

Search failed

{
  "error": "Search failed",
  "message": "Connection timeout to database"
}
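
A client can turn these payloads into exceptions. The sketch below assumes that an error arrives as a JSON object with "error" and "message" keys, as in the examples above, and that a successful payload carries "results". The class and function names are hypothetical.

```python
class SearchError(Exception):
    """Raised when a search_knowledge payload contains an error."""

    def __init__(self, error: str, message: str) -> None:
        super().__init__(f"{error}: {message}")
        self.error = error
        self.message = message
        # Both configuration errors shown above end in "not configured".
        self.is_config_error = error.endswith("not configured")


def unwrap_search_response(payload: dict) -> list[dict]:
    """Return the results list, or raise SearchError for an error payload."""
    if "error" in payload:
        raise SearchError(payload["error"], payload.get("message", ""))
    return payload.get("results", [])
```

Separating configuration errors from "Search failed" lets a caller report missing environment variables immediately while treating failures such as a database timeout as possibly transient.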

Best Practices

  1. Start broad, then narrow: Begin without filters and add them if there are too many results.
  2. Use semantic weight for questions: Increase semanticWeight for natural language queries.
  3. Use full-text weight for keywords: Increase fullTextWeight for specific terms.
  4. Combine with get_document: Search returns chunks; use get_document for full context.
  5. Check score distribution: If all scores are low, try rephrasing the query.
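
The first practice can be sketched as a small wrapper. The `search` argument is a hypothetical callable that sends a parameter dict to search_knowledge and returns the parsed response. Treating "totalResults reached the limit" as the signal for "too many results" is an assumption made for this sketch.

```python
from typing import Callable


def search_then_narrow(search: Callable[[dict], dict], query: str,
                       tags: list[str], limit: int = 10) -> dict:
    """Search without filters first; retry with a tag filter if the result list is full."""
    response = search({"query": query, "limit": limit})
    # A full result list hints that the query is too broad (assumption).
    if tags and response.get("totalResults", 0) >= limit:
        response = search({"query": query, "limit": limit, "tags": tags})
    return response
```

The same pattern extends to sourceType or minScore: add one filter at a time so it stays clear which filter removed which results.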
