HTML Conversion

Convert HTML files and saved web pages to clean, searchable Markdown.

Usage

pnpm convert -- html <path> [options]

Options

Option	Default	Description
`-o, --output <dir>`	`./converted/web`	Output directory
`-r, --recursive`	`false`	Process subdirectories
`-v, --verbose`	`false`	Enable verbose logging
`--dry-run`	`false`	Preview the conversion without writing output files
`-t, --tags <tags...>`	`["web"]`	Additional tags

Examples

# Single file
pnpm convert -- html saved-page.html
 
# Directory
pnpm convert -- html ./saved-pages/ -r
 
# With custom output
pnpm convert -- html ./articles/ -o ./knowledge/articles -r

Output Format

---
title: "Understanding Hybrid Search"
source_type: web
source_hash: "def456..."
tags:
  - web
  - imported
created_at: "2024-03-20T14:22:00Z"
converted_at: "2024-03-20T14:22:00Z"
metadata:
  url: "https://example.com/article"
  author: "Jane Smith"
---
 
# Understanding Hybrid Search
 
Article content in clean Markdown...
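
Downstream tools often need to separate the YAML front matter from the Markdown body of a converted file. The following is a minimal sketch of one way to do that, assuming the layout shown above (front matter delimited by `---` lines at the very top of the file). The `splitConvertedDoc` function and `ConvertedDoc` type are hypothetical names for illustration and are not part of the converter; for anything beyond pulling out the title, a real YAML parser would be a better choice.

```typescript
// Hypothetical helper, not part of the converter's API.
interface ConvertedDoc {
  frontMatter: string;  // raw YAML between the --- delimiters
  body: string;         // Markdown that follows the closing delimiter
  title: string | null; // value of the title field, if present
}

function splitConvertedDoc(text: string): ConvertedDoc {
  // Front matter must open on the first line and close on a line that is exactly ---
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)([\s\S]*)$/.exec(text);
  if (match === null) {
    // No front matter found: treat the whole file as body.
    return { frontMatter: "", body: text, title: null };
  }
  const frontMatter = match[1] ?? "";
  const body = (match[2] ?? "").trimStart();

  // Naive single-line lookup; assumes the title is a plain or double-quoted scalar.
  const titleMatch = /^title:[ \t]*"?(.*?)"?[ \t]*$/m.exec(frontMatter);
  return { frontMatter, body, title: titleMatch?.[1] ?? null };
}
```

Calling `splitConvertedDoc` on the sample output above would yield a `title` of `Understanding Hybrid Search` and a `body` starting at the `# Understanding Hybrid Search` heading.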

What Gets Extracted

Main content (article body)
Title and metadata
Images (referenced, not embedded)
Links (preserved as Markdown)

What Gets Removed

Navigation menus
Advertisements
Scripts and styles
Cookie banners
Footer boilerplate
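
To make the idea of boilerplate removal concrete, here is a deliberately naive sketch that drops whole `<script>`, `<style>`, `<nav>`, and `<footer>` elements with regular expressions. This is an illustration only: the converter's actual extraction logic is not documented on this page, `stripBoilerplate` and `REMOVED_TAGS` are hypothetical names, and the choice of tags is an assumption. Advertisements and cookie banners have no dedicated tag, so they would need class or id heuristics that this sketch does not attempt. Nested elements of the same tag are also not handled, which is one reason a proper HTML parser is preferable in practice.

```typescript
// Illustrative only: not the converter's implementation.
// Tags whose entire element (including contents) is dropped. Assumed list.
const REMOVED_TAGS: string[] = ["script", "style", "nav", "footer"];

function stripBoilerplate(html: string): string {
  let result = html;
  for (const tag of REMOVED_TAGS) {
    // Match from the opening tag to the nearest matching closing tag, case-insensitively.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, "gi");
    result = result.replace(pattern, "");
  }
  return result;
}
```

Given a page with a navigation menu, inline script, and footer wrapped around an `<article>`, the function returns the markup with only the article and its surrounding structural tags left in place.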

Supported Formats

.html / .htm files
Saved web pages
Browser "Save As" exports
Google Takeout saved pages

Next Steps

Batch Upload - Upload converted files
