xberg-io/html-to-markdown
Install command:
pie install xberg-io/html-to-markdown
Package description
High-performance HTML to Markdown converter
README
html-to-markdown
Fast, robust HTML → Markdown for 16 languages. A tiered converter that picks the safest, fastest path per input without losing content.
What and Why?
html-to-markdown converts real-world HTML — unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — into clean CommonMark (or Djot) without losing content, from one Rust core with native bindings for 16 languages.
It routes each input through three tiers: a single-pass byte scanner for clean HTML, a tolerant DOM walker for complex inputs, and an html5ever repair pass for malformed HTML — with byte-identical output across tiers, enforced by a 116-snapshot oracle and per-group performance gates in CI. The dispatcher is invisible: the same convert() call works regardless of which tier runs.
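To make the routing idea concrete, here is a minimal Python sketch of a tiered dispatcher. It is not the library's implementation: the tier names, the `TagAudit` helper, and the heuristics (unbalanced tags go to repair, tables or custom elements go to the DOM walker) are all assumptions chosen for illustration, and the real dispatcher's criteria are not described in this README.

```python
from html.parser import HTMLParser

# HTML void elements never take a closing tag, so they are ignored when balancing.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagAudit(HTMLParser):
    """Hypothetical helper: records tag names and checks open/close nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open non-void tags
        self.seen = set()     # every tag name encountered
        self.balanced = True  # flips to False on a mismatched close tag

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def pick_tier(html: str) -> str:
    """Return an illustrative tier name for the input (assumed heuristics)."""
    audit = TagAudit()
    audit.feed(html)
    audit.close()
    if not audit.balanced or audit.stack:
        return "repair"   # malformed: mismatched or unclosed tags
    if "table" in audit.seen or any("-" in tag for tag in audit.seen):
        return "dom"      # complex: tables or custom elements
    return "scanner"      # clean: single-pass path


print(pick_tier("<p>Hello <b>world</b></p>"))           # scanner
print(pick_tier("<table><tr><td>x</td></tr></table>"))  # dom
print(pick_tier("<p>unclosed <b>bold"))                 # repair
```

The caller only ever sees one function, which mirrors the point made above: the choice of tier is an internal detail and does not change the call site.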
Features
| Feature | Description |
|---|---|
| 16 languages, one Rust core | Rust, Python, Node.js, WASM, Java, Go, C#, PHP, Ruby, Elixir, R, Dart, Kotlin (Android), Swift, Zig, and a C ABI |
| Tiered dispatch | Byte scanner → DOM walker → html5ever repair, with byte-equal output across tiers |
| Real-HTML robust | Unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — handled without losing content |
| GFM tables | Padded cells, alignment, and pipe escaping |
| Djot output | Set output_format = "djot" to emit Djot instead of Markdown |
| Metadata extraction | Parse <head> into structured metadata (Open Graph, Twitter, JSON-LD, microdata, RDFa, header hierarchy) |
| Inline images | Opt-in mirroring of data URIs and remote image references |
| Visitor API | Feature-gated traversal to transform the converted Markdown AST |
| Configurable preprocessing | Standard, strict, and lenient presets — or build your own |
| Fast | 19–116 MB/s on the Wikipedia/mdream corpus; per-group regression thresholds enforced on every PR |
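The GFM tables row names three concerns: padded cells, alignment, and pipe escaping. The Python sketch below shows what each of those means for the emitted Markdown. The function `render_gfm_table` is a hypothetical illustration and does not reflect the converter's internal code or its exact output format.

```python
def render_gfm_table(headers, rows, align=None):
    """Render a GFM table with padded cells, alignment markers, and escaped pipes."""
    align = align or ["left"] * len(headers)

    def esc(cell):
        # A literal pipe inside a cell must be written as \| in GFM.
        return str(cell).replace("|", "\\|")

    table = [[esc(c) for c in headers]] + [[esc(c) for c in row] for row in rows]
    # Minimum width of 3 so every delimiter cell has room for its markers.
    widths = [max(3, *(len(r[i]) for r in table)) for i in range(len(headers))]

    def delimiter(width, how):
        if how == "center":
            return ":" + "-" * (width - 2) + ":"
        if how == "right":
            return "-" * (width - 1) + ":"
        return ":" + "-" * (width - 1)

    def pad(cell, width, how):
        if how == "center":
            return cell.center(width)
        if how == "right":
            return cell.rjust(width)
        return cell.ljust(width)

    def line(cells):
        return "| " + " | ".join(cells) + " |"

    out = [
        line(pad(c, w, a) for c, w, a in zip(table[0], widths, align)),
        line(delimiter(w, a) for w, a in zip(widths, align)),
    ]
    out += [line(pad(c, w, a) for c, w, a in zip(r, widths, align))
            for r in table[1:]]
    return "\n".join(out)


print(render_gfm_table(["Op", "Cost"], [["a|b", 5]], ["left", "right"]))
# | Op   | Cost |
# | :--- | ---: |
# | a\|b |    5 |
```

Padding is cosmetic (renderers ignore it), while the delimiter row and the escaped pipe affect how the table is parsed.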
Quick Start
convert() is the single entry point — it returns a structured result with content, warnings, and optional metadata.
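As a rough picture of what a structured result with content, warnings, and optional metadata can look like, here is a self-contained Python sketch. `ConversionResult`, `toy_convert`, and the `extract_metadata` parameter are hypothetical names invented for this example, and the toy converter only understands `<title>`, `<h1>`, and `<p>`. It does not call the library; consult the per-language README for the real signature and field names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversionResult:  # hypothetical type, not the library's
    content: str
    warnings: List[str] = field(default_factory=list)
    metadata: Optional[dict] = None


def toy_convert(html: str, extract_metadata: bool = False) -> ConversionResult:
    """Toy stand-in for a converter: handles only <title>, <h1>, and <p>."""
    warnings = []
    metadata = None
    if extract_metadata:
        title = re.search(r"<title>(.*?)</title>", html, re.S)
        metadata = {"title": title.group(1).strip() if title else None}
    body = re.sub(r"<title>.*?</title>", "", html, flags=re.S)
    body = re.sub(r"<h1>(.*?)</h1>", r"# \1\n\n", body, flags=re.S)
    body = re.sub(r"<p>(.*?)</p>", r"\1\n\n", body, flags=re.S)
    # Anything still tagged is unsupported here: keep its text, report the tag.
    for tag in sorted(set(re.findall(r"</?([a-zA-Z][\w-]*)", body))):
        warnings.append(f"unsupported tag stripped: <{tag}>")
    body = re.sub(r"<[^>]+>", "", body)
    return ConversionResult(body.strip(), warnings, metadata)


result = toy_convert(
    "<title>Doc</title><h1>Hi</h1><p>Some <em>text</em>.</p>",
    extract_metadata=True,
)
print(result.content)   # "# Hi", a blank line, then "Some text."
print(result.warnings)  # ['unsupported tag stripped: <em>']
print(result.metadata)  # {'title': 'Doc'}
```

Returning warnings alongside the content lets callers log or surface lossy spots without treating them as errors, and leaving metadata as `None` unless requested keeps the default path cheap.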
Language Packages
Java
Available on Maven Central as io.xberg:html-to-markdown. See Java README for the dependency snippet and current version.
PHP
This is a native PHP extension (Rust ext-php-rs), so install it with PIE — not composer require:
pie install xberg-io/html-to-markdown
See PHP README for full documentation.
Elixir
Add {:html_to_markdown, "~> 3.6"} to your mix.exs dependencies. See Elixir README for full documentation.
R
install.packages("htmltomarkdown", repos = "https://xberg-io.r-universe.dev")
See R README for full documentation.
Kotlin (Android)
Available on Maven Central as io.xberg:html-to-markdown-android. See Kotlin README for the dependency snippet and current version.
Swift
Add via Swift Package Manager. See Swift README for full documentation.
Zig
See Zig README for installation and usage.
WebAssembly
npm install @xberg-io/html-to-markdown-wasm
See WebAssembly README for full documentation.
C/C++ (FFI)
Pre-built .so / .dll / .dylib libraries are available from GitHub Releases. See FFI crate for full documentation.
CLI
cargo install html-to-markdown-cli
brew install xberg-io/tap/html-to-markdown
See CLI usage for full documentation.
AI Coding Assistants
Install the html-to-markdown plugin from the xberg-io/plugins marketplace. It ships the html-to-markdown agent skills and works with every major coding agent. Instructions for each harness follow.
Claude Code
/plugin marketplace add xberg-io/plugins
/plugin install html-to-markdown@xberg-io
Codex CLI
/plugins add https://github.com/xberg-io/plugins
Then search for html-to-markdown and select Install Plugin.
Cursor
Settings → Plugins → Add from URL → https://github.com/xberg-io/plugins, then select html-to-markdown.
Gemini CLI
gemini extensions install https://github.com/xberg-io/plugins
Factory Droid
droid plugin marketplace add https://github.com/xberg-io/plugins
droid plugin install html-to-markdown@xberg-io
GitHub Copilot CLI
copilot plugin marketplace add https://github.com/xberg-io/plugins
copilot plugin install html-to-markdown@xberg-io
opencode
Add the package to opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@xberg-io/opencode-html-to-markdown"]
}
Documentation
Full guides, the convert() API for every binding, tier architecture, the metadata and visitor APIs, and performance benchmarks live at docs.html-to-markdown.xberg.io.
Part of Xberg
- Xberg — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
- Xberg Enterprise — managed extraction API with SDKs, dashboards, and observability.
- crawlberg — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- html-to-markdown — fast, lossless HTML→Markdown engine.
- liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
- tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
- alef — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.
Contributing
Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines.
License
MIT License — see LICENSE for details.
Other information
- License: MIT
- Last updated: 2026-07-02