ahmaadkhader/pdf-to-html
Latest stable version: 1.0.0
Composer install command:
composer require ahmaadkhader/pdf-to-html
Package description
Standalone PHP library for extracting semantic HTML from PDF files. Detects headings, lists, tables, links, and inline styles from PDF content.
README
Standalone PHP library for extracting semantic HTML from PDF files using smalot/pdfparser.
Features
- Heading detection: identifies heading levels from font size ratios
- List detection: groups bullet and numbered items into <ul>/<ol> lists
- Table detection: identifies tabular content from X-coordinate column clustering
- Link extraction: matches PDF link annotations to text content
- Inline styles: preserves font size and color differences as CSS
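To make the heading detection idea concrete, here is a minimal sketch of classifying a line by the ratio of its font size to the body font size. The function names, the "most frequent size is the body size" heuristic, and the ratio thresholds are all illustrative assumptions. They are not taken from the library's StyleAnalyzer, which may work differently.

```php
<?php

/**
 * Hypothetical helper: estimate the body font size as the most frequent
 * size on the page (sizes are rounded to one decimal before counting).
 *
 * @param float[] $fontSizes
 */
function estimateBodySize(array $fontSizes): float
{
    $keys = array_map(
        fn (float $size): string => number_format($size, 1, '.', ''),
        $fontSizes
    );
    $counts = array_count_values($keys);
    arsort($counts); // most frequent size first

    // An empty input yields null here, which casts to 0.0.
    return (float) array_key_first($counts);
}

/**
 * Hypothetical helper: map a font size to a heading level, or null for
 * body text. The thresholds below are assumed values for illustration.
 */
function headingLevelForFontSize(float $fontSize, float $bodySize): ?int
{
    if ($bodySize <= 0.0) {
        return null;
    }
    $ratio = $fontSize / $bodySize;

    // Ordered from largest ratio to smallest; the first match wins.
    $thresholds = [1 => 2.0, 2 => 1.6, 3 => 1.3, 4 => 1.15];
    foreach ($thresholds as $level => $minRatio) {
        if ($ratio >= $minRatio) {
            return $level;
        }
    }

    return null;
}

$body = estimateBodySize([12.0, 12.0, 24.0, 12.0, 18.0]);
var_dump(headingLevelForFontSize(24.0, $body)); // int(1)
var_dump(headingLevelForFontSize(12.0, $body)); // NULL
```

A ratio-based rule avoids hard-coding point sizes, so the same thresholds can apply to documents set in different base sizes.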
Installation
composer require ahmaadkhader/pdf-to-html
Usage
    use Ahmaadkhader\PdfToHtml\PdfToHtml;
    use Ahmaadkhader\PdfToHtml\StyleAnalyzer;
    use Ahmaadkhader\PdfToHtml\TableDetector;
    use Ahmaadkhader\PdfToHtml\LinkExtractor;
    use Ahmaadkhader\PdfToHtml\HtmlRenderer;

    $styleAnalyzer = new StyleAnalyzer();
    $tableDetector = new TableDetector();
    $linkExtractor = new LinkExtractor($styleAnalyzer);
    $htmlRenderer = new HtmlRenderer($styleAnalyzer);

    $converter = new PdfToHtml($styleAnalyzer, $tableDetector, $linkExtractor, $htmlRenderer);

    // Extract plain text.
    $text = $converter->extractText('/path/to/file.pdf');

    // Extract semantic HTML with headings, lists, tables and links.
    $html = $converter->extractHtml('/path/to/file.pdf');

    // Use native heading tags (h1-h6) instead of class-based.
    $html = $converter->extractHtml('/path/to/file.pdf', ['native_headings' => true]);
Architecture
| Class | Responsibility |
|---|---|
| PdfToHtml | Core orchestration: parses the PDF and coordinates sub-components |
| StyleAnalyzer | Font size, color, heading level, and inline style detection |
| TableDetector | Table region detection from DataTm positioning data |
| LinkExtractor | PDF link annotation extraction and text matching |
| HtmlRenderer | Renders classified content lines into semantic HTML |
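
The column clustering behind table detection can be sketched roughly as follows. This is an assumed, simplified approach (sort the X coordinates, then split wherever the gap to the previous coordinate exceeds a tolerance). It is not the actual TableDetector code. The function name `clusterColumns` and the 5.0 tolerance are hypothetical, and reading the X values out of the DataTm positioning data is left out.

```php
<?php

/**
 * Hypothetical helper: group X coordinates into columns and return the
 * mean X of each column, ordered left to right.
 *
 * @param float[] $xs        X positions of text fragments
 * @param float   $tolerance Largest gap still treated as the same column (assumed value)
 * @return float[]
 */
function clusterColumns(array $xs, float $tolerance = 5.0): array
{
    if ($xs === []) {
        return [];
    }

    sort($xs);
    $clusters = [[$xs[0]]];
    $prev = $xs[0];

    foreach (array_slice($xs, 1) as $x) {
        if ($x - $prev > $tolerance) {
            // Gap is too large: start a new column.
            $clusters[] = [$x];
        } else {
            // Close enough to the previous value: same column.
            $clusters[array_key_last($clusters)][] = $x;
        }
        $prev = $x;
    }

    return array_map(
        fn (array $cluster): float => array_sum($cluster) / count($cluster),
        $clusters
    );
}

print_r(clusterColumns([72.0, 73.5, 200.0, 201.0, 199.0, 340.0]));
// Three columns, centered at 72.75, 200 and 340.
```

Because each value is compared with its immediate predecessor, a long run of closely spaced values chains into one column. A real detector would likely also require the same columns to recur across several consecutive lines before treating a region as a table.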
Requirements
- PHP 8.1+
- smalot/pdfparser ^2.12
License
GPL-2.0-or-later
Statistics
- Total downloads: 2
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Clicks: 3
- Dependents: 0
- Suggesters: 0
Other information
- License: GPL-2.0-or-later
- Last updated: 2026-05-12