ahmaadkhader/pdf-to-html

Latest stable version: 1.0.0

Composer install command:

composer require ahmaadkhader/pdf-to-html

Package description

Standalone PHP library for extracting semantic HTML from PDF files. Detects headings, lists, tables, links, and inline styles from PDF content.

README

Standalone PHP library for extracting semantic HTML from PDF files using smalot/pdfparser.

Features

  • Heading detection — identifies heading levels from font size ratios
  • List detection — groups bullet and numbered items into <ul>/<ol> lists
  • Table detection — identifies tabular content from X-coordinate column clustering
  • Link extraction — matches PDF link annotations to text content
  • Inline styles — preserves font size and color differences as CSS
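The heading detection feature above can be illustrated with a minimal sketch. The function names `headingLevelFromRatio` and `dominantFontSize` and all threshold values are assumptions made for illustration; the README does not state which ratios the library actually uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a line's font size to a heading level,
 * or null for body text, based on its ratio to the body font size.
 * The thresholds are illustrative assumptions, not the library's values.
 */
function headingLevelFromRatio(float $fontSize, float $bodySize): ?int
{
    if ($bodySize <= 0.0) {
        return null; // No usable baseline, so treat the line as body text.
    }

    $ratio = $fontSize / $bodySize;

    // Ordered from largest to smallest ratio; the first match wins.
    $thresholds = [
        1 => 2.0,
        2 => 1.6,
        3 => 1.3,
        4 => 1.15,
    ];

    foreach ($thresholds as $level => $minRatio) {
        if ($ratio >= $minRatio) {
            return $level;
        }
    }

    return null;
}

/**
 * Hypothetical helper: take the most frequent font size as the body size.
 *
 * @param float[] $sizes Font size of each text line.
 */
function dominantFontSize(array $sizes): float
{
    if ($sizes === []) {
        return 0.0;
    }

    // array_count_values() needs string or int values, so format the floats first.
    $keys = array_map(static fn (float $s): string => sprintf('%.2f', $s), $sizes);
    $counts = array_count_values($keys);
    arsort($counts);

    return (float) array_key_first($counts);
}
```

Taking the most frequent size as the baseline is one simple way to define "body text"; a real implementation might weight sizes by character count instead.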

Installation

composer require ahmaadkhader/pdf-to-html

Usage

require __DIR__ . '/vendor/autoload.php';

use Ahmaadkhader\PdfToHtml\PdfToHtml;
use Ahmaadkhader\PdfToHtml\StyleAnalyzer;
use Ahmaadkhader\PdfToHtml\TableDetector;
use Ahmaadkhader\PdfToHtml\LinkExtractor;
use Ahmaadkhader\PdfToHtml\HtmlRenderer;

$styleAnalyzer = new StyleAnalyzer();
$tableDetector = new TableDetector();
$linkExtractor = new LinkExtractor($styleAnalyzer);
$htmlRenderer = new HtmlRenderer($styleAnalyzer);

$converter = new PdfToHtml($styleAnalyzer, $tableDetector, $linkExtractor, $htmlRenderer);

// Extract plain text.
$text = $converter->extractText('/path/to/file.pdf');

// Extract semantic HTML with headings, lists, tables and links.
$html = $converter->extractHtml('/path/to/file.pdf');

// Use native heading tags (h1-h6) instead of class-based.
$html = $converter->extractHtml('/path/to/file.pdf', ['native_headings' => true]);
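If the string returned by `extractHtml()` is an HTML fragment rather than a full page (an assumption; the README does not say), it may be convenient to wrap it before saving. The helper `wrapHtmlDocument` below is hypothetical and not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wrap an HTML fragment in a minimal HTML5 document.
 * The title is escaped; the fragment is inserted as-is.
 */
function wrapHtmlDocument(string $fragment, string $title): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html lang=\"en\">\n"
        . "<head>\n<meta charset=\"utf-8\">\n<title>{$safeTitle}</title>\n</head>\n"
        . "<body>\n{$fragment}\n</body>\n"
        . "</html>\n";
}

// In real use, $fragment would come from $converter->extractHtml(...).
$fragment = '<h1>Report</h1><p>Hello</p>';
$document = wrapHtmlDocument($fragment, 'Q1 & Q2 report');
```

The resulting string can then be written to disk with `file_put_contents()`.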

Architecture

Class          Responsibility
PdfToHtml      Core orchestration: parses PDF, coordinates sub-components
StyleAnalyzer  Font size, color, heading level, and inline style detection
TableDetector  Table region detection from DataTm positioning data
LinkExtractor  PDF link annotation extraction and text matching
HtmlRenderer   Renders classified content lines into semantic HTML
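The idea behind detecting tables from positioning data can be sketched as one-dimensional clustering of X coordinates. This is an illustration only: the function `clusterColumns`, the 5-unit tolerance, and the sample data are assumptions, and the input is assumed to be a list of `[transformation matrix, text]` pairs with the X position at matrix index 4. The library's actual algorithm may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: cluster X coordinates into column positions.
 * A value within $tolerance of the current cluster's mean joins that cluster.
 *
 * @param float[] $xs
 * @return float[] Column centres in ascending order.
 */
function clusterColumns(array $xs, float $tolerance = 5.0): array
{
    sort($xs);
    $clusters = []; // Each entry: ['sum' => float, 'count' => int].

    foreach ($xs as $x) {
        $last = count($clusters) - 1;
        if ($last >= 0) {
            $mean = $clusters[$last]['sum'] / $clusters[$last]['count'];
            if (abs($x - $mean) <= $tolerance) {
                $clusters[$last]['sum'] += $x;
                $clusters[$last]['count']++;
                continue;
            }
        }
        $clusters[] = ['sum' => $x, 'count' => 1];
    }

    return array_map(
        static fn (array $c): float => $c['sum'] / $c['count'],
        $clusters
    );
}

// Sample items: matrix indexes 4 and 5 hold the X and Y position (assumed shape).
$items = [
    [['1', '0', '0', '1', '72', '700'], 'Name'],
    [['1', '0', '0', '1', '200.5', '700'], 'Qty'],
    [['1', '0', '0', '1', '73', '680'], 'Widget'],
    [['1', '0', '0', '1', '201', '680'], '4'],
];

$xs = array_map(static fn (array $item): float => (float) $item[0][4], $items);
$columns = clusterColumns($xs);
```

With the sample data, the four text items fall into two columns. A detector could then treat consecutive rows that share the same set of columns as a table region.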

Requirements

  • PHP 8.1+
  • smalot/pdfparser ^2.12

License

GPL-2.0-or-later

Statistics

  • Total downloads: 2
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 3
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: GPL-2.0-or-later
  • Last updated: 2026-05-12
