pforret/pf-article-extractor

Composer install command:

composer require pforret/pf-article-extractor

Package description

PfArticleExtractor: boilerplate removal and full-text extraction from HTML pages.

README

Boilerplate removal and full-text extraction from HTML pages.

A rewrite of dotpack/php-boiler-pipe for PHP 8.2 and up, with tests.

Installation

composer require pforret/pf-article-extractor

Usage

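use Pforret\PfArticleExtractor\ArticleExtractor;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// $html must hold a full HTML page; 'article.html' is only an example file name
$html = file_get_contents('article.html');
$articleData = ArticleExtractor::getArticle($html);
print_r($articleData);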
/*
 * $articleData = Pforret\PfArticleExtractor\Formats\ArticleContentsDTO Object
(
    [title] => Film Podcast: Wicked Little Letters Named Film of the Month
    [content] => UK Film Club was back in March with a new episode of their film podcast. (...)
    [date] =>
    [images] => Array
        (
            [0] => https://static.wixstatic.com/media/.../b19cd0_dde0d59546f84127865267f43994f39b~mv2.jpg
        )

    [links] => Array
        (
            [0] => https://www.chrisolson.co.uk/
            (...)
        )

)

 */

Under the hood

  • The package accepts a full HTML page as input.
  • It walks the DOM tree and tries to locate the main article content.
  • It removes boilerplate content (headers, footers, sidebars, ...).
  • It extracts the text of the main article.
  • It tries to extract the title, date, images and links from the article.
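The steps above can be illustrated with a deliberately naive sketch. This is not the package's algorithm (the package is a rewrite of dotpack/php-boiler-pipe and applies its own rules); the helper name `naiveExtractArticle`, the list of boilerplate tags and the "most paragraph text wins" scoring are all assumptions made for illustration. It only uses PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of pforret/pf-article-extractor.
 * Naive heuristic: drop obvious boilerplate elements, then keep the
 * container whose direct <p> children hold the most text.
 */
function naiveExtractArticle(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML rarely validates; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Step 1: remove typical boilerplate containers (assumed tag list).
    foreach (['script', 'style', 'nav', 'header', 'footer', 'aside', 'form'] as $tag) {
        // Copy to an array first so we don't mutate a live node list while iterating.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode?->removeChild($node);
        }
    }

    // Step 2: score candidate containers by the byte length of their direct <p> children.
    $best = null;
    $bestScore = 0;
    foreach ($xpath->query('//article | //main | //section | //div | //body') as $candidate) {
        $score = 0;
        foreach ($xpath->query('./p', $candidate) as $p) {
            $score += strlen(trim($p->textContent));
        }
        if ($score > $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }

    // Step 3: take the title from the first <h1>, falling back to <title>.
    $titleNode = $xpath->query('//h1')->item(0) ?? $xpath->query('//title')->item(0);

    // Step 4: collect text, images and links from the winning container only.
    $paragraphs = [];
    $images = [];
    $links = [];
    if ($best !== null) {
        foreach ($xpath->query('./p', $best) as $p) {
            $paragraphs[] = trim($p->textContent);
        }
        foreach ($xpath->query('.//img[@src]', $best) as $img) {
            $images[] = $img->getAttribute('src');
        }
        foreach ($xpath->query('.//a[@href]', $best) as $a) {
            $links[] = $a->getAttribute('href');
        }
    }

    return [
        'title'   => $titleNode !== null ? trim($titleNode->textContent) : '',
        'content' => implode("\n\n", $paragraphs),
        'images'  => $images,
        'links'   => $links,
    ];
}

// Usage with a tiny inline page: the <nav> and <footer> text should not end up in 'content'.
$html = '<html><head><title>Demo</title></head><body>'
    . '<nav><p>Home | About</p></nav>'
    . '<div><h1>Hello</h1><p>First paragraph of the article.</p>'
    . '<p>Second paragraph with a <a href="https://example.com/">link</a>.</p>'
    . '<img src="/img/photo.jpg"></div>'
    . '<footer><p>Copyright notice</p></footer></body></html>';
print_r(naiveExtractArticle($html));
```

Scoring by paragraph text alone is the simplest possible signal and is easily fooled (for example by long comment sections); more robust extractors typically also weigh signals such as link density, and this sketch makes no attempt to detect the article date.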

Right now it is tested with example pages from:

  • Blogger
  • Drupal
  • Jekyll
  • Mkdocs
  • Wix
  • WordPress

Similar packages

Statistics

  • Total downloads: 1.71k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 4
  • Hits: 5
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 4
  • Watchers: 0
  • Forks: 12
  • Language: HTML

Other information

  • License: MIT
  • Last updated: 2024-06-02