deepslam/content-parser

Composer install command:

composer require deepslam/content-parser

Package description

A simple content grabber that can detect the content on various web pages

README

With this package, you can detect the main content on different web pages and grab it. The package provides the following features:

  • Expandable architecture. You can easily add support for new APIs.
  • Code cleaning. The package can automatically remove CSS and style attributes, so you always receive clean HTML content.

The package uses automatic algorithms to grab data from web pages. You receive the title and the content of the requested web page.

Requirements

The package requires the following:

Installation

You can install the package via Composer. Just run:

composer require deepslam/content-parser

Next, add the service provider to the providers array in your config/app.php:

...
Deepslam\ContentParser\ContentParserServiceProvider::class,
...

Then create an alias in the aliases array in your config/app.php:

'ContentParser' => Deepslam\ContentParser\ContentParser::class,
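
Put together, the two registration steps might look like the fragment below. This is a sketch of a typical Laravel config/app.php; the placeholder comments stand for whatever entries your application already has.

```php
<?php
// config/app.php (fragment): only the two entries this package needs are shown.
return [
    'providers' => [
        // ... existing providers ...
        Deepslam\ContentParser\ContentParserServiceProvider::class,
    ],

    'aliases' => [
        // ... existing aliases ...
        'ContentParser' => Deepslam\ContentParser\ContentParser::class,
    ],
];
```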

After that, publish the config files:

php artisan vendor:publish --provider="Deepslam\ContentParser\ContentParserServiceProvider"

Do not forget to run the config:cache command:

php artisan config:cache

That's all!

Settings

There are two different parsers:

  • Standalone parser - graby, which is used by default.
  • MarcuryContentParser, which uses the Mercury API.

This gives you three config files:

  • /config/deepslam/parser.php - The common config for all parsers. Here you can configure options such as whether to clean the code, whether to strip tags, and the list of allowed tags.
  • /config/deepslam/mercury-tools.php - Contains only one setting: the API key for the Mercury API service.
  • /config/deepslam/graby.php - A copy of the original settings of the graby parser. You can read about them on the developer's page.

Usage

To use ContentParser, create a parser object:

$parser = ContentParser::create();

This creates a ContentParser object.

Called without arguments, it uses the "Graby" parser. If you need another one, pass its alias as a parameter:

$parser = ContentParser::create('mercury');

As a result, you will receive a ContentParser object.

To parse data, call the parse method, which returns true or false (true if data has been received, false if not):

$parser->parse($url);

To get the result of parsing, there is one method:

  • getResult - Returns the resulting ParsingResult object
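
The parse and getResult calls can be wrapped so that a failed parse never leads to reading an empty result. The sketch below is a hypothetical helper (grabPage is not part of the package). It only assumes that the parser object exposes parse and getResult, and that the result exposes isEmpty, getTitle, getContent and getImage as listed in this README.

```php
<?php
// Hypothetical helper: $parser is any object exposing parse() and getResult()
// as described in this README (for example, the value of ContentParser::create()).
function grabPage($parser, $url)
{
    // parse() reports failure with false, so stop early instead of reading a result.
    if (!$parser->parse($url)) {
        return null;
    }

    $result = $parser->getResult();

    // Guard against a result object that carries no data.
    if ($result->isEmpty()) {
        return null;
    }

    return [
        'title'   => $result->getTitle(),
        'content' => $result->getContent(),
        'image'   => $result->getImage(),
    ];
}
```

Returning null for both failure cases keeps the calling code down to a single check before it touches the title or content.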

This object has the following methods:

  • setTitle - Sets a new title
  • setContent - Sets new content
  • setImage - Sets the main image for the content
  • setOriginal - Saves the original response
  • getTitle - Returns the title of the result
  • getContent - Returns the content of the result. It may already be cleaned if you enabled cleaning in the configs.
  • getImage - Returns the URL of the OG image or an empty string
  • getOriginal - Returns the original response of the service or script
  • isEmpty - Tells whether the object is empty (contains no data)
  • stripContent - Manually strips tags from the content
  • cleanContent - Manually removes stray classes, IDs and style blocks from the parsed HTML
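
To give a rough idea of what stripping and cleaning involve, here is a minimal standalone sketch. It is not the package's implementation of stripContent or cleanContent; the function names cleanHtmlSketch and stripHtmlSketch and the default allow list are assumptions made for illustration.

```php
<?php
// Sketch only: NOT the package's implementation of cleanContent/stripContent.

// Remove <style> blocks and class, id, and style attributes from an HTML fragment.
function cleanHtmlSketch($html)
{
    // Drop whole <style>...</style> blocks first.
    $html = preg_replace('#<style\b[^>]*>.*?</style>#is', '', $html);

    // Then drop class="...", id="...", and style="..." attributes.
    return preg_replace(
        '/\s+(?:class|id|style)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '',
        $html
    );
}

// Strip all tags except an allow list, using PHP's built-in strip_tags().
function stripHtmlSketch($html, $allowedTags = '<p><a>')
{
    return strip_tags($html, $allowedTags);
}
```

The regular expressions are deliberately simple and can also match attribute-like text outside of tags. A DOM-based approach would be more robust for real pages.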

Extending

If you want to add a new parser, create a new class that inherits from the \Deepslam\ContentParser\ContentParser class. You must implement only one method, parse, which must return a bool and update the internal result object.

After that, register your new class in the parsers array in /config/deepslam/parser.php.

To use your parser, pass its alias when you call ContentParser::create:

$parser = ContentParser::create('<your alias of parser>');
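
The shape of such a parser can be sketched as follows. To keep the example runnable without the package, it declares small stand-in classes (StubResult and StubBaseParser) instead of the real ones. The $result property name, the fetch helper and the TitleBodyParser class are all assumptions made for illustration. In real code you would extend \Deepslam\ContentParser\ContentParser and check how the base class exposes its result object.

```php
<?php
// Stand-in for the package's ParsingResult, limited to methods named in this README.
class StubResult
{
    private $title = '';
    private $content = '';

    public function setTitle($title) { $this->title = $title; }
    public function setContent($content) { $this->content = $content; }
    public function getTitle() { return $this->title; }
    public function getContent() { return $this->content; }
    public function isEmpty() { return $this->title === '' && $this->content === ''; }
}

// Stand-in for \Deepslam\ContentParser\ContentParser.
// Assumption: the real base class keeps its result in a property like $result.
abstract class StubBaseParser
{
    protected $result;

    public function __construct() { $this->result = new StubResult(); }
    public function getResult() { return $this->result; }

    // The single method a custom parser has to provide; must return bool.
    abstract public function parse($url);
}

// Hypothetical custom parser: takes <title> and <body> straight from the raw HTML.
class TitleBodyParser extends StubBaseParser
{
    public function parse($url)
    {
        $html = $this->fetch($url);
        if ($html === false || $html === '') {
            return false;
        }
        if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
            $this->result->setTitle(trim($m[1]));
        }
        if (preg_match('#<body[^>]*>(.*?)</body>#is', $html, $m)) {
            $this->result->setContent(trim($m[1]));
        }

        // Report success only if something was actually extracted.
        return !$this->result->isEmpty();
    }

    // Kept separate so the download step can be replaced in tests.
    protected function fetch($url)
    {
        return @file_get_contents($url);
    }
}
```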

Full example

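        // Replace the placeholders with your parser alias, URL and model.
        $parser = ContentParser::create('<parser which you need>');
        // parse() returns false when no data was received.
        if ($parser->parse('<url to grab>')) {
            $result = $parser->getResult();
            <your_model>->name = $result->getTitle();
            <your_model>->description = $result->getContent();
        }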

Support

If you find a bug or have a question or suggestion, you can send me an e-mail: me@ivanovdmitry.com

Statistics

  • Total downloads: 22
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 3
  • Clicks: 1
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 3
  • Watchers: 0
  • Forks: 1
  • Language: PHP

Other information

  • License: GPL-3.0
  • Last updated: 2017-05-19
