定制 lalbert/daric 二次开发

按需修改功能、优化性能、对接业务系统,提供一站式技术支持

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

lalbert/daric

Composer 安装命令:

composer require lalbert/daric

包简介

Simple and configurable PHP web spider and web scraper

README 文档

README

Daric is a Simple and configurable PHP web spider and web scraper written under the Goutte library.

Installation

The best way to install Daric it use composer

composer require lalbert/daric

Usage

There are two components : Scraper and Spider.

Scraper

Scaper is used to extract, clean, and format web page data.

It uses extractors, cleaners and formatters to achieve its goals.

use Daric\Scraper;
use Daric\Extractor\CrawlerExtractorFactory;

$scraper = new Scrapper('http://website.tld');
$scraper->setExtractors([
  'meta_title' => CrawlerExtractorFactory::create('title@_text'), // get text node of <title></title>
  'meta_description' => CrawlerExtractorFactory::create('meta[name="description"]@content'), // get attribute "content" of <meta name="description" />
  'list' => CrawlerExtractorFactory::create('#content ul.list li@_text("array")') // get all text node of li item. Return an array
]);

$doc = $scraper->scrape(); // return Daric\Document

echo $doc->getData('meta_title');
print_r($doc['list']);

Spider

Spider is used to crawl a website to scrape some web page data.

use Daric\Spider;
use Daric\Scraper;
use Daric\Extractor\CrawlerExtractorFactory;

$spider = new Spider('http://website.tld');

$spider->setLinkExtractor(CrawlerExtractorFactory::create('#content article a.link@href("array")'));
$spider->setNextLinkExtractor(CrawlerExtractorFactory::create('#nav a.next@href'));

foreach ($spider as $pageUri) {
  $scraper = new Scraper($pageUri, $extractors, $cleaners, $formatters);
  $doc = $scraper->scrape();
  ...
}

Licence

Daric is licensed under the MIT License - see the LICENSE file for details

统计信息

  • 总下载量: 29
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 1
  • 点击次数: 0
  • 依赖项目数: 0
  • 推荐数: 0

GitHub 信息

  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • 开发语言: PHP

其他信息

  • 授权协议: MIT
  • 更新时间: 2016-05-18

承接程序开发

PHP开发

VUE

Vue开发

前端开发

小程序开发

公众号开发

系统定制

数据库设计

云部署

网站建设

安全加固