hindmost/rolling-curl-mini 问题修复 & 功能扩展

解决BUG、新增功能、兼容多环境部署,快速响应你的开发需求

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

hindmost/rolling-curl-mini

Composer 安装命令:

composer require hindmost/rolling-curl-mini

包简介

README 文档

README

Rolling Curl Mini is a fork of Rolling Curl. It allows to process multiple HTTP requests in parallel using cURL PHP library.

For more information read this article (in russian).

Basic Usage Sample

...
require "RollingCurlMini.php";
...
$o_mc = new RollingCurlMini(10);
...
$o_mc->add($url, $postdata, $callback, $userdata, $options, $headers);
...
$o_mc->execute();
...

Callbacks

Any request may have an individual callback - function/method to be called as this request is completed. Callback accepts 4 parameters and may look like the following one:

/**
 * @param string $content - content of request response
 * @param string $url - URL of requested resource
 * @param array $info - cURL handle info
 * @param mixed $userdata - user-defined data passed with add() method
 */
function request_callback($content, $url, $info, $userdata) {
}

License

Rolling Scraper Abstract

Rolling Scraper Abstract is a multipurpose scraping (crawling) framework which uses facilities of multi-curl and RollingCurlMini class. It is a base PHP class which implement common functionality of a multi-curl scraper. Particular functionality should be implemented in derived classes. Particular scraper class should extend RollingScraperAbstract class and implement (override) two mandatory methods: _initPages and _handlePage.

For more information read this article (in russian).

Example

Basic Usage Sample

class MyScraper extends RollingScraperAbstract
{
    ...
    public function __construct() {
        ...
        $this->modConfig(array(
            'state_time_storage' => '...', // temporal section of state storage (file path)
            'state_data_storage' => '...', // data section of state storage (file path)
            'scrape_life' => 0, // expiration time (secs) of scraped data
            'run_timeout' => 30, // max. time (secs) to execute scraper script
            'run_pages_loops' => 20, // max. number of loops through pages
            'run_pages_buffer' => 500, // page requests buffer size
            'curl_threads' => 10, // number of multi-curl threads
            'curl_options' => array(...), // CURL options used in multi-curl requests
        ));
        parent::__construct();
    }

    /**
     * Initialize the starting list of page requests
     */
    protected function _initPages() {
        ...
        // add page request. $url - page URL
        $this->addPage($url);
        ...
    }

    /**
     * Process response of a page request
     * @param string $cont - page content
     * @param string $url - url of request
     * @param array $aInfo - CURL info data
     * @param int $index - # of page request
     * @param array $aData - custom request data (part of request data)
     * @return bool
     */
    protected function _handlePage($cont, $url, $aInfo, $index, $aData) {
        ...
    }
    ...
}

$scraper = new MyScraper();
$bool = $scraper->run();
list($time_start, $time_end, , $time_run_start, , $n_pages_total, $n_pages_passed) =
    $scraper->getStateProgress();
if ($time_end) {
    echo sprintf('Completed at %s', date('Y.m.d, H:i:s', $time_end));
}
else {
    if ($bool)
        echo sprintf('In progress: %d/%d pages', $n_pages_passed, $n_pages_total);
    else
        echo 'Cancelled since another script instance is still running';
}

统计信息

  • 总下载量: 28
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 16
  • 点击次数: 2
  • 依赖项目数: 0
  • 推荐数: 0

GitHub 信息

  • Stars: 16
  • Watchers: 6
  • Forks: 7
  • 开发语言: PHP

其他信息

  • 授权协议: Unknown
  • 更新时间: 2018-05-31

承接程序开发

PHP开发

VUE

Vue开发

前端开发

小程序开发

公众号开发

系统定制

数据库设计

云部署

网站建设

安全加固