piedweb/crawler

Latest stable version: v0.1.899

Composer install command:

composer require piedweb/crawler

Package description

Web crawler to check a few SEO basics.

README

Open Source Package

CLI Seo Pocket Crawler

Web crawler to check a few SEO basics.

Use the collected data in your favorite spreadsheet software or retrieve them via your favorite language.
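
As a minimal sketch of retrieving the data from a script, the Python snippet below loads a CSV export into dictionaries keyed by the header row. The helper names (`load_rows`, `count_by`) are hypothetical, and any column name you pass in is an assumption to check against the header of your own crawl output.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file and return one dict per row, keyed by the header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    counts = {}
    for row in rows:
        # Rows lacking the column are grouped under the empty string.
        value = row.get(column, "")
        counts[value] = counts.get(value, 0) + 1
    return counts
```

For example, `count_by(load_rows("data.csv"), "some_column")` would group the crawled rows by that column, provided the column exists in your file.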

French documentation available: https://piedweb.com/seo/crawler

Install

Via Packagist

$ composer create-project piedweb/crawler

Usage

Crawler CLI

$ bin/console crawler:go $start

Arguments:

  start                            Define where the crawl starts. E.g. https://piedweb.com
                                   You can specify an id from a previous crawl. Other options will then be ignored.
                                   You can use `last` to continue the last crawl (just stopped)

Options:

  -l, --limit=LIMIT                Define a depth limit [default: 5]
  -i, --ignore=IGNORE              Virtual Robots.txt to respect (could be a string or a URL).
  -u, --user-agent=USER-AGENT      Define the user-agent used during the crawl. [default: "SEO Pocket Crawler - PiedWeb.com/seo/crawler"]
  -w, --wait=WAIT                  Time to wait between two requests, in microseconds (0.1 s by default). [default: 100000]
  -c, --cache-method=CACHE-METHOD  Define the cache method used during the crawl. [default: 2]
  -r, --restart=RESTART            Restart a previous crawl. Values: 1 = fresh restart, 2 = restart from cache
  -h, --help                       Display this help message
  -q, --quiet                      Do not output any message
  -V, --version                    Display this application version
      --ansi                       Force ANSI output
      --no-ansi                    Disable ANSI output
  -n, --no-interaction             Do not ask any interactive question
  -v|vv|vvv, --verbose             Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
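
If you launch crawls from a script, the options above can be assembled programmatically. The following Python sketch only builds the argument list. The function names are hypothetical, and it assumes the command is run from the project directory where `bin/console` lives.

```python
def seconds_to_microseconds(seconds):
    """Convert a delay in seconds to the microseconds expected by --wait."""
    return int(round(seconds * 1_000_000))


def build_crawl_command(start, limit=5, wait_us=100000, user_agent=None, ignore=None):
    """Return the argument list for `bin/console crawler:go`.

    Defaults mirror the documented ones: depth limit 5, wait 100000 microseconds.
    """
    command = [
        "bin/console",
        "crawler:go",
        start,
        f"--limit={limit}",
        f"--wait={wait_us}",
    ]
    # Optional flags are only added when a value was supplied.
    if user_agent is not None:
        command.append(f"--user-agent={user_agent}")
    if ignore is not None:
        command.append(f"--ignore={ignore}")
    return command
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Passing a list rather than a single string avoids shell quoting issues with user-agent strings that contain spaces.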



Extract All External Links in 1s from a previous crawl

$ bin/console crawler:external $id [--host]
    --id
        id from a previous crawl
        You can use `last` to show external links from the last crawl.

    --host -ho
        flag to return only the hosts
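
To illustrate what reducing links to hosts means, here is a small Python sketch that collapses a list of URLs to their distinct hostnames. It is an illustration of the idea under stated assumptions, not the command's actual implementation, and `unique_hosts` is a hypothetical name.

```python
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Return the distinct hostnames of the given URLs, in first-seen order."""
    seen = []
    for url in urls:
        # urlsplit lowercases the hostname and returns None when there is none.
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen
```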

Calculate Page Rank

Updates the previously generated data.csv. You can then explore your website with the PoC pagerank.html (served locally, e.g. with npx http-server -c-1 --port 3000).

$ bin/console crawler:pagerank $id
    --id
        id from a previous crawl
        You can use `last` to calculate page rank from the last crawl.
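
For readers unfamiliar with the underlying idea, the Python sketch below computes a basic PageRank by power iteration over an internal link graph. It is a textbook version offered as an assumption about the general technique, not the package's own calculator. The damping factor and iteration count are illustrative choices, and redirects, canonicals and nofollow are ignored.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank for a graph given as {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # Each outgoing link passes on an equal share of the source's rank.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

The scores sum to 1, so each value can be read as the share of attention a page receives from the internal link structure.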

Testing

$ composer test

Todo

  • Better Links Harvesting and Recording (record context (list, nav, sentence...))
  • Transform the PoC (Page Rank Visualizer)
  • Complex Page Rank Calculator (with 301, canonical, nofollow, etc.)

Contributing

Please see contributing

Credits

License

The MIT License (MIT). Please see License File for more information.

Statistics

  • Total downloads: 219
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 1
  • Clicks: 1
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2022-12-14
