arefshojaei/spider

Composer install command:

composer require arefshojaei/spider

Package description

PHP web crawler

README documentation

🕷️ Spider - PHP Web Crawler & HTML Parser

A lightweight and powerful PHP web crawler inspired by jQuery-style DOM manipulation.

Fetch web pages, parse HTML documents, search elements with CSS selectors, manipulate the DOM, and export modified pages with an elegant and simple API.

✨ Features

  • 🌐 Load and parse any HTML web page
  • 🔍 CSS selector-based element searching
  • 📄 Extract text, HTML, and attributes
  • 🔁 Iterate over multiple DOM elements
  • 🧹 Remove and clean HTML elements
  • 🏗️ Modify the DOM structure dynamically
  • 🎨 Manage CSS classes and IDs
  • 💾 Export modified HTML documents
  • ⚡ Lightweight and dependency-free PHP implementation

📥 Installation

Install with Composer

composer require arefshojaei/spider

Clone from GitHub

git clone https://github.com/ArefShojaei/Spider.git
cd Spider

🚀 Quick Start

Fetch a page and extract its content:

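<?php

// Load Composer's autoloader so the Spider class can be resolved.
require __DIR__ . "/vendor/autoload.php";

use Spider\Spider;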

$spider = new Spider();

$page = $spider->loadHTML("https://google.com");

echo $page->find("title")->text() . PHP_EOL;

$page->findAll("a")->each(function ($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});

🔎 Finding Elements

Search DOM elements using CSS selectors.

Find a single element

$page->find("a");
$page->find(".product");
$page->find("#header");

Find multiple elements

$page->findAll("a");
$page->findAll(".product");
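
To make the three selector forms above concrete, here is a minimal sketch of how a tag, class, or ID selector could be mapped to XPath and run with PHP's built-in DOMDocument and DOMXPath. The helper `simpleSelectorToXPath()` is hypothetical and is not part of Spider. The README does not describe how Spider resolves selectors internally, so treat this as an illustration of the idea, not the library's implementation.

```php
<?php

// Hypothetical helper (not part of Spider): translate a simple selector
// (tag, .class, or #id) into an equivalent XPath expression.
function simpleSelectorToXPath(string $selector): string
{
    $prefix = substr($selector, 0, 1);
    $name = substr($selector, 1);

    if ($prefix === "#") {
        return "//*[@id='{$name}']";
    }

    if ($prefix === ".") {
        // Match the class as a whole word inside the space-separated class list.
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$name} ')]";
    }

    // Anything else is treated as a plain tag name.
    return "//" . $selector;
}

// Inline sample markup so the sketch runs without network access.
$html = '<div id="header"><a class="product sale" href="/a">A</a><a href="/b">B</a></div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

foreach (["a", ".product", "#header"] as $selector) {
    $matches = $xpath->query(simpleSelectorToXPath($selector));
    echo $selector . " => " . $matches->length . " match(es)" . PHP_EOL;
}
```

Combined selectors such as `div.product > a` are out of scope for this sketch.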

🔁 Iterating Elements

Perform operations on element collections.

each()

Loop through every element:

$page->findAll("a")->each(function ($key, $anchor) {
    echo $anchor->text();
});

map()

Transform elements:

$anchors = $page->findAll("a")->map(function ($key, $anchor) {
    $anchor->attr("data-id", rand());

    return $anchor;
});

filter()

Filter elements by a condition (here, only anchors with a non-empty href are kept):

$links = $page->findAll("a")->filter(
    fn($key, $anchor) => $anchor->attr("href")
);

🌳 DOM Traversing

Navigate and modify element relationships.

Parent element

$parent = $page->find(".product")->parent();

Insert sibling elements

$page->find(".product")
     ->before("<p>Before Element</p>");

$page->find(".product")
     ->after("<p>After Element</p>");

Insert child elements

$page->find(".product")
     ->append("<p>New Child</p>");

$page->find(".product")
     ->prepend("<p>First Child</p>");

🧹 Cleaning Elements

Remove content or complete elements.

Empty content

$page->find("p")->empty();

Remove element

$page->find("p")->remove();

📄 Working with Content

Get text or HTML

$text = $page->find("p")->text();

$html = $page->find("p")->html();

Update content

$page->find("p")->text("New text");

$page->find("p")->html("<strong>New HTML</strong>");

🏷️ Working with Attributes

Read attributes

$attributes = $page->find("a")->attr();

$link = $page->find("a")->attr("href");

Set attributes

$page->find("a")->attr("data-id", 123);

🎨 CSS Classes & IDs

Classes

$page->find("p")->addClass("active");

$page->find("p")->removeClass("active");

$page->find("p")->hasClass("active");

IDs

$page->find("p")->addID("article");

$page->find("p")->removeID("article");

$page->find("p")->hasID("article");

💾 Export HTML

Save the current DOM document to a file.

$filename = "page";

$path = __DIR__ . "/html/" . $filename . rand() . ".html";

$page->export($path);
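
The README does not say whether export() creates missing directories, so it may be safer to prepare the target directory first. The sketch below shows one way to do that with standard PHP functions. The helper `buildExportPath()` and the temporary directory name are assumptions made for illustration and are not part of Spider.

```php
<?php

// Hypothetical helper (not part of Spider): make sure the export directory
// exists and return a unique .html path inside it.
function buildExportPath(string $directory, string $prefix): string
{
    if (!is_dir($directory) && !mkdir($directory, 0775, true) && !is_dir($directory)) {
        throw new RuntimeException("Could not create directory: " . $directory);
    }

    // uniqid() is time-based, which makes name collisions between runs unlikely.
    return $directory . "/" . uniqid($prefix . "-") . ".html";
}

$path = buildExportPath(sys_get_temp_dir() . "/spider-html", "page");
echo $path . PHP_EOL;

// With a loaded Spider page, the path would then be passed to export():
// $page->export($path);
```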

💡 Example Use Cases

Spider can be used for:

  • Web scraping and data extraction
  • SEO analysis
  • Content migration
  • HTML cleaning and transformation
  • Static website processing
  • Automated testing of HTML pages
  • Learning how browser DOM engines work

🔥 Why Spider?

Spider brings the simplicity of jQuery-style DOM APIs into PHP.

Instead of dealing with complex DOMDocument operations, you can navigate and manipulate HTML documents using a clean and expressive syntax.
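
For comparison, here is a rough sketch of the Quick Start task (print the title, then list every link) written directly against PHP's built-in DOMDocument. It uses an inline sample document instead of a live URL so that it runs offline. The markup and variable names are illustrative assumptions.

```php
<?php

// Inline sample markup so the sketch runs without network access.
$html = <<<HTML
<!DOCTYPE html>
<html>
<head><title>Sample Page</title></head>
<body>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body>
</html>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

// Read the <title> text, guarding against a missing element.
$titleNode = $document->getElementsByTagName("title")->item(0);
$title = $titleNode !== null ? trim($titleNode->textContent) : "";
echo $title . PHP_EOL;

// Collect the href attribute of every <a> element.
$links = [];
foreach ($document->getElementsByTagName("a") as $anchor) {
    $links[] = $anchor->getAttribute("href");
}

foreach ($links as $link) {
    echo "[LINK] " . $link . PHP_EOL;
}
```

The null check, the warning handling, and the manual loop are the kind of detail that a chainable find()/findAll() API is meant to hide.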

It is a great educational project for learning:

  • Web crawling concepts
  • HTML parsing
  • DOM tree manipulation
  • CSS selector engines
  • Collection processing
  • Parser design

🤝 Contributing

Contributions are welcome.

  1. Fork the repository

  2. Create a feature branch:

git checkout -b feature/amazing-feature

  3. Commit your changes:

git commit -m "Add amazing feature"

  4. Push your branch:

git push origin feature/amazing-feature

  5. Open a Pull Request.

👨‍💻 Author

Aref Shojaei

⭐ Show Your Support

If this project helps you understand web crawling, HTML parsing, and DOM manipulation, consider giving it a Star ⭐ on GitHub.

Your support motivates future improvements.

Statistics

  • Total downloads: 9
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Clicks: 6
  • Dependents: 1
  • Suggesters: 0

GitHub information

  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2025-03-28
