arefshojaei/spider
Composer install command:
composer require arefshojaei/spider
Package description
PHP web crawler
README
🕷️ Spider - PHP Web Crawler & HTML Parser
A lightweight and powerful PHP web crawler inspired by jQuery-style DOM manipulation.
Fetch web pages, parse HTML documents, search elements with CSS selectors, manipulate the DOM, and export modified pages with an elegant and simple API.
✨ Features
- 🌐 Load and parse any HTML web page
- 🔍 CSS selector-based element searching
- 📄 Extract text, HTML, and attributes
- 🔁 Iterate over multiple DOM elements
- 🧹 Remove and clean HTML elements
- 🏗️ Modify the DOM structure dynamically
- 🎨 Manage CSS classes and IDs
- 💾 Export modified HTML documents
- ⚡ Lightweight and dependency-free PHP implementation
📥 Installation
Install with Composer
composer require arefshojaei/spider
Clone from GitHub
git clone https://github.com/ArefShojaei/Spider.git
cd Spider
🚀 Quick Start
Fetch a page and extract its content:
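<?php

// Composer's autoloader (assumes the package was installed with Composer).
require __DIR__ . "/vendor/autoload.php";

use Spider\Spider;

$spider = new Spider();
$page = $spider->loadHTML("https://google.com");

// Print the page title.
echo $page->find("title")->text() . PHP_EOL;

// Print the href of every link on the page.
$page->findAll("a")->each(function ($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});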
🔎 Finding Elements
Search DOM elements using CSS selectors.
Find a single element
$page->find("a");
$page->find(".product");
$page->find("#header");
Find multiple elements
$page->findAll("a");
$page->findAll(".product");
🔁 Iterating Elements
Perform operations on element collections.
each()
Loop through every element:
$page->findAll("a")->each(function ($key, $anchor) {
    echo $anchor->text();
});
map()
Transform elements:
$anchors = $page->findAll("a")->map(function ($key, $anchor) {
    $anchor->attr("data-id", rand());

    return $anchor;
});
filter()
Filter elements by a condition:
$links = $page->findAll("a")->filter(
    fn($key, $anchor) => $anchor->attr("href")
);
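The callback above keeps every anchor whose `href` is truthy, which still lets through values such as `#top` or `mailto:` links. As a minimal sketch of a stricter condition, the predicate below accepts only absolute `http`/`https` URLs. `isHttpLink` is a hypothetical helper, not part of Spider, and it assumes `attr("href")` yields a string or `null`.

```php
<?php

/**
 * Hypothetical helper (not part of Spider): returns true only for
 * absolute http:// or https:// URLs.
 */
function isHttpLink(?string $href): bool
{
    if ($href === null) {
        return false;
    }

    // parse_url() keeps the scheme's original case, so normalise it below.
    $scheme = parse_url(trim($href), PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ["http", "https"], true);
}

// Plain-array demonstration of the predicate.
$hrefs = ["https://example.com/a", "#top", "mailto:me@example.com", "/relative", null];
$kept  = array_values(array_filter($hrefs, "isHttpLink"));

print_r($kept);
```

With Spider, the same predicate could be used as `fn($key, $anchor) => isHttpLink($anchor->attr("href"))`, under the assumption stated above about the return type of `attr()`.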
🌳 DOM Traversing
Navigate and modify element relationships.
Parent element
$parent = $page->find(".product")->parent();
Insert sibling elements
$page->find(".product")->before("<p>Before Element</p>");
$page->find(".product")->after("<p>After Element</p>");
Insert child elements
$page->find(".product")->append("<p>New Child</p>");
$page->find(".product")->prepend("<p>First Child</p>");
🧹 Cleaning Elements
Remove content or complete elements.
Empty content
$page->find("p")->empty();
Remove element
$page->find("p")->remove();
📄 Working with Content
Get text or HTML
$text = $page->find("p")->text();
$html = $page->find("p")->html();
Update content
$page->find("p")->text("New text");
$page->find("p")->html("<strong>New HTML</strong>");
🏷️ Working with Attributes
Read attributes
$attributes = $page->find("a")->attr();
$link = $page->find("a")->attr("href");
Set attributes
$page->find("a")->attr("data-id", 123);
🎨 CSS Classes & IDs
Classes
$page->find("p")->addClass("active");
$page->find("p")->removeClass("active");
$page->find("p")->hasClass("active");
IDs
$page->find("p")->addID("article");
$page->find("p")->removeID("article");
$page->find("p")->hasID("article");
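The class helpers above act on an element's `class` attribute, which HTML defines as a whitespace-separated list of names. Spider's internals are not shown in this README, so the following is only a sketch of the usual semantics behind such helpers. The function names are hypothetical and work on plain attribute strings.

```php
<?php

/** Split a class attribute value into its individual class names. */
function classTokens(string $classAttr): array
{
    return preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
}

/** True if $class appears as a whole token (not merely as a substring). */
function hasClassToken(string $classAttr, string $class): bool
{
    return in_array($class, classTokens($classAttr), true);
}

/** Append $class unless it is already present. */
function addClassToken(string $classAttr, string $class): string
{
    $tokens = classTokens($classAttr);
    if (!in_array($class, $tokens, true)) {
        $tokens[] = $class;
    }

    return implode(" ", $tokens);
}

/** Drop every occurrence of $class and keep the remaining order. */
function removeClassToken(string $classAttr, string $class): string
{
    $tokens = array_filter(classTokens($classAttr), fn($t) => $t !== $class);

    return implode(" ", $tokens);
}

echo addClassToken("card product", "active") . PHP_EOL;
echo removeClassToken("card active product", "active") . PHP_EOL;
var_dump(hasClassToken("inactive", "active"));
```

Matching whole tokens rather than substrings is the design point here: an element with `class="inactive"` should not be reported as having the class `active`.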
💾 Export HTML
Save the current DOM document to a file.
$filename = "page";
$path = __DIR__ . "/html/" . $filename . rand() . ".html";

$page->export($path);
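The README does not say whether `export()` creates missing directories, so it may be safer to prepare the target folder first. The sketch below shows one way to do that. `buildExportPath` is a hypothetical helper, not part of Spider, and the temporary directory is used only for demonstration.

```php
<?php

/**
 * Hypothetical helper (not part of Spider): builds a unique .html path
 * inside $dir, creating the directory first if it does not exist.
 */
function buildExportPath(string $dir, string $basename): string
{
    // The second is_dir() check covers another process creating $dir concurrently.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: " . $dir);
    }

    // uniqid() derives the suffix from the current time in microseconds.
    return rtrim($dir, "/\\") . DIRECTORY_SEPARATOR . $basename . "-" . uniqid() . ".html";
}

$path = buildExportPath(sys_get_temp_dir() . "/spider-export-demo", "page");
echo $path . PHP_EOL;
```

With Spider, the result could then be passed along as `$page->export(buildExportPath(__DIR__ . "/html", "page"));`.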
💡 Example Use Cases
Spider can be used for:
- Web scraping and data extraction
- SEO analysis
- Content migration
- HTML cleaning and transformation
- Static website processing
- Automated testing of HTML pages
- Learning how browser DOM engines work
🔥 Why Spider?
Spider brings the simplicity of jQuery-style DOM APIs into PHP.
Instead of dealing with complex DOMDocument operations, you can navigate and manipulate HTML documents using a clean and expressive syntax.
It is a great educational project for learning:
- Web crawling concepts
- HTML parsing
- DOM tree manipulation
- CSS selector engines
- Collection processing
- Parser design
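For comparison, here is a rough sketch of what a few selector lookups look like with the raw DOMDocument operations mentioned above, using PHP's built-in DOM extension. This is not Spider's implementation. It assumes `ext-dom` is enabled, and the HTML string is made up for the example.

```php
<?php

$html = '<html><head><title>Demo</title></head><body>'
      . '<div class="card product">A</div>'
      . '<div class="byproduct">B</div>'
      . '<a href="/one">One</a><a>No href</a>'
      . '</body></html>';

$doc = new DOMDocument();

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Equivalent of the CSS selector "title".
$title = $xpath->query("//title")->item(0)->textContent;

// Equivalent of ".product": match "product" as a whole class token,
// so that class="byproduct" is not selected.
$products = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' product ')]"
);

// Equivalent of "a[href]": anchors that actually carry an href attribute.
$hrefs = [];
foreach ($xpath->query("//a[@href]") as $anchor) {
    $hrefs[] = $anchor->getAttribute("href");
}

echo $title . PHP_EOL;
echo $products->length . PHP_EOL;
print_r($hrefs);
```

Each CSS selector has to be translated to XPath by hand here, which is the kind of boilerplate a jQuery-style API is meant to hide.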
🤝 Contributing
Contributions are welcome.
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Commit your changes:
git commit -m "Add amazing feature"
- Push your branch:
git push origin feature/amazing-feature
- Open a Pull Request.
👨‍💻 Author
Aref Shojaei
- 📧 Email: arefshojaei82@gmail.com
- 🐙 GitHub: @ArefShojaei
- 📦 Packagist: arefshojaei/spider
⭐ Show Your Support
If this project helps you understand web crawling, HTML parsing, and DOM manipulation, consider giving it a Star ⭐ on GitHub.
Your support motivates future improvements.
Statistics
- Total downloads: 9
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Clicks: 6
- Dependents: 1
- Suggesters: 0
Other information
- License: MIT
- Last updated: 2025-03-28