README

🕷️ Spider - PHP Web Crawler & HTML Parser

A lightweight PHP web crawler and HTML parser with a jQuery-style API for DOM manipulation.

Fetch web pages, parse HTML documents, search for elements with CSS selectors, manipulate the DOM, and export the modified pages through a simple API.

✨ Features

🌐 Load and parse any HTML web page
🔍 CSS selector-based element searching
📄 Extract text, HTML, and attributes
🔁 Iterate over multiple DOM elements
🧹 Remove and clean HTML elements
🏗️ Modify the DOM structure dynamically
🎨 Manage CSS classes and IDs
💾 Export modified HTML documents
⚡ Lightweight and dependency-free PHP implementation

📥 Installation

Install with Composer

composer require arefshojaei/spider

Clone from GitHub

git clone https://github.com/ArefShojaei/Spider.git
cd Spider

🚀 Quick Start

Fetch a page and extract its content:

<?php

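// Load Composer's autoloader (assumes the package was installed with Composer).
require __DIR__ . "/vendor/autoload.php";

use Spider\Spider;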

$spider = new Spider();

$page = $spider->loadHTML("https://google.com");

echo $page->find("title")->text() . PHP_EOL;

$page->findAll("a")->each(function ($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});

🔎 Finding Elements

Search for DOM elements using CSS selectors.

Find a single element

$page->find("a");
$page->find(".product");
$page->find("#header");

Find multiple elements

$page->findAll("a");
$page->findAll(".product");
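To illustrate what a selector lookup involves, here is a minimal sketch built on PHP's native DOMDocument and DOMXPath classes. It is not Spider's implementation. The helper name simpleSelectorToXPath is hypothetical, and it only handles a bare tag name, a single .class, or a single #id, without escaping quotes in the selector.

```php
<?php

// Hypothetical helper (not part of Spider): translate a bare tag,
// ".class", or "#id" selector into an XPath expression.
function simpleSelectorToXPath(string $selector): string
{
    if ($selector === "") {
        throw new InvalidArgumentException("Selector must not be empty.");
    }

    $name = substr($selector, 1);

    if ($selector[0] === "#") {
        return "//*[@id='" . $name . "']";
    }

    if ($selector[0] === ".") {
        // Match the class as a whole word inside the space-separated class attribute.
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $name . " ')]";
    }

    // Anything else is treated as a tag name.
    return "//" . $selector;
}

$html = '<div id="header"><a class="product featured" href="/a">A</a><a href="/b">B</a></div>';

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);

$products = $xpath->query(simpleSelectorToXPath(".product"));
$anchors  = $xpath->query(simpleSelectorToXPath("a"));
$header   = $xpath->query(simpleSelectorToXPath("#header"));

echo $products->length . " " . $anchors->length . " " . $header->length . PHP_EOL;
```

A real selector engine also has to handle combinators, attribute selectors, and pseudo-classes, which this sketch leaves out.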

🔁 Iterating Elements

Perform operations on element collections.

each()

Loop through every element:

$page->findAll("a")->each(function ($key, $anchor) {
    echo $anchor->text();
});

map()

Transform elements:

$anchors = $page->findAll("a")->map(function ($key, $anchor) {
    $anchor->attr("data-id", rand());

    return $anchor;
});

filter()

Keep only the elements for which the callback returns a truthy value:

// Keep the anchors that have a non-empty href attribute.
$links = $page->findAll("a")->filter(
    fn($key, $anchor) => $anchor->attr("href")
);
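The callbacks in each(), map(), and filter() above all receive the key first and the element second. The sketch below shows that pattern on plain strings in standalone PHP. SketchCollection is a hypothetical stand-in, not Spider's collection class, and its return types are assumptions made for illustration.

```php
<?php

// Hypothetical stand-in for an element collection (not Spider's class).
// Callbacks receive ($key, $item), the argument order used in this README.
final class SketchCollection
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Run the callback for every item and return the same collection.
    public function each(callable $callback): self
    {
        foreach ($this->items as $key => $item) {
            $callback($key, $item);
        }

        return $this;
    }

    // Build a new collection from the callback's return values.
    public function map(callable $callback): self
    {
        $mapped = [];
        foreach ($this->items as $key => $item) {
            $mapped[$key] = $callback($key, $item);
        }

        return new self($mapped);
    }

    // Keep the items for which the callback returns a truthy value.
    public function filter(callable $callback): self
    {
        $kept = [];
        foreach ($this->items as $key => $item) {
            if ($callback($key, $item)) {
                $kept[$key] = $item;
            }
        }

        return new self($kept);
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$hrefs = new SketchCollection(["/about", "", "https://example.com"]);

// Drop empty hrefs, then prefix relative paths with an example base URL.
$absolute = $hrefs
    ->filter(fn($key, $href) => $href !== "")
    ->map(fn($key, $href) => strpos($href, "http") === 0 ? $href : "https://example.com" . $href)
    ->toArray();

print_r($absolute);
```

Note that this sketch preserves the original keys after filtering, so the result is not re-indexed.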

🌳 DOM Traversing

Navigate and modify element relationships.

Parent element

$parent = $page->find(".product")->parent();

Insert sibling elements

$page->find(".product")
     ->before("<p>Before Element</p>");

$page->find(".product")
     ->after("<p>After Element</p>");

Insert child elements

$page->find(".product")
     ->append("<p>New Child</p>");

$page->find(".product")
     ->prepend("<p>First Child</p>");

🧹 Cleaning Elements

Remove an element's content, or remove the element itself.

Empty content

$page->find("p")->empty();

Remove element

$page->find("p")->remove();

📄 Working with Content

Get text or HTML

$text = $page->find("p")->text();

$html = $page->find("p")->html();

Update content

$page->find("p")->text("New text");

$page->find("p")->html("<strong>New HTML</strong>");

🏷️ Working with Attributes

Read attributes

$attributes = $page->find("a")->attr();

$link = $page->find("a")->attr("href");

Set attributes

$page->find("a")->attr("data-id", 123);

🎨 CSS Classes & IDs

Classes

$page->find("p")->addClass("active");

$page->find("p")->removeClass("active");

$page->find("p")->hasClass("active");

IDs

$page->find("p")->addID("article");

$page->find("p")->removeID("article");

$page->find("p")->hasID("article");

💾 Export HTML

Save the current DOM document to a file.

$filename = "page";

$path = __DIR__ . "/html/" . $filename . rand() . ".html";

$page->export($path);
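The README does not say whether export() creates missing directories, and rand() can occasionally repeat a value. As a cautious sketch, the path can be prepared in plain PHP first. The helper name buildExportPath and the temporary directory used here are assumptions for illustration only.

```php
<?php

// Hypothetical helper (not part of Spider): build a unique export path
// and create the target directory if it is missing.
function buildExportPath(string $directory, string $basename): string
{
    if (!is_dir($directory) && !mkdir($directory, 0775, true) && !is_dir($directory)) {
        throw new RuntimeException("Could not create directory: " . $directory);
    }

    // uniqid() is time-based, so repeated runs are unlikely to overwrite each other.
    return rtrim($directory, "/") . "/" . $basename . "-" . uniqid() . ".html";
}

$path = buildExportPath(sys_get_temp_dir() . "/spider-html", "page");

echo $path . PHP_EOL;

// The resulting path could then be passed to $page->export($path).
```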

💡 Example Use Cases

Spider can be used for:

Web scraping and data extraction
SEO analysis
Content migration
HTML cleaning and transformation
Static website processing
Automated testing of HTML pages
Learning how browser DOM engines work

🔥 Why Spider?

Spider brings the simplicity of jQuery-style DOM APIs into PHP.

Instead of dealing with complex DOMDocument operations, you can navigate and manipulate HTML documents using a clean and expressive syntax.
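For comparison, here is roughly what reading a page title and its links looks like with PHP's native DOMDocument alone. The HTML string is made up for the example, and the snippet does not use Spider at all.

```php
<?php

$html = '<html><head><title>Example</title></head>'
      . '<body><a href="/one">One</a><a href="/two">Two</a></body></html>';

$document = new DOMDocument();

// Collect parser warnings internally, since real-world HTML is often malformed.
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Read the title, guarding against a document that has none.
$titleNode = $document->getElementsByTagName("title")->item(0);
$title = $titleNode !== null ? $titleNode->textContent : "";

// Collect the href attribute of every anchor.
$links = [];
foreach ($document->getElementsByTagName("a") as $anchor) {
    $links[] = $anchor->getAttribute("href");
}

echo $title . PHP_EOL;

foreach ($links as $href) {
    echo "[LINK] " . $href . PHP_EOL;
}
```

The Quick Start example expresses the same two steps with find("title")->text() and findAll("a")->each().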

It is a great educational project for learning:

Web crawling concepts
HTML parsing
DOM tree manipulation
CSS selector engines
Collection processing
Parser design

🤝 Contributing

Contributions are welcome.

Fork the repository
Create a feature branch:

git checkout -b feature/amazing-feature

Commit your changes:

git commit -m "Add amazing feature"

Push your branch:

git push origin feature/amazing-feature

Open a Pull Request.

👨‍💻 Author

Aref Shojaei

📧 Email: arefshojaei82@gmail.com
🐙 GitHub: @ArefShojaei
📦 Packagist: arefshojaei/spider

⭐ Show Your Support

If this project helps you understand web crawling, HTML parsing, and DOM manipulation, consider giving it a star ⭐ on GitHub.

Your support motivates future improvements.
