remind/crawl4ai-php-client

Latest stable version: v1.0.0

Composer install command:

composer create-project remind/crawl4ai-php-client

crawl4ai-php-client

Symfony CLI tool to crawl sitemaps and generate JSON/Markdown output via Crawl4AI API endpoints.

⚠️ Experimental Status

This project is experimental and under active development. The API and functionality may change without notice.

Overview

crawl4ai-php-client is a Symfony-based command-line application with a single purpose: sitemap crawling. It reads an XML sitemap, extracts the URLs it lists, crawls each URL through the Crawl4AI service, and writes the results to JSON files, optionally gzip-compressed.

Key Features

  • Sitemap-Based Web Crawling: Extract and crawl all URLs from XML sitemap files
  • JSON Output with Optional Compression: Results are written to JSON files with optional gzip compression
  • Symfony Console Application: Built on the Symfony 7.4 framework for CLI argument and option handling
  • Crawl4AI Integration: Delegates the actual page crawling to the Crawl4AI API
  • Flexible Output Modes:
    • Full crawl results with metadata
    • Markdown-only output (content extraction)
  • Configurable Options:
    • Custom output file naming
    • Locale settings for crawling
    • HTTP request timeouts
    • Markdown-only extraction mode
    • Optional gzip compression
  • Docker Support: Containerized setup with Alpine Linux base

Requirements

  • PHP >= 8.2
  • Symfony 7.4
  • Crawl4AI service running and accessible
  • Composer for dependency management

Installation

Via Composer

composer install

Docker

docker build -t crawl4ai-php-client .
docker run -it crawl4ai-php-client php bin/console crawl4ai:sitemap <sitemap-url>

Usage

Basic Command

php bin/console crawl4ai:sitemap <sitemap-url>

Arguments:

  • sitemapUrl - URL to the XML sitemap file (required)

Options

  • --outputFileNamePrefix=<prefix> - Prefix for output JSON files (default: crawl)
  • --locale=<locale> - Locale for crawling (default: en-EN)
  • --fileCompression - Enable gzip compression for output files
  • --timeout=<seconds> - HTTP request timeout in seconds (default: 300)
  • --markdownOnly - Output only markdown content without metadata

Examples

# Basic crawl
php bin/console crawl4ai:sitemap https://example.com/sitemap.xml

# With compression
php bin/console crawl4ai:sitemap https://example.com/sitemap.xml --fileCompression

# Markdown-only extraction with custom prefix
php bin/console crawl4ai:sitemap https://example.com/sitemap.xml \
  --outputFileNamePrefix=content \
  --markdownOnly

# Custom locale and timeout
php bin/console crawl4ai:sitemap https://example.com/sitemap.xml \
  --locale=de-DE \
  --timeout=600

Output

Results are saved to the crawl/output/ directory with timestamped filenames:

  • Format: {prefix}-{domain}-{timestamp}.json
  • Example: crawl-example.com-2026-06-05-14-30-45.json
  • Compression: With --fileCompression, files are saved as .json.gz and the uncompressed version is removed
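
The naming scheme above can be reproduced in a few lines of shell, which may help when a script needs to predict or match output files. This is only a sketch: `build_output_name` is a hypothetical helper that is not part of the tool, and it assumes that `{domain}` is the host of the sitemap URL and that the timestamp follows the `YYYY-MM-DD-HH-MM-SS` pattern seen in the example.

```shell
# Hypothetical helper (not part of crawl4ai-php-client): rebuild the
# {prefix}-{domain}-{timestamp}.json name described above.
# Assumption: {domain} is the host part of the sitemap URL.
build_output_name() {
  prefix=$1
  sitemap_url=$2
  host=${sitemap_url#*://}   # drop the scheme, e.g. "https://"
  host=${host%%/*}           # drop the path, keep only the host
  timestamp=$(date +%Y-%m-%d-%H-%M-%S)
  printf '%s-%s-%s.json\n' "$prefix" "$host" "$timestamp"
}

build_output_name crawl https://example.com/sitemap.xml
```

Compressed results can be read without unpacking them to disk, for example with `gzip -dc` on the `.json.gz` file.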

Architecture

Core Components

  • AbstractCrawlCommand - Base class providing:
    • Crawl4AI API integration
    • HTTP client handling
    • File output management with optional compression
    • Configurable crawler defaults (locale, timeout, content filters)
  • CrawlSitemapXmlCommand - Implements sitemap-specific crawling:
    • XML sitemap parsing and URL extraction
    • Command-line interface with configurable options
    • JSON output generation with timing statistics
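
To make the URL extraction step concrete, the sketch below pulls the `<loc>` entries out of a sitemap using standard shell tools. It is not how CrawlSitemapXmlCommand is implemented (the command is PHP and presumably uses a real XML parser). `extract_sitemap_urls` is a hypothetical name, and the approach assumes plain `<loc>` elements without CDATA sections or XML entities.

```shell
# Hypothetical helper: print every <loc> URL from a sitemap read on stdin.
# Assumption: plain <loc>...</loc> elements without CDATA or XML entities.
extract_sitemap_urls() {
  tr -d '\n\r' \
    | grep -o '<loc>[^<]*</loc>' \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

cat <<'XML' | extract_sitemap_urls
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url>
    <loc>
      https://example.com/about
    </loc>
  </url>
</urlset>
XML
```

Joining the input into one line first lets the pattern match `<loc>` elements that span several lines. The output is one URL per line, which gives a quick preview of what a crawl would cover.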

Environment Variables

  • CRAWL4AI_BASE_URL - Base URL for Crawl4AI API service (required)

Example .env file:

CRAWL4AI_BASE_URL=http://localhost:8000
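
Since the variable is required, a wrapper script may want to fail early when it is missing. The following sketch assumes the value is supplied through the process environment rather than the .env file (in a standard Symfony setup, real environment variables take precedence over .env). `require_base_url` is a hypothetical helper, not something the tool provides.

```shell
# Hypothetical pre-flight check (not part of the tool): fail early when
# CRAWL4AI_BASE_URL is missing, otherwise print it without a trailing slash.
require_base_url() {
  if [ -z "${CRAWL4AI_BASE_URL:-}" ]; then
    echo "CRAWL4AI_BASE_URL is not set" >&2
    return 1
  fi
  printf '%s\n' "${CRAWL4AI_BASE_URL%/}"
}

CRAWL4AI_BASE_URL=http://localhost:8000
export CRAWL4AI_BASE_URL
require_base_url
```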

Directory Structure

.
├── bin/
│   └── console           # CLI entrypoint
├── config/               # Symfony configuration
├── crawl/
│   └── output/          # Generated JSON output files
├── public/              # Web root (if web access needed)
├── src/
│   ├── Command/         # Console commands
│   │   ├── AbstractCrawlCommand.php
│   │   └── CrawlSitemapXmlCommand.php
│   └── Kernel.php       # Symfony kernel
└── var/
    └── cache/           # Symfony cache files

Limitations

  • Sitemap-only crawling: This tool only supports XML sitemap-based URL discovery. Direct URL crawling is not supported.
  • Dependency on Crawl4AI: Requires a running Crawl4AI service instance
  • Single-domain operation: Each crawl targets URLs from a single sitemap file
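
Because each run handles one sitemap, covering several sitemaps means invoking the command once per sitemap. A minimal sketch of such a wrapper follows. `crawl_sitemaps` is a hypothetical function, not part of the tool, and the demonstration uses `echo` in place of the real command so it runs without PHP or a Crawl4AI service.

```shell
# Hypothetical wrapper: run a command once per sitemap URL read from stdin.
# Real usage might look like:
#   crawl_sitemaps php bin/console crawl4ai:sitemap < sitemaps.txt
# where sitemaps.txt (an assumed file name) lists one sitemap URL per line.
crawl_sitemaps() {
  failed=0
  while IFS= read -r sitemap_url; do
    [ -n "$sitemap_url" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the remaining URL list
    "$@" "$sitemap_url" </dev/null || failed=$((failed + 1))
  done
  [ "$failed" -eq 0 ]                   # non-zero status if any run failed
}

# Dry run: print what would be crawled instead of crawling it
printf '%s\n' https://example.com/sitemap.xml https://example.org/sitemap.xml \
  | crawl_sitemaps echo "would crawl:"
```

The loop continues after a failed run and reports failure only through its exit status, so one unreachable sitemap does not stop the rest.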

Version: Experimental
Last Updated: June 2026

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: proprietary
  • Last updated: 2026-06-03
