
webscraping-ai/webscraping-ai-php

Composer install command:

composer require webscraping-ai/webscraping-ai-php

Package description

Official PHP client for the WebScraping.AI API — LLM-powered web scraping with rotating proxies and Chromium JavaScript rendering.

README

Official PHP client for the WebScraping.AI API.

The API gives you LLM-powered scraping tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing — full HTML, visible text, selected page areas, AI-extracted fields, and free-form question answering over any URL.

Requirements

The client needs a PSR-18 HTTP client and PSR-17 request and URI factories. If you don't already have them installed, the simplest pair is:

composer require guzzlehttp/guzzle nyholm/psr7

php-http/discovery (a transitive dependency) will pick them up automatically.

Installation

composer require webscraping-ai/webscraping-ai-php

Quick start

use WebScrapingAI\Client;

$client = new Client(apiKey: getenv('WEBSCRAPING_AI_KEY'));

// Full HTML
$html = $client->html(url: 'https://example.com');

// Visible text
$text = $client->text(url: 'https://example.com');

// HTML for one selector
$h1 = $client->selected(url: 'https://example.com', selector: 'h1');

// HTML for multiple selectors (returns array)
$chunks = $client->selectedMultiple(
    url: 'https://example.com',
    selectors: ['h1', 'p', 'a'],
);

// LLM question over a page
$answer = $client->question(
    url: 'https://example.com',
    question: 'What is the main topic?',
);

// LLM-extracted structured fields
$fields = $client->fields(
    url: 'https://example.com',
    fields: [
        'title' => 'Main product title',
        'price' => 'Current price',
    ],
);

// Account quota
$account = $client->account();

All optional parameters (headers, timeout, js, js_timeout, wait_for, proxy, country, custom_proxy, device, error_on_404, error_on_redirect, js_script, …) are passed as PHP named arguments. See the API docs for the full parameter reference.
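To illustrate how named arguments let you set only the options you need, here is a minimal, self-contained sketch that turns a URL plus a few optional settings into an API query string. buildScrapeQuery() is hypothetical and is not part of the SDK; it covers only a handful of the parameters listed above, and sending booleans as the strings "true"/"false" is an assumption about the API.

```php
<?php

/**
 * Hypothetical sketch (not the SDK's internals): build a query string from a
 * URL plus optional named arguments, dropping any option left at null.
 */
function buildScrapeQuery(
    string $url,
    ?int $timeout = null,
    ?bool $js = null,
    ?string $proxy = null,
    ?string $country = null,
): string {
    $params = [
        'url' => $url,
        'timeout' => $timeout,
        'js' => $js,
        'proxy' => $proxy,
        'country' => $country,
    ];

    // Omit options the caller did not set, so server-side defaults apply.
    $params = array_filter($params, static fn ($value) => $value !== null);

    // Assumption: booleans travel as "true"/"false" (http_build_query would send 1/0).
    $params = array_map(
        static fn ($value) => is_bool($value) ? ($value ? 'true' : 'false') : $value,
        $params,
    );

    return http_build_query($params);
}

// Named arguments let the caller skip timeout and proxy entirely.
echo buildScrapeQuery(url: 'https://example.com', js: false, country: 'us'), PHP_EOL;
// url=https%3A%2F%2Fexample.com&js=false&country=us
```

The explicit boolean mapping is there because http_build_query() on its own would encode false as 0, which is easy to miss when reading logs.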

Bring your own HTTP client

By default, the client builds its own transport. If Guzzle is installed, it is used with a request deadline applied (see Timeouts); otherwise, php-http/discovery resolves whichever PSR-18 client is installed. To pin a specific client, pass it explicitly:

use GuzzleHttp\Client as Guzzle;
use Nyholm\Psr7\Factory\Psr17Factory;
use WebScrapingAI\Client;

$factory = new Psr17Factory();
$client = new Client(
    apiKey: getenv('WEBSCRAPING_AI_KEY'),
    httpClient: new Guzzle(['timeout' => 30.0]),
    requestFactory: $factory,
    uriFactory: $factory,
);

Injecting your own client opts out of the default deadline — configure transport-level timeouts on the client you pass.

Timeouts

Two different timeouts are in play, and they're easy to confuse:

  • The timeout parameter accepted by each endpoint method (html(), text(), …) controls server-side page retrieval — how long the API waits for the target page. It does not bound how long your HTTP client waits.
  • The transport timeout bounds how long the PSR-18 client itself will wait on TCP connect and on reading the response body, so a stalled connection can't hang your process forever.

By default the client applies a transport deadline when it builds its own client and Guzzle is available: a total request timeout of Client::DEFAULT_TIMEOUT (60s) and a TCP connect timeout of Client::DEFAULT_CONNECT_TIMEOUT (10s). Override them via the constructor:

$client = new Client(
    apiKey: getenv('WEBSCRAPING_AI_KEY'),
    timeout: 120.0,        // total request deadline, seconds
    connectTimeout: 5.0,   // TCP connect deadline, seconds
);

These constructor timeouts apply only to the auto-built default client. No deadline is imposed if you inject your own httpClient, or if Guzzle is not installed and discovery falls back to some other PSR-18 implementation. In either case, inject a client with its own timeouts configured.
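When you inject your own Guzzle client, you can reproduce the default deadlines with Guzzle's own timeout and connect_timeout options. The helper transportTimeoutOptions() below is hypothetical (not part of the SDK); it only builds the options array, so it runs without Guzzle installed.

```php
<?php

/**
 * Hypothetical helper: Guzzle constructor options mirroring the documented
 * defaults (60s total request deadline, 10s TCP connect deadline).
 */
function transportTimeoutOptions(float $timeout = 60.0, float $connectTimeout = 10.0): array
{
    if ($timeout <= 0 || $connectTimeout <= 0) {
        throw new InvalidArgumentException('Timeouts must be positive numbers of seconds.');
    }
    if ($connectTimeout > $timeout) {
        throw new InvalidArgumentException('The connect timeout cannot exceed the total request timeout.');
    }

    // "timeout" and "connect_timeout" are Guzzle's request option names.
    return [
        'timeout' => $timeout,
        'connect_timeout' => $connectTimeout,
    ];
}

$options = transportTimeoutOptions(timeout: 120.0, connectTimeout: 5.0);
// With Guzzle installed: $httpClient = new GuzzleHttp\Client($options);
var_export($options);
```

Rejecting a connect timeout larger than the total timeout is a choice made in this sketch, not a documented Guzzle rule.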

Errors

The client raises typed exceptions for every documented status code:

HTTP status   Exception
400           WebScrapingAI\Exception\BadRequestException
402           WebScrapingAI\Exception\PaymentRequiredException
403           WebScrapingAI\Exception\AuthenticationException
429           WebScrapingAI\Exception\RateLimitException
500           WebScrapingAI\Exception\ServerException
504           WebScrapingAI\Exception\GatewayTimeoutException

All inherit from WebScrapingAI\Exception\ApiException, which exposes $message, $status, $statusCode, $statusMessage, $body, and $responseBody. The latter three are populated when the API surfaces target-page errors as 500s.

Transport-level failures raise WebScrapingAI\Exception\ApiTimeoutException (the PSR-18 client timed out) or WebScrapingAI\Exception\ApiConnectionException (DNS / connection refused / TLS).

All SDK-originated exceptions implement the marker interface WebScrapingAI\Exception\WebScrapingAIException, so a single catch (WebScrapingAIException $e) block handles every exception the SDK throws.
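The status mapping and the catch-all pattern can be sketched in plain PHP. The classes below are hypothetical stand-ins that reuse the documented short names without the WebScrapingAI\Exception namespace; they are not the SDK's source, and the transport exceptions (ApiTimeoutException, ApiConnectionException) are left out for brevity.

```php
<?php

// Hypothetical stand-ins mirroring the documented hierarchy (not SDK code).
interface WebScrapingAIException {}

class ApiException extends RuntimeException implements WebScrapingAIException
{
    public function __construct(string $message, public int $status)
    {
        parent::__construct($message, $status);
    }
}

class BadRequestException extends ApiException {}
class PaymentRequiredException extends ApiException {}
class AuthenticationException extends ApiException {}
class RateLimitException extends ApiException {}
class ServerException extends ApiException {}
class GatewayTimeoutException extends ApiException {}

// Map an HTTP status to the exception class listed in the table above.
function exceptionForStatus(int $status, string $message): ApiException
{
    return match ($status) {
        400 => new BadRequestException($message, $status),
        402 => new PaymentRequiredException($message, $status),
        403 => new AuthenticationException($message, $status),
        429 => new RateLimitException($message, $status),
        500 => new ServerException($message, $status),
        504 => new GatewayTimeoutException($message, $status),
        // Assumption: undocumented codes fall back to the base class.
        default => new ApiException($message, $status),
    };
}

// Specific handlers first, then the marker interface as the catch-all.
function handleFailure(int $status): string
{
    try {
        throw exceptionForStatus($status, 'request failed');
    } catch (RateLimitException) {
        return 'rate limited, retry later';
    } catch (WebScrapingAIException $e) {
        return 'other SDK error: ' . get_class($e);
    }
}

echo handleFailure(429), PHP_EOL; // rate limited, retry later
echo handleFailure(402), PHP_EOL; // other SDK error: PaymentRequiredException
```

Ordering matters: PHP picks the first matching catch block, so the interface catch must come after any specific subclass you want to treat differently.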

Response shapes

The client returns whatever the API returns — it does not normalise or unwrap. A couple of current quirks worth knowing:

  • fields() returns ['result' => [...fields...]] (the live API wraps the extracted fields under a result key).
  • selectedMultiple() returns array<int, array<int, string>> — an outer wrapper containing all matched chunks concatenated.

These are upstream spec/server drifts; the official Ruby and Python clients return the same shapes.
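If you would rather work with flat values, a small wrapper can absorb both quirks. unwrapFields() and flattenSelected() are hypothetical helpers, not SDK methods, and the sample arrays are invented to match the shapes described above.

```php
<?php

/**
 * Hypothetical helpers for the current response shapes; they operate on plain
 * arrays, so they can be tried without an API key.
 */
function unwrapFields(array $response): array
{
    // fields() wraps the extracted values under a "result" key.
    if (!array_key_exists('result', $response) || !is_array($response['result'])) {
        throw new UnexpectedValueException('Expected a "result" array in the fields() response.');
    }

    return $response['result'];
}

function flattenSelected(array $response): array
{
    // selectedMultiple() returns array<int, array<int, string>>; merge the
    // inner lists into one flat list of HTML chunks.
    return array_merge(...array_values($response));
}

// Sample inputs shaped like the documented responses (values are made up).
$fields = unwrapFields(['result' => ['title' => 'Example Domain', 'price' => null]]);
$chunks = flattenSelected([['<h1>Example Domain</h1>', '<p>First paragraph</p>']]);

var_export($fields);
var_export($chunks);
```

Failing loudly when "result" is missing means an upstream fix to the response shape surfaces as an exception rather than as silently empty data.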

Migration from 3.x

3.x was generated from the OpenAPI spec under the namespace OpenAPI\Client\ and used per-tag classes (AIApi, HTMLApi, etc.). 4.0 is a hand-authored rewrite with a single WebScrapingAI\Client entry point. There are no deprecation shims — pin to ^3.2 if you need the old surface.

Development

composer install
composer test       # PHPUnit
composer lint       # php-cs-fixer (dry-run)
composer analyse    # PHPStan

License

MIT — see LICENSE.

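Statistics

  • Total downloads: 3
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 5
  • Clicks: 0
  • Dependent projects: 0
  • Recommendations: 0

GitHub information

  • Stars: 5
  • Watchers: 2
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2020-03-27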
