webscraping-ai/webscraping-ai-php
Composer 安装命令:
composer require webscraping-ai/webscraping-ai-php
包简介
Official PHP client for the WebScraping.AI API — LLM-powered web scraping with rotating proxies and Chromium JavaScript rendering.
README 文档
README
Official PHP client for the WebScraping.AI API.
The API gives you LLM-powered scraping tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing — full HTML, visible text, selected page areas, AI-extracted fields, and free-form question answering over any URL.
Requirements
- PHP 8.2 or newer
- A PSR-18 HTTP client — Guzzle, Symfony HttpClient, or any other implementation
- A PSR-17 message factory
If you don't already have these installed, the simplest pair is:
composer require guzzlehttp/guzzle nyholm/psr7
php-http/discovery (a transitive dependency) will pick them up automatically.
Installation
composer require webscraping-ai/webscraping-ai-php
Quick start
use WebScrapingAI\Client; $client = new Client(apiKey: getenv('WEBSCRAPING_AI_KEY')); // Full HTML $html = $client->html(url: 'https://example.com'); // Visible text $text = $client->text(url: 'https://example.com'); // HTML for one selector $h1 = $client->selected(url: 'https://example.com', selector: 'h1'); // HTML for multiple selectors (returns array) $chunks = $client->selectedMultiple( url: 'https://example.com', selectors: ['h1', 'p', 'a'], ); // LLM question over a page $answer = $client->question( url: 'https://example.com', question: 'What is the main topic?', ); // LLM-extracted structured fields $fields = $client->fields( url: 'https://example.com', fields: [ 'title' => 'Main product title', 'price' => 'Current price', ], ); // Account quota $account = $client->account();
All optional parameters (headers, timeout, js, js_timeout, wait_for, proxy, country, custom_proxy, device, error_on_404, error_on_redirect, js_script, …) are PHP named arguments. See the API docs for the full parameter reference.
Bring your own HTTP client
By default, the client builds its own transport. If Guzzle is installed it is used with a request deadline applied (see Timeouts); otherwise php-http/discovery resolves whatever PSR-18 client is installed. To pin a specific client, pass it explicitly:
use GuzzleHttp\Client as Guzzle; use Nyholm\Psr7\Factory\Psr17Factory; use WebScrapingAI\Client; $factory = new Psr17Factory(); $client = new Client( apiKey: getenv('WEBSCRAPING_AI_KEY'), httpClient: new Guzzle(['timeout' => 30.0]), requestFactory: $factory, uriFactory: $factory, );
Injecting your own client opts out of the default deadline — configure transport-level timeouts on the client you pass.
Timeouts
Two different timeouts are in play, and they're easy to confuse:
- The
timeoutparameter accepted by each endpoint method (html(),text(), …) controls server-side page retrieval — how long the API waits for the target page. It does not bound how long your HTTP client waits. - The transport timeout bounds how long the PSR-18 client itself will wait on TCP connect and on reading the response body, so a stalled connection can't hang your process forever.
By default the client applies a transport deadline when it builds its own client and Guzzle is available: a total request timeout of Client::DEFAULT_TIMEOUT (60s) and a TCP connect timeout of Client::DEFAULT_CONNECT_TIMEOUT (10s). Override them via the constructor:
$client = new Client( apiKey: getenv('WEBSCRAPING_AI_KEY'), timeout: 120.0, // total request deadline, seconds connectTimeout: 5.0, // TCP connect deadline, seconds );
These constructor timeouts apply only to the auto-built default client. If you inject your own httpClient, or no concrete client (Guzzle) is available and discovery falls back to an unknown PSR-18 implementation, no deadline is imposed — in that case inject a client with timeouts configured to get one.
Errors
The client raises typed exceptions for every documented status code:
| Status | Exception |
|---|---|
| 400 | WebScrapingAI\Exception\BadRequestException |
| 402 | WebScrapingAI\Exception\PaymentRequiredException |
| 403 | WebScrapingAI\Exception\AuthenticationException |
| 429 | WebScrapingAI\Exception\RateLimitException |
| 500 | WebScrapingAI\Exception\ServerException |
| 504 | WebScrapingAI\Exception\GatewayTimeoutException |
All inherit from WebScrapingAI\Exception\ApiException, which exposes $message, $status, $statusCode, $statusMessage, $body, and $responseBody. The latter three are populated when the API surfaces target-page errors as 500s.
Transport-level failures raise WebScrapingAI\Exception\ApiTimeoutException (the PSR-18 client timed out) or WebScrapingAI\Exception\ApiConnectionException (DNS / connection refused / TLS).
All SDK-originated exceptions implement the marker interface WebScrapingAI\Exception\WebScrapingAIException, so a single catch (WebScrapingAIException $e) block catches everything.
Response shapes
The client returns whatever the API returns — it does not normalise or unwrap. A couple of current quirks worth knowing:
fields()returns['result' => [...fields...]](the live API wraps the extracted fields under aresultkey).selectedMultiple()returnsarray<int, array<int, string>>— an outer wrapper containing all matched chunks concatenated.
These are upstream spec/server drifts; the official Ruby and Python clients return the same shapes.
Migration from 3.x
3.x was generated from the OpenAPI spec under the namespace OpenAPI\Client\ and used per-tag classes (AIApi, HTMLApi, etc.). 4.0 is a hand-authored rewrite with a single WebScrapingAI\Client entry point. There are no deprecation shims — pin to ^3.2 if you need the old surface.
Development
composer install composer test # PHPUnit composer lint # php-cs-fixer (dry-run) composer analyse # PHPStan
License
MIT — see LICENSE.
统计信息
- 总下载量: 3
- 月度下载量: 0
- 日度下载量: 0
- 收藏数: 5
- 点击次数: 0
- 依赖项目数: 0
- 推荐数: 0
其他信息
- 授权协议: MIT
- 更新时间: 2020-03-27