cable8mm/mma-scrapers
Latest stable version: v0.3.0
Composer install command:
composer require cable8mm/mma-scrapers
Package description
A lightweight, extensible PHP library for scraping MMA data from multiple sources.
README
A lightweight PHP library for scraping and parsing MMA data into simple DTOs.
Features
- Source-specific scrapers and parsers for MMA websites.
- Normalized DTOs for events, fights, and fighters.
- Fixture-friendly parser design using Symfony DomCrawler.
- Mockable HTTP layer through HttpClientInterface.
- Helper services for fighter matching, Sherdog ID resolution, and fight deduplication.
- No database dependency.
Requirements
- PHP ^8.4
- Composer
Installation
composer require cable8mm/mma-scrapers
For local development:
composer install
Supported Sources
| Source | Events | Event detail | Fights | Fighters | Notes |
|---|---|---|---|---|---|
| BlackCombat | Yes | Yes | Yes | Yes | Official source support |
| Sherdog | No | No | No | Yes | Fighter search and fighter detail support |
| Tapology | No | No | No | No | Planned source |
Core Concepts
The library is organized around a small pipeline:
HTTP client -> Scraper -> Parser -> DTO
Scrapers fetch HTML and delegate extraction to parsers. Parsers are deterministic and return DTOs. Aggregators and services are available when a consuming app needs to compare, merge, or deduplicate parsed results.
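The pipeline above can be sketched in a few lines. This is a minimal illustration, not the package's code: `HttpClient`, `EventRow`, `ParseEventRows`, `EventRowsScraper`, the `a.event` markup, and the `example.test` URL are all hypothetical, and PHP's built-in DOM extension stands in for Symfony DomCrawler so the snippet runs without dependencies.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration only; these are NOT the package's classes.
interface HttpClient
{
    public function get(string $url): string;
}

// DTO: a small immutable value object.
final class EventRow
{
    public function __construct(
        public readonly string $name,
        public readonly string $url,
    ) {}
}

// Parser: raw HTML in, DTOs out. It never touches the network.
final class ParseEventRows
{
    /** @return list<EventRow> */
    public function __invoke(string $html): array
    {
        $dom = new DOMDocument();
        @$dom->loadHTML($html); // suppress warnings from imperfect real-world HTML

        $rows = [];
        foreach ((new DOMXPath($dom))->query('//a[@class="event"]') as $link) {
            $rows[] = new EventRow(trim($link->textContent), $link->getAttribute('href'));
        }

        return $rows;
    }
}

// Scraper: fetches HTML, then delegates extraction to the parser.
final class EventRowsScraper
{
    public function __construct(
        private readonly HttpClient $http,
        private readonly ParseEventRows $parser,
    ) {}

    /** @return list<EventRow> */
    public function scrape(string $url): array
    {
        return ($this->parser)($this->http->get($url));
    }
}

// Usage with an in-memory client, so no real request is made.
$client = new class implements HttpClient {
    public function get(string $url): string
    {
        return '<ul><li><a class="event" href="/e/1"> Event One </a></li></ul>';
    }
};

$events = (new EventRowsScraper($client, new ParseEventRows()))
    ->scrape('https://example.test/events');
```

Because the parser only sees a string, it can be exercised against saved HTML, while the scraper's only job is to join the client and the parser.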
Project Structure
src/
  Aggregators/    Merge related event, fight, and fighter DTOs
  Contracts/      Scraper and HTTP interfaces
  DTO/            EventDTO, FightDTO, FighterDTO
  Enums/          Source, fight status, fight method, weight class
  Http/           Guzzle HTTP client implementation
  Matchers/       Fighter matching helpers
  Normalizers/    Text-to-enum normalization helpers
  Services/       Sherdog ID resolution and fight deduplication
  Sources/
    BlackCombat/
      Parsers/
      Scrapers/
    Sherdog/
      Parsers/
      Scrapers/
Usage
Parse BlackCombat Events From HTML
use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseEvents;

$html = file_get_contents('blackcombat_events.html');
$parser = new ParseEvents();
$events = $parser($html);
Scrape BlackCombat Events
use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseEvents;
use Cable8mm\MmaScrapers\Sources\BlackCombat\Scrapers\EventsScraper;

$scraper = new EventsScraper(
    new DefaultHttpClient(),
    new ParseEvents()
);

$events = $scraper->scrape('https://www.blackcombat-official.com/event.php?page=10');
Parse BlackCombat Fights
use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseFights;

$html = file_get_contents('event_detail.html');
$parser = new ParseFights();
$fights = $parser($html);
Scrape a Sherdog Fighter
use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Sources\Sherdog\Parsers\ParseFighter;
use Cable8mm\MmaScrapers\Sources\Sherdog\Scrapers\FighterScraper;

$scraper = new FighterScraper(
    new DefaultHttpClient(),
    new ParseFighter()
);

$fighter = $scraper->scrapeById(12345);
Resolve a Sherdog Fighter ID
use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Services\SherdogIdResolver;
use Cable8mm\MmaScrapers\Sources\Sherdog\Parsers\ParseSearchResults;
use Cable8mm\MmaScrapers\Sources\Sherdog\Scrapers\SearchFighterScraper;

$search = new SearchFighterScraper(new DefaultHttpClient());
$parser = new ParseSearchResults();
$resolver = new SherdogIdResolver();

$html = $search->search('Chan Sung Jung');
$candidates = $parser($html);
$sherdogId = $resolver->resolve('Chan Sung Jung', $candidates);
Deduplicate Fights
use Cable8mm\MmaScrapers\Aggregators\FightAggregator;
use Cable8mm\MmaScrapers\Aggregators\FighterAggregator;
use Cable8mm\MmaScrapers\Services\FightDeduplicator;

$deduplicator = new FightDeduplicator(
    new FightAggregator(new FighterAggregator())
);

$deduplicatedFights = $deduplicator->deduplicate($fights);
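This README does not describe how FightDeduplicator decides that two fights are the same. As a rough illustration of the general idea only, the sketch below keys each fight by its date and an order-independent pair of fighter names, so that the same bout reported by two sources with the corners swapped collapses into one record. `SimpleFight`, `fightKey`, and `deduplicateFights` are hypothetical names and do not reflect the package's implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified fight record; the package's FightDTO carries more fields.
final class SimpleFight
{
    public function __construct(
        public readonly string $redName,
        public readonly string $blueName,
        public readonly string $date,   // 'Y-m-d'
        public readonly string $source,
    ) {}
}

// Build a key that ignores corner order and letter case.
// strtolower() is ASCII-only; real data would likely need Unicode-aware normalization.
function fightKey(SimpleFight $fight): string
{
    $names = [strtolower(trim($fight->redName)), strtolower(trim($fight->blueName))];
    sort($names);

    return $fight->date.'|'.implode('|', $names);
}

/**
 * Keep the first fight seen for each key.
 *
 * @param  list<SimpleFight>  $fights
 * @return list<SimpleFight>
 */
function deduplicateFights(array $fights): array
{
    $unique = [];
    foreach ($fights as $fight) {
        $unique[fightKey($fight)] ??= $fight;
    }

    return array_values($unique);
}

$result = deduplicateFights([
    new SimpleFight('Fighter A', 'Fighter B', '2026-01-31', 'official'),
    new SimpleFight('fighter b', 'Fighter A', '2026-01-31', 'sherdog'), // same bout, corners swapped
    new SimpleFight('Fighter A', 'Fighter B', '2026-06-01', 'official'), // rematch on another date
]);
```

A keep-first policy is the simplest choice; a merge step that combines fields from both records would be a natural refinement.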
DTOs
EventDTO
new EventDTO(
    name: 'Black Combat 16',
    location: 'Incheon, South Korea',
    date: new DateTimeImmutable('2026-01-31'),
    url: '/eventDetail.php?eventSeq=285',
    externalId: '285'
);
FighterDTO
new FighterDTO(
    name: 'Chan Sung Jung',
    nickname: 'The Korean Zombie',
    instagram: 'koreanzombiemma',
    teamname: 'Korean Zombie MMA',
    height: '170cm',
    win: 17,
    lose: 8,
    draw: 0,
    sherdogId: 36155
);
FightDTO
use Cable8mm\MmaScrapers\Enums\FightMethod;
use Cable8mm\MmaScrapers\Enums\FightStatus;
use Cable8mm\MmaScrapers\Enums\Source;
use Cable8mm\MmaScrapers\Enums\WeightClass;

new FightDTO(
    redFighter: $redFighter,
    blueFighter: $blueFighter,
    source: Source::OFFICIAL,
    status: FightStatus::FINISHED,
    weightClass: WeightClass::FEATHERWEIGHT,
    method: FightMethod::KO,
    round: 1,
    time: '3:14',
    winner: $redFighter,
    fightDate: new DateTimeImmutable('2026-01-31')
);
Design Rules
- Keep source implementations isolated under src/Sources/{SourceName}.
- Put HTTP access in scrapers, not parsers.
- Keep parsers deterministic: raw HTML in, DTOs out.
- Test parsers with static HTML fixtures.
- Keep storage, API delivery, and application workflows outside this package.
Development
Run tests:
composer test
Run Pint:
composer lint
Generate API documentation:
composer apidoc
Testing
Parser and scraper tests use HTML fixtures from tests/Fixtures.
$html = file_get_contents(__DIR__.'/../../Fixtures/BlackCombat/event_detail.html');
$parser = new ParseFights();
$fights = $parser($html);

$this->assertNotEmpty($fights);
Avoid real HTTP calls in tests. Inject a mocked HttpClientInterface when testing scrapers.
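One way to follow this advice is a hand-written test double that serves canned HTML and records each request. The sketch below is an assumption-laden illustration: `FetchesHtml` is a hypothetical stand-in (the package's HttpClientInterface may declare a different method), and `FakeHtmlClient`, `TitleScraper`, and the `example.test` URL are invented for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in; the real HttpClientInterface may look different.
interface FetchesHtml
{
    public function get(string $url): string;
}

// Test double: serves canned HTML per URL and records every request.
final class FakeHtmlClient implements FetchesHtml
{
    /** @var list<string> */
    public array $requested = [];

    /** @param array<string, string> $pages URL => HTML body */
    public function __construct(private readonly array $pages) {}

    public function get(string $url): string
    {
        $this->requested[] = $url;

        if (! array_key_exists($url, $this->pages)) {
            // Fail loudly instead of silently reaching for the network.
            throw new RuntimeException("Unexpected request: {$url}");
        }

        return $this->pages[$url];
    }
}

// Hypothetical scraper under test: fetch a page, then extract its <title>.
final class TitleScraper
{
    public function __construct(private readonly FetchesHtml $http) {}

    public function scrape(string $url): ?string
    {
        $html = $this->http->get($url);

        return preg_match('~<title>(.*?)</title>~is', $html, $m) === 1 ? trim($m[1]) : null;
    }
}

$client = new FakeHtmlClient([
    'https://example.test/events' => '<html><head><title>Events</title></head></html>',
]);

$title = (new TitleScraper($client))->scrape('https://example.test/events');
```

Recording the requested URLs lets a test assert both the parsed result and that exactly the expected page was fetched. A mocking library would work equally well in place of the hand-written fake.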
Contributing
- Keep the existing source/parser/scraper boundaries.
- Add or update fixtures for parser changes.
- Add unit tests for new behavior.
- Run composer test and composer lint before opening a pull request.
License
MMA Scrapers is open-sourced software licensed under the MIT license.
Statistics
- Total downloads: 44
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Views: 3
- Dependent projects: 0
- Recommendations: 0
Other information
- License: MIT
- Last updated: 2026-04-23