cable8mm/mma-scrapers

Latest stable version: v0.3.0

Composer install command:

composer require cable8mm/mma-scrapers

Package description

A lightweight, extensible PHP library for scraping MMA data from multiple sources.

README

A lightweight PHP library for scraping and parsing MMA data into simple DTOs.

Features

  • Source-specific scrapers and parsers for MMA websites.
  • Normalized DTOs for events, fights, and fighters.
  • Fixture-friendly parser design using Symfony DomCrawler.
  • Mockable HTTP layer through HttpClientInterface.
  • Helper services for fighter matching, Sherdog ID resolution, and fight deduplication.
  • No database dependency.

Requirements

  • PHP ^8.4
  • Composer

Installation

composer require cable8mm/mma-scrapers

For local development:

composer install

Supported Sources

Source       Events  Event detail  Fights  Fighters  Notes
BlackCombat  Yes     Yes           Yes     Yes       Official source support
Sherdog      No      No            No      Yes       Fighter search and fighter detail support
Tapology     No      No            No      No        Planned source

Core Concepts

The library is organized around a small pipeline:

HTTP client -> Scraper -> Parser -> DTO

Scrapers fetch HTML and delegate extraction to parsers. Parsers are deterministic and return DTOs. Aggregators and services are available when a consuming app needs to compare, merge, or deduplicate parsed results.
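
The pipeline above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the library's code: PageFetcher, SketchEvent, SketchEventParser, and SketchEventsScraper are hypothetical stand-ins, the get() method name is assumed, and PHP's built-in DOMDocument is swapped in for Symfony DomCrawler so the snippet runs without Composer dependencies.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the HTTP contract; the real interface lives in
// src/Contracts and may use different method names.
interface PageFetcher
{
    public function get(string $url): string;
}

// Simplified DTO; the library's EventDTO carries more fields.
final class SketchEvent
{
    public function __construct(
        public readonly string $name,
        public readonly string $url,
    ) {}
}

// Parser: deterministic, raw HTML in, DTOs out. No network access.
final class SketchEventParser
{
    /** @return list<SketchEvent> */
    public function __invoke(string $html): array
    {
        libxml_use_internal_errors(true); // tolerate imperfect real-world markup
        $dom = new DOMDocument();
        $dom->loadHTML($html);

        $events = [];
        foreach ((new DOMXPath($dom))->query('//a[@class="event"]') as $link) {
            if (!$link instanceof DOMElement) {
                continue;
            }
            $events[] = new SketchEvent(
                name: trim($link->textContent),
                url: $link->getAttribute('href'),
            );
        }

        return $events;
    }
}

// Scraper: owns the HTTP call and delegates extraction to the parser.
final class SketchEventsScraper
{
    public function __construct(
        private PageFetcher $http,
        private SketchEventParser $parser,
    ) {}

    /** @return list<SketchEvent> */
    public function scrape(string $url): array
    {
        return ($this->parser)($this->http->get($url));
    }
}

// Canned client so the sketch runs without touching the network.
$client = new class implements PageFetcher {
    public function get(string $url): string
    {
        return '<html><body>'
            . '<a class="event" href="/eventDetail.php?eventSeq=285">Black Combat 16</a>'
            . '</body></html>';
    }
};

$events = (new SketchEventsScraper($client, new SketchEventParser()))
    ->scrape('https://example.test/events');

echo $events[0]->name, ' -> ', $events[0]->url, PHP_EOL;
```

Because the parser never touches the network, it can be exercised with a plain string, and the scraper can be exercised with any object that satisfies the HTTP contract.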

Project Structure

src/
  Aggregators/      Merge related event, fight, and fighter DTOs
  Contracts/        Scraper and HTTP interfaces
  DTO/              EventDTO, FightDTO, FighterDTO
  Enums/            Source, fight status, fight method, weight class
  Http/             Guzzle HTTP client implementation
  Matchers/         Fighter matching helpers
  Normalizers/      Text-to-enum normalization helpers
  Services/         Sherdog ID resolution and fight deduplication
  Sources/
    BlackCombat/
      Parsers/
      Scrapers/
    Sherdog/
      Parsers/
      Scrapers/

Usage

Parse BlackCombat Events From HTML

use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseEvents;

$html = file_get_contents('blackcombat_events.html');

$parser = new ParseEvents();
$events = $parser($html);

Scrape BlackCombat Events

use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseEvents;
use Cable8mm\MmaScrapers\Sources\BlackCombat\Scrapers\EventsScraper;

$scraper = new EventsScraper(
    new DefaultHttpClient(),
    new ParseEvents()
);

$events = $scraper->scrape('https://www.blackcombat-official.com/event.php?page=10');

Parse BlackCombat Fights

use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseFights;

$html = file_get_contents('event_detail.html');

$parser = new ParseFights();
$fights = $parser($html);

Scrape a Sherdog Fighter

use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Sources\Sherdog\Parsers\ParseFighter;
use Cable8mm\MmaScrapers\Sources\Sherdog\Scrapers\FighterScraper;

$scraper = new FighterScraper(
    new DefaultHttpClient(),
    new ParseFighter()
);

$fighter = $scraper->scrapeById(12345);

Resolve a Sherdog Fighter ID

use Cable8mm\MmaScrapers\Http\DefaultHttpClient;
use Cable8mm\MmaScrapers\Services\SherdogIdResolver;
use Cable8mm\MmaScrapers\Sources\Sherdog\Parsers\ParseSearchResults;
use Cable8mm\MmaScrapers\Sources\Sherdog\Scrapers\SearchFighterScraper;

$search = new SearchFighterScraper(new DefaultHttpClient());
$parser = new ParseSearchResults();
$resolver = new SherdogIdResolver();

$html = $search->search('Chan Sung Jung');
$candidates = $parser($html);

$sherdogId = $resolver->resolve('Chan Sung Jung', $candidates);

Deduplicate Fights

use Cable8mm\MmaScrapers\Aggregators\FightAggregator;
use Cable8mm\MmaScrapers\Aggregators\FighterAggregator;
use Cable8mm\MmaScrapers\Services\FightDeduplicator;

$deduplicator = new FightDeduplicator(
    new FightAggregator(new FighterAggregator())
);

$deduplicatedFights = $deduplicator->deduplicate($fights);
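
The README does not describe how duplicates are detected or merged, so the following is only one plausible approach, sketched under assumptions: SketchFight, fightKey(), and deduplicateFights() are hypothetical names, and the rule "same two fighters on the same date, in either corner order" is an illustrative choice rather than the behavior of FightDeduplicator.

```php
<?php

declare(strict_types=1);

// Hypothetical fight record; the library's FightDTO has more fields.
final class SketchFight
{
    public function __construct(
        public readonly string $red,
        public readonly string $blue,
        public readonly string $date, // ISO date, e.g. "2026-01-31"
        public readonly ?string $method = null,
    ) {}
}

// Build a key that ignores corner order, letter case, and stray whitespace.
function fightKey(SketchFight $fight): string
{
    $names = [
        strtolower(trim($fight->red)),
        strtolower(trim($fight->blue)),
    ];
    sort($names);

    return implode('|', $names) . '@' . $fight->date;
}

/**
 * Keep one record per key, preferring a record that has a method recorded.
 *
 * @param list<SketchFight> $fights
 * @return list<SketchFight>
 */
function deduplicateFights(array $fights): array
{
    $byKey = [];
    foreach ($fights as $fight) {
        $key = fightKey($fight);
        $existing = $byKey[$key] ?? null;
        if ($existing === null || ($existing->method === null && $fight->method !== null)) {
            $byKey[$key] = $fight;
        }
    }

    return array_values($byKey);
}

// Placeholder names; the first two records describe the same bout.
$unique = deduplicateFights([
    new SketchFight('Kim A', 'Lee B', '2026-01-31'),
    new SketchFight('lee b ', 'Kim A', '2026-01-31', 'KO'),
    new SketchFight('Park C', 'Choi D', '2026-01-31', 'Decision'),
]);

echo count($unique), PHP_EOL; // prints 2
```

A real implementation would likely merge fields from both records instead of picking one, which is presumably the role of the aggregators passed to FightDeduplicator.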

DTOs

EventDTO

new EventDTO(
    name: 'Black Combat 16',
    location: 'Incheon, South Korea',
    date: new DateTimeImmutable('2026-01-31'),
    url: '/eventDetail.php?eventSeq=285',
    externalId: '285'
);

FighterDTO

new FighterDTO(
    name: 'Chan Sung Jung',
    nickname: 'The Korean Zombie',
    instagram: 'koreanzombiemma',
    teamname: 'Korean Zombie MMA',
    height: '170cm',
    win: 17,
    lose: 8,
    draw: 0,
    sherdogId: 36155
);

FightDTO

use Cable8mm\MmaScrapers\Enums\FightMethod;
use Cable8mm\MmaScrapers\Enums\FightStatus;
use Cable8mm\MmaScrapers\Enums\Source;
use Cable8mm\MmaScrapers\Enums\WeightClass;

new FightDTO(
    redFighter: $redFighter,
    blueFighter: $blueFighter,
    source: Source::OFFICIAL,
    status: FightStatus::FINISHED,
    weightClass: WeightClass::FEATHERWEIGHT,
    method: FightMethod::KO,
    round: 1,
    time: '3:14',
    winner: $redFighter,
    fightDate: new DateTimeImmutable('2026-01-31')
);
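
The enum-typed fields above imply that free-form text scraped from a page has to be mapped onto enum cases somewhere, which is what the Normalizers directory is described as doing. A minimal sketch of that idea follows; SketchFightMethod and normalizeFightMethod() are hypothetical, and the library's FightMethod enum may define different cases and matching rules.

```php
<?php

declare(strict_types=1);

// Hypothetical enum for illustration only.
enum SketchFightMethod: string
{
    case KO = 'ko';
    case SUBMISSION = 'submission';
    case DECISION = 'decision';
    case UNKNOWN = 'unknown';
}

// Map free-form result text onto exactly one enum case.
function normalizeFightMethod(string $raw): SketchFightMethod
{
    $text = strtolower(trim($raw));

    // Substring checks are crude but keep the sketch short; the first
    // matching arm wins, and anything unrecognized becomes UNKNOWN.
    return match (true) {
        str_contains($text, 'ko') => SketchFightMethod::KO, // also matches "tko"
        str_contains($text, 'sub') => SketchFightMethod::SUBMISSION,
        str_contains($text, 'dec') => SketchFightMethod::DECISION,
        default => SketchFightMethod::UNKNOWN,
    };
}

$samples = ['TKO (Punches)', ' Submission (Armbar) ', 'Unanimous Decision', 'No Contest'];

foreach ($samples as $raw) {
    echo trim($raw), ' => ', normalizeFightMethod($raw)->name, PHP_EOL;
}
```

Returning an explicit UNKNOWN case instead of null keeps the DTO field non-nullable, at the cost of hiding unmapped strings; whether the library makes the same trade-off is not stated in the README.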

Design Rules

  • Keep source implementations isolated under src/Sources/{SourceName}.
  • Put HTTP access in scrapers, not parsers.
  • Keep parsers deterministic: raw HTML in, DTOs out.
  • Test parsers with static HTML fixtures.
  • Keep storage, API delivery, and application workflows outside this package.

Development

Run tests:

composer test

Run Pint:

composer lint

Generate API documentation:

composer apidoc

Testing

Parser and scraper tests use HTML fixtures from tests/Fixtures.

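use Cable8mm\MmaScrapers\Sources\BlackCombat\Parsers\ParseFights;

// Inside a PHPUnit test method: load a static fixture instead of fetching a live page.
$html = file_get_contents(__DIR__.'/../../Fixtures/BlackCombat/event_detail.html');

$parser = new ParseFights();
$fights = $parser($html);

$this->assertNotEmpty($fights);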

Avoid real HTTP calls in tests. Inject a mocked HttpClientInterface when testing scrapers.
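
One way to follow this advice without a mocking framework is a hand-written test double that returns fixture HTML and records what was requested. The sketch below is self-contained and makes several assumptions: FakeableHttpClient stands in for the library's HttpClientInterface (whose actual method names are not shown in this README), and RecordingHttpClient and TitleScraper are hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical contract; check src/Contracts for the real HttpClientInterface.
interface FakeableHttpClient
{
    public function get(string $url): string;
}

// Test double: returns fixture HTML and records every requested URL.
final class RecordingHttpClient implements FakeableHttpClient
{
    /** @var list<string> */
    public array $requested = [];

    public function __construct(private string $fixtureHtml) {}

    public function get(string $url): string
    {
        $this->requested[] = $url;

        return $this->fixtureHtml;
    }
}

// Minimal scraper under test: fetches a page and hands the HTML to a parser.
final class TitleScraper
{
    public function __construct(
        private FakeableHttpClient $http,
        private Closure $parser,
    ) {}

    /** @return list<string> */
    public function scrape(string $url): array
    {
        return ($this->parser)($this->http->get($url));
    }
}

// Inline fixture; a real test would read a file from tests/Fixtures.
$fixture = '<ul><li>Fight A</li><li>Fight B</li></ul>';

$client = new RecordingHttpClient($fixture);

// A regex keeps the sketch short; real parsers would use a DOM library.
$parser = static function (string $html): array {
    preg_match_all('~<li>(.*?)</li>~', $html, $matches);

    return $matches[1];
};

$fights = (new TitleScraper($client, $parser))->scrape('https://example.test/event/1');

echo implode(', ', $fights), PHP_EOL;
echo $client->requested[0], PHP_EOL;
```

In a PHPUnit test case the two echo lines would become assertSame() checks on $fights and $client->requested, which verifies both the parsed result and that the scraper asked for the expected URL.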

Contributing

  1. Keep the existing source/parser/scraper boundaries.
  2. Add or update fixtures for parser changes.
  3. Add unit tests for new behavior.
  4. Run composer test and composer lint before opening a pull request.

License

MMA Scrapers is open-sourced software licensed under the MIT license.

Statistics

  • Total downloads: 44
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 3
  • Dependents: 0
  • Suggesters: 0

GitHub Information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: HTML

Other Information

  • License: MIT
  • Last updated: 2026-04-23
