hichxm/hyperloglog
Composer install command:
composer require hichxm/hyperloglog
Package description
A PHP implementation of the HyperLogLog algorithm
README
A lightweight and dependency-free PHP implementation of the HyperLogLog probabilistic data structure for approximate cardinality estimation.
HyperLogLog allows you to estimate the number of distinct elements in very large datasets while using only a small, fixed amount of memory.
Features
- 🚀 Fast approximate distinct counting
- 📦 Zero dependencies
- 🔧 Configurable number of registers (counterBits)
- 🔐 Configurable hashing algorithm (xxh3, sha256, md5, etc.)
- 📊 Theoretical error rate calculation
- 🧮 Small and large cardinality bias corrections
- ✅ Strict types and fully documented source code
Requirements
- PHP 8.0, 8.1, 8.2, 8.3, 8.4 and 8.5
- No external dependencies
Note: The xxh3 and xxh128 hash algorithms are available only when supported by your PHP version and build. If unavailable, you can use any other algorithm returned by hash_algos(), such as sha256, sha512, or md5.
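The fallback described in the note can be automated. The sketch below is one possible approach: pickHashAlgorithm() is a hypothetical helper (not part of this package) that returns the first entry of a preference list that hash_algos() reports as available on the current build.

```php
<?php

/**
 * Return the first algorithm from $preferred that this PHP build supports.
 * Hypothetical helper for illustration; not part of hichxm/hyperloglog.
 *
 * @param string[] $preferred Algorithm names in order of preference.
 */
function pickHashAlgorithm(array $preferred = ['xxh3', 'xxh128', 'sha256', 'md5']): string
{
    $available = hash_algos();

    foreach ($preferred as $algo) {
        if (in_array($algo, $available, true)) {
            return $algo;
        }
    }

    throw new RuntimeException('None of the preferred hash algorithms is available.');
}

// Prints the algorithm that would be used on this build.
echo pickHashAlgorithm(), PHP_EOL;
```

The returned name could then be passed as the hashAlgorithm constructor argument instead of relying on the default.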
Installation
Install via Composer:
composer require hichxm/hyperloglog
Basic Usage
<?php

use Hichxm\HyperLogLog\HyperLogLog;

$hll = new HyperLogLog();

$hll->add('apple');
$hll->add('banana');
$hll->add('orange');
$hll->add('apple'); // duplicate

echo $hll->count();
The returned value is an approximation of the number of unique elements.
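Why the result is an approximation is easier to see from a stripped-down version of the algorithm. The following is a minimal, self-contained sketch and not this package's source code: the class name TinyHll, the use of the first 32 bits of a sha256 digest, and the restriction to 7 to 16 index bits are all assumptions made for illustration, and only the small-range (linear counting) correction is included.

```php
<?php

/**
 * Illustrative HyperLogLog sketch (hypothetical; not the package's code).
 * Each value is hashed to 32 bits: the top $p bits choose a register,
 * and the register keeps the largest "rank" seen in the remaining bits.
 */
final class TinyHll
{
    /** @var int[] one small counter per register */
    private array $registers;
    private int $m;
    private int $p;

    public function __construct(int $p = 10)
    {
        if ($p < 7 || $p > 16) {
            throw new InvalidArgumentException('p must be between 7 and 16 in this sketch.');
        }
        $this->p = $p;
        $this->m = 1 << $p;
        $this->registers = array_fill(0, $this->m, 0);
    }

    public function add(string $value): void
    {
        // First 8 hex digits of the digest = 32 bits (assumes 64-bit PHP integers).
        $hash = (int) hexdec(substr(hash('sha256', $value), 0, 8));
        $bits = 32 - $this->p;
        $index = $hash >> $bits;
        $rest = $hash & ((1 << $bits) - 1);

        // Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        $rank = 1;
        while ($rank <= $bits && (($rest >> ($bits - $rank)) & 1) === 0) {
            $rank++;
        }

        $this->registers[$index] = max($this->registers[$index], $rank);
    }

    public function count(): float
    {
        $m = $this->m;
        $alpha = 0.7213 / (1 + 1.079 / $m); // bias constant, valid for m >= 128
        $sum = 0.0;
        $zeros = 0;
        foreach ($this->registers as $r) {
            $sum += 2 ** -$r;
            if ($r === 0) {
                $zeros++;
            }
        }

        $estimate = $alpha * $m * $m / $sum;

        // Small-range correction: fall back to linear counting.
        if ($estimate <= 2.5 * $m && $zeros > 0) {
            $estimate = $m * log($m / $zeros);
        }

        return $estimate;
    }
}

$sketch = new TinyHll(10);
for ($i = 0; $i < 5000; $i++) {
    $sketch->add("user-$i");
    $sketch->add("user-$i"); // duplicates never raise a register
}
echo round($sketch->count()), PHP_EOL; // close to 5000, but not exact
```

Because only the maximum rank per register is stored, memory stays fixed no matter how many values are added, and adding a value twice leaves the state unchanged. The price is that the count is reconstructed statistically rather than tallied.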
Constructor
new HyperLogLog(
    int $counterBits = 5,
    string $hashAlgorithm = 'xxh3'
);
Parameters
| Parameter | Description |
|---|---|
| counterBits | Number of bits used to select registers. The number of registers is 2^counterBits. |
| hashAlgorithm | Any hashing algorithm supported by PHP's hash() function. |
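Increasing counterBits improves accuracy at the cost of memory: each additional bit doubles the number of registers and reduces the theoretical standard error by a factor of √2 (about 1.41).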
Example:
$hll = new HyperLogLog(
    counterBits: 10,
    hashAlgorithm: 'sha256'
);
Accuracy
The theoretical standard error is:
1.04 / √m
where:
m = 2^counterBits
Example:
$error = $hll->theoreticalErrorRate($hll->getM());
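To get a feel for the numbers before choosing counterBits, the formula can also be evaluated on its own. This is a small sketch that does not use the library; standardError() is a hypothetical helper name.

```php
<?php

/**
 * Theoretical relative standard error 1.04 / sqrt(m), with m = 2^counterBits.
 * Hypothetical helper for illustration; not part of hichxm/hyperloglog.
 */
function standardError(int $counterBits): float
{
    $m = 1 << $counterBits; // number of registers
    return 1.04 / sqrt($m);
}

foreach ([5, 10, 14] as $bits) {
    printf(
        "counterBits=%d m=%d error=%.2f%%\n",
        $bits,
        1 << $bits,
        standardError($bits) * 100
    );
}
```

With the default of 5 bits (32 registers) the expected error is roughly 18%; 10 bits brings it to 3.25%, and 14 bits to about 0.81%.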
Supported Hash Algorithms
Any algorithm supported by PHP can be used.
Examples include:
- xxh3 (recommended when available)
- xxh128
- sha256
- sha512
- md5
- sha1
You can list available algorithms using:
print_r(hash_algos());
Public API
Add an element
$hll->add('user-123');
Estimate the number of distinct values
$count = $hll->count();
Calculate the theoretical error
$error = $hll->theoreticalErrorRate($hll->getM());
Measure estimation error
$error = $hll->measureError($estimated, $actual);
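How measureError computes its result is not spelled out here. One common way to quantify estimation error is the relative error |estimated − actual| / actual, sketched below with a hypothetical relativeError() helper and made-up sample numbers; the library's own method may use a different definition.

```php
<?php

/**
 * Relative error of an estimate against a known exact count.
 * Hypothetical helper; not necessarily what the package's measureError() returns.
 */
function relativeError(float $estimated, int $actual): float
{
    if ($actual <= 0) {
        throw new InvalidArgumentException('The actual count must be positive.');
    }

    return abs($estimated - $actual) / $actual;
}

// Illustrative values only: an estimate of 1032 for 1000 true distinct items.
$observed = relativeError(1032.0, 1000);
$expected = 1.04 / sqrt(1 << 10); // theoretical standard error for counterBits = 10

printf("observed=%.4f theoretical=%.4f\n", $observed, $expected);
```

Comparing the observed figure against the theoretical standard error for the configured counterBits is a quick sanity check on a test dataset whose exact cardinality is known.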
When to Use HyperLogLog
HyperLogLog is well suited for:
- Counting unique visitors
- Counting unique IP addresses
- Analytics pipelines
- Large log processing
- Stream processing
- Database statistics
- Big data applications
It is not appropriate when an exact distinct count is required.
License
MIT
Other information
- License: MIT
- Last updated: 2026-07-02