
ismailelbery/laravel-arabic-search

Composer install command:

composer require ismailelbery/laravel-arabic-search

Package description

Accurate Arabic text search for Laravel over plain MySQL / PostgreSQL / SQLite — no external search engine required.

README


Arabic search that works on MySQL / PostgreSQL / SQLite — no Elasticsearch, no Meilisearch.

Arabic text is written many ways for the same word: with or without diacritics, أ/إ/آ vs bare ا, ة vs ه, ى vs ي, and Persian/Urdu look-alike letters (ک, ی) that are visually identical to Arabic ones but sit at different Unicode codepoints. Naive LIKE misses all of these. This package normalizes both your stored text and the search term through the same pipeline, so مكة matches مكه, مُحَمَّد matches محمد, and کتاب (Persian keheh) matches كتاب (Arabic kaf).

You declare which columns are searchable once on the model; at query time you pass only the search word.

Article::arabicSearch('اسلام')->paginate(20);   // matches إسلام, الإسلام, ...

Installation

composer require ismailelbery/laravel-arabic-search
php artisan vendor:publish --tag=arabic-search-config
php artisan vendor:publish --tag=arabic-search-migrations

Requirements: PHP 8.1+, Laravel 10 / 11 / 12, and ext-mbstring. ext-intl is optional — if present it adds an NFKC pass that folds Arabic presentation forms from PDFs; a built-in map covers the common cases when it is absent.
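The optional NFKC step with a fallback can be pictured roughly as follows. This is a minimal sketch, not the package's code: foldPresentationForms() is a hypothetical name, and the fallback map lists only a few presentation forms for illustration. The package's built-in map is not reproduced here.

```php
<?php
// Hypothetical sketch: NFKC via ext-intl when available, else a tiny fallback map.

function foldPresentationForms(string $text): string
{
    if (class_exists(\Normalizer::class)) {
        // ext-intl is loaded: NFKC folds presentation forms and ligatures.
        $folded = \Normalizer::normalize($text, \Normalizer::FORM_KC);
        if ($folded !== false) {
            return $folded;
        }
    }

    // ext-intl is missing: illustrative map of a few presentation forms only.
    return strtr($text, [
        "\u{FEFB}" => "\u{0644}\u{0627}", // lam-alef ligature, isolated form -> لا
        "\u{FEFC}" => "\u{0644}\u{0627}", // lam-alef ligature, final form    -> لا
        "\u{FE8D}" => "\u{0627}",         // alef, isolated form              -> ا
        "\u{FEE1}" => "\u{0645}",         // meem, isolated form              -> م
    ]);
}

echo foldPresentationForms("\u{FEFB}"), PHP_EOL; // لا
```

Either branch yields the same result for the forms in the map; NFKC simply covers far more of them.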

Normalization rules

This table is the contract. Each rule is individually toggleable in config/arabic-search.php.

| Rule | Transform | Example | Default |
|---|---|---|---|
| strip_invisibles | remove zero-width & bidi controls (U+200B–200F, U+061C, U+FEFF, …) | احمد‏ (trailing U+200F) → احمد | on |
| unicode_compatibility | NFKC / presentation forms & ligatures | ﻻ → لا | on |
| strip_tashkeel | remove harakat & Quranic marks (U+064B–065F, U+0670, U+06D6–06ED, …) | مُحَمَّد → محمد | on |
| strip_tatweel | remove kashida U+0640 | محـــمد → محمد | on |
| normalize_alef | أ إ آ ٱ ٲ ٵ → ا | إسلام → اسلام | on |
| normalize_yeh | ى (maqsura), ی (Farsi), ے (Urdu) → ي | موسى, موسی → موسي | on |
| normalize_taa_marbuta | ة → ه | مكة → مكه | on |
| normalize_heh | ہ ۀ ە (Urdu/Persian) → ه | ہ → ه | on |
| normalize_kaf | ک ڪ (Persian/Urdu) → ك | کتاب → كتاب | on |
| normalize_waw | ؤ ۆ ۇ ۈ → و | مؤمن → مومن | on |
| normalize_hamza | ؤ → و, ئ → ي | قائم → قايم | on |
| strip_standalone_hamza | ء → (removed) | سماء → سما | off |
| normalize_dad_zah | ظ → ض (tolerates a common misspelling) | ظل → ضل | off |
| normalize_digits | ٠١٢٣ and ۰۱۲۳ (Persian) → 0123 | ٢٠٢٥ → 2025 | on |
| lowercase_latin | lowercase mixed-case Latin | HeLLo → hello | on |
| collapse_whitespace | collapse runs of whitespace to a single space, then trim | "  محمد  " → "محمد" | on |
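To make the table concrete, here is a minimal plain-PHP sketch of a subset of these rules applied in table order. It is illustrative only: normalizeArabic() is a hypothetical function, several rules (alef variants ٲ ٵ, heh, waw, hamza, and the off-by-default rules) are left out, and the code-point ranges are abridged.

```php
<?php
// Hypothetical sketch of a subset of the normalization rules, applied in table order.

function normalizeArabic(string $text): string
{
    // strip_invisibles: U+200B–U+200F, Arabic letter mark U+061C, BOM U+FEFF
    $text = preg_replace('/[\x{200B}-\x{200F}\x{061C}\x{FEFF}]/u', '', $text);
    // strip_tashkeel (abridged): harakat U+064B–U+065F and superscript alef U+0670
    $text = preg_replace('/[\x{064B}-\x{065F}\x{0670}]/u', '', $text);
    // strip_tatweel: kashida U+0640
    $text = str_replace("\u{0640}", '', $text);

    static $map = null;
    if ($map === null) {
        $map = [
            "\u{0623}" => "\u{0627}", "\u{0625}" => "\u{0627}", // normalize_alef: أ إ -> ا
            "\u{0622}" => "\u{0627}", "\u{0671}" => "\u{0627}", // normalize_alef: آ ٱ -> ا
            "\u{0649}" => "\u{064A}", "\u{06CC}" => "\u{064A}", // normalize_yeh: ى ی -> ي
            "\u{06D2}" => "\u{064A}",                           // normalize_yeh: ے -> ي
            "\u{0629}" => "\u{0647}",                           // normalize_taa_marbuta: ة -> ه
            "\u{06A9}" => "\u{0643}", "\u{06AA}" => "\u{0643}", // normalize_kaf: ک ڪ -> ك
        ];
        for ($d = 0; $d <= 9; $d++) {                          // normalize_digits
            $map[mb_chr(0x0660 + $d, 'UTF-8')] = (string) $d;  // Arabic-Indic digits
            $map[mb_chr(0x06F0 + $d, 'UTF-8')] = (string) $d;  // Persian digits
        }
    }
    $text = strtr($text, $map);

    // lowercase_latin: only ASCII A–Z is touched; Arabic bytes never match.
    $text = preg_replace_callback('/[A-Z]+/', fn (array $m): string => strtolower($m[0]), $text);

    // collapse_whitespace: runs of whitespace -> one space, then trim
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo normalizeArabic('مُحَمَّد'), ' | ', normalizeArabic('HeLLo'), PHP_EOL;
```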

Design decision — recall over precision. Normalization is intentionally lossy: مكة and مكه will match, by design. Precision is recovered by relevance ordering (exact > prefix > contains), not by being conservative here.
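The exact > prefix > contains ordering can be sketched in PHP over values that are already normalized. relevanceRank() and rankResults() are hypothetical helpers; in SQL the same idea would typically be a CASE expression in ORDER BY, but the package's actual query is not shown in this README.

```php
<?php
// Hypothetical sketch of relevance ordering over already-normalized strings.

function relevanceRank(string $value, string $term): ?int
{
    if ($value === $term) {
        return 0;                      // exact match
    }
    if (str_starts_with($value, $term)) {
        return 1;                      // prefix match
    }
    if (str_contains($value, $term)) {
        return 2;                      // infix ("contains") match
    }
    return null;                       // no match at all
}

function rankResults(array $values, string $term): array
{
    $hits = [];
    foreach ($values as $value) {
        $rank = relevanceRank($value, $term);
        if ($rank !== null) {
            $hits[] = ['value' => $value, 'rank' => $rank];
        }
    }
    // usort is stable since PHP 8.0, so equal ranks keep their input order.
    usort($hits, fn (array $a, array $b): int => $a['rank'] <=> $b['rank']);
    return array_column($hits, 'value');
}

echo implode(', ', rankResults(['الاسلام', 'اسلامي', 'اسلام', 'تاريخ'], 'اسلام')), PHP_EOL;
```

Lossy normalization widens the candidate set; the ranking then puts the closest matches first.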

Two rules are off by default because they are lossy across genuinely different words, not just spelling variants of one letter — enable them only if you want that tolerance:

  • strip_standalone_hamza — merges سماء/سما.
  • normalize_dad_zah — folds ظ→ض, so ظلّ (shade) and ضلّ (to go astray) collide. Turn it on when your users frequently confuse the two letters. The toggle applies to both search paths (shadow column and whereArabicVariants) so they stay consistent.

Enable in config/arabic-search.php:

'rules' => [
    'normalize_dad_zah' => true,
],

Changing it changes the normalizer version — run arabic-search:rebuild afterwards for shadow-column tables.
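The README does not say how the normalizer version is computed. One plausible scheme, shown purely as a hypothetical sketch, hashes the set of enabled rules so that any toggle change produces a new value. normalizerVersion() and the comparison against a previously recorded value are assumptions, not the package's API.

```php
<?php
// Hypothetical sketch: a "normalizer version" derived from the enabled rules.

function normalizerVersion(array $rules): string
{
    ksort($rules);                                 // key order in config must not matter
    $enabled = array_keys(array_filter($rules));   // names of rules switched on
    return substr(hash('sha256', implode(',', $enabled)), 0, 12);
}

$before = ['strip_tashkeel' => true, 'normalize_dad_zah' => false];
$after  = ['normalize_dad_zah' => true, 'strip_tashkeel' => true];

// A version recorded at the last rebuild no longer matches once the toggle flips.
if (normalizerVersion($before) !== normalizerVersion($after)) {
    echo 'Rules changed: run php artisan arabic-search:rebuild', PHP_EOL;
}
```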

Debug any term end-to-end:

php artisan arabic-search:inspect "مُحَمَّدٌ ٢٠٢٥"

Setup on a model

  1. Add the trait and list your searchable columns:
use Illuminate\Database\Eloquent\Model;
use IsmailElbery\ArabicSearch\Concerns\HasArabicSearch;

class Article extends Model
{
    use HasArabicSearch;

    protected array $arabicSearchable = ['title', 'body'];
}
  2. Add the shadow columns. Edit the published migration (or write your own using the arabicNormalized Blueprint macro):
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('articles', function (Blueprint $table) {
    $table->arabicNormalized(['title', 'body']); // adds title_normalized, body_normalized
});
  3. Backfill existing rows:
php artisan arabic-search:rebuild "App\Models\Article"

That's it. New/updated rows keep their shadow columns in sync automatically on save.

How it works

You never search the original column. The package maintains a normalized shadow column next to it (title → title_normalized). On save, an observer normalizes the source into the shadow column; on search, the term is normalized with the same pipeline and matched against the shadow column. Because both sides run identical PHP normalization, they are guaranteed to agree; there is no drift between SQL-side and application-side normalization.

articles
├── title              "مُحَمَّدٌ رسولُ الله"   ← original, shown to the user
└── title_normalized   "محمد رسول الله"         ← searched against
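Conceptually, the on-save step maps each searchable column to its shadow column. The sketch below assumes the default _normalized suffix and uses a stand-in normalizer (tashkeel stripping plus ة → ه) rather than the full pipeline; shadowAttributes() is a hypothetical name, not the package's observer.

```php
<?php
// Hypothetical sketch of the on-save sync: source column -> "<column>_normalized".

function standInNormalize(string $text): string
{
    // Stand-in for the real pipeline: strip tashkeel, then fold ة -> ه.
    $text = preg_replace('/[\x{064B}-\x{065F}\x{0670}]/u', '', $text);
    return str_replace("\u{0629}", "\u{0647}", $text);
}

function shadowAttributes(array $attributes, array $searchable, string $suffix = '_normalized'): array
{
    $shadow = [];
    foreach ($searchable as $column) {
        $source = $attributes[$column] ?? null;
        // Keep NULL as NULL so an empty source column leaves an empty shadow column.
        $shadow[$column . $suffix] = $source === null ? null : standInNormalize((string) $source);
    }
    return $shadow;
}

$article = ['title' => 'مُحَمَّدٌ رسولُ الله', 'body' => null];
print_r(shadowAttributes($article, ['title', 'body']));
```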

⚠️ Bulk writes bypass model events. Model::query()->update(), insert(), upsert() and raw SQL do not fire the observer, so the shadow columns go stale. Run arabic-search:rebuild afterwards.
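What a rebuild has to fix can be shown with plain arrays: recompute each shadow value from its source and rewrite the ones that no longer match. This is a hypothetical sketch; rebuildShadowColumn() and normalizeForDemo() are invented for illustration, and a real rebuild would read and write rows in chunks through the database.

```php
<?php
// Hypothetical sketch of a rebuild pass over rows held in memory.

function normalizeForDemo(string $text): string
{
    return str_replace("\u{0629}", "\u{0647}", $text); // only ة -> ه, for brevity
}

function rebuildShadowColumn(array $rows, string $column, string $suffix = '_normalized'): array
{
    $fixed = [];
    foreach ($rows as $id => $row) {
        $expected = normalizeForDemo($row[$column]);
        if (($row[$column . $suffix] ?? null) !== $expected) {
            $rows[$id][$column . $suffix] = $expected; // rewrite the stale value
            $fixed[] = $id;
        }
    }
    return [$rows, $fixed];
}

// Row 2 was changed by a bulk update(), so its shadow column still holds old text.
$rows = [
    1 => ['title' => 'مكة', 'title_normalized' => 'مكه'],
    2 => ['title' => 'مدينة', 'title_normalized' => 'مكه'],
];
[$rows, $fixed] = rebuildShadowColumn($rows, 'title');
echo implode(',', $fixed), PHP_EOL; // 2
```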

Standalone normalizer (no model needed)

use IsmailElbery\ArabicSearch\Facades\ArabicSearch;

ArabicSearch::normalize('مُحَمَّدٌ');          // "محمد"
ArabicSearch::tokenize('بسم الله الرحمن');    // ["بسم","الله","الرحمن"]
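A rough picture of the tokenize step, assuming the input has already been normalized and that tokens shorter than min_token_length (default 2, per the configuration table) are dropped. tokenizeNormalized() is a hypothetical helper; whether the package's tokenize() applies the length filter itself is an assumption.

```php
<?php
// Hypothetical sketch: whitespace tokenization with a minimum token length.

function tokenizeNormalized(string $normalized, int $minTokenLength = 2): array
{
    $tokens = preg_split('/\s+/u', trim($normalized), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter(
        $tokens,
        fn (string $t): bool => mb_strlen($t, 'UTF-8') >= $minTokenLength
    );
    return array_values($kept); // reindex after filtering
}

print_r(tokenizeNormalized('بسم الله الرحمن')); // three tokens
print_r(tokenizeNormalized('محمد و علي'));      // the one-letter "و" is dropped
```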

Searching an existing table with no shadow column

Have a legacy users table you can't (or don't want to) alter? Use variant expansion — it matches every orthographic spelling directly against the raw column, no _normalized column and no rebuild needed:

use Illuminate\Support\Facades\DB;

User::whereArabicVariants('name', 'اسلام')->paginate();
DB::table('users')->whereArabicVariants('name', 'اسلام')->get();

// Two (or more) columns — OR-ed together:
User::whereArabicVariants(['first_name', 'last_name'], 'اسلام')->get();

// Composes with other conditions:
DB::table('docs')->where('pinned', true)
    ->orWhereArabicVariants('title', 'اسلام')->get();

Searching اسلام matches stored اسلام, إسلام, أسلام, آسلام, الإسلام, and diacritized/kashida spellings like إِسْلَام and اســلام — while correctly not matching a different word like اسلم. It works on MySQL, PostgreSQL and SQLite (a PCRE-backed REGEXP function is registered automatically for SQLite).
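Variant expansion can be approximated with a PCRE pattern: each letter of the normalized term becomes a character class of its spellings, and optional harakat or kashida are allowed between letters. This is a sketch under those assumptions using PHP's preg functions; arabicVariantPattern() is hypothetical, its variant table is abridged, and the package's generated SQL REGEXP may differ.

```php
<?php
// Hypothetical sketch: expand a normalized term into a spelling-tolerant regex.

function arabicVariantPattern(string $normalizedTerm): string
{
    $variants = [
        "\u{0627}" => "\u{0627}\u{0623}\u{0625}\u{0622}\u{0671}", // ا أ إ آ ٱ
        "\u{064A}" => "\u{064A}\u{0649}\u{06CC}",                 // ي ى ی
        "\u{0647}" => "\u{0647}\u{0629}",                         // ه ة
        "\u{0643}" => "\u{0643}\u{06A9}",                         // ك ک
    ];
    // Optional harakat, superscript alef, or kashida between any two letters.
    $marks = '[\x{064B}-\x{065F}\x{0670}\x{0640}]*';

    $parts = [];
    foreach (mb_str_split($normalizedTerm, 1, 'UTF-8') as $char) {
        $parts[] = isset($variants[$char])
            ? '[' . $variants[$char] . ']'
            : preg_quote($char, '/');
    }
    // Unanchored, so the term may appear anywhere in the stored value.
    return '/' . implode($marks, $parts) . '/u';
}

$pattern = arabicVariantPattern('اسلام');
foreach (['إِسْلَام', 'اســلام', 'الإسلام', 'اسلم'] as $stored) {
    echo $stored, ' => ', preg_match($pattern, $stored) ? 'match' : 'no match', PHP_EOL;
}
```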

When to use which:

| | Shadow column (HasArabicSearch) | Variant expansion (whereArabicVariants) |
|---|---|---|
| Schema change | adds a _normalized column | none |
| Backfill | arabic-search:rebuild | none |
| Matching | LIKE on the normalized column | regex on the raw column |
| Uses an index | no in v1 (LIKE infix); fulltext planned | no (regex full scan) |
| Best for | tables you own | legacy/read-only tables, small to medium size |

Configuration highlights

| Key | Meaning |
|---|---|
| term_logic | "and" (all tokens must match; default) or "or" (any token may match) |
| order_by_relevance | order results exact > prefix > contains (default true) |
| min_token_length | tokens shorter than this are dropped (default 2) |
| column_suffix | shadow-column suffix (default _normalized) |
| match_mode | reserved; v1 always uses like, and index-backed fulltext is on the roadmap but not yet wired up |

Changing any rule changes the normalizer version (ArabicSearch::version()); rerun arabic-search:rebuild so stored data matches.
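The two term_logic settings can be sketched as a per-token contains check combined with AND or OR. matchesTerms() is a hypothetical helper operating on already-normalized strings; in SQL the same choice would correspond to AND-ed or OR-ed LIKE '%token%' conditions, though the package's exact query is not documented here.

```php
<?php
// Hypothetical sketch of term_logic over one normalized value.

function matchesTerms(string $normalizedValue, array $tokens, string $termLogic = 'and'): bool
{
    // One boolean per token: does the value contain it?
    $hits = array_map(
        fn (string $token): bool => str_contains($normalizedValue, $token),
        $tokens
    );

    return match ($termLogic) {
        'and' => !in_array(false, $hits, true), // every token must be present
        'or' => in_array(true, $hits, true),    // at least one token must be present
        default => throw new InvalidArgumentException("Unknown term_logic: {$termLogic}"),
    };
}

$value = 'تاريخ الاسلام في مكه';
var_dump(matchesTerms($value, ['اسلام', 'مكه'], 'and'));    // bool(true)
var_dump(matchesTerms($value, ['اسلام', 'المدينه'], 'and')); // bool(false)
var_dump(matchesTerms($value, ['اسلام', 'المدينه'], 'or'));  // bool(true)
```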

What this does NOT do (yet)

Naming the limits earns more trust than hiding them:

  • No morphological / root analysis. Searching كتب will not automatically find مكتوب/كاتب. (Root-based matching may come in v2.)
  • No stemming. Light, opt-in prefix/suffix stripping (ال، و، ب، ون، ين) is planned for v1.1.
  • No synonym/fuzzy/Levenshtein matching.
  • LIKE infix matches can't use an index, which is fine for small and medium tables. Index-backed fulltext matching is planned but not in v1; for very large datasets today, reach for a dedicated engine.

Use this vs. Meilisearch/Typesense: reach for this when you want correct Arabic matching on the database you already have, with zero extra infrastructure. Reach for a dedicated engine when you need typo-tolerance, faceting, or sub-10ms search over millions of rows.

Testing

composer install
vendor/bin/phpunit

The suite leads with an input → expected-output table (NormalizationTest) plus idempotency checks and an integration SearchTest against in-memory SQLite.
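An idempotency check of the kind described can be written as plain PHP, without PHPUnit, so it runs with no dependencies. The normalizer below is a stand-in (tashkeel, kashida, ة → ه, whitespace), not the package's pipeline, and demoNormalize() is a hypothetical name; the inputs come from the rules table.

```php
<?php
// Hypothetical sketch: normalizing twice must equal normalizing once.

function demoNormalize(string $text): string
{
    $text = preg_replace('/[\x{064B}-\x{065F}\x{0670}\x{0640}]/u', '', $text); // tashkeel + kashida
    $text = str_replace("\u{0629}", "\u{0647}", $text);                       // ة -> ه
    return trim(preg_replace('/\s+/u', ' ', $text));                          // collapse whitespace
}

$inputs = ['مُحَمَّد', 'محـــمد', 'مكة', '  محمد   رسول  '];

foreach ($inputs as $input) {
    $once = demoNormalize($input);
    $twice = demoNormalize($once);
    if ($once !== $twice) {
        throw new RuntimeException("Not idempotent for input: {$input}");
    }
}
echo 'idempotent for ', count($inputs), ' inputs', PHP_EOL;
```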

License

MIT.

Statistics

  • Total downloads: 0
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 1
  • Dependents: 0
  • Suggesters: 0

GitHub info

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other info

  • License: MIT
  • Last updated: 2026-07-03
