ismailelbery/laravel-arabic-search
Composer install command:
composer require ismailelbery/laravel-arabic-search
Package description
Accurate Arabic text search for Laravel over plain MySQL / PostgreSQL / SQLite — no external search engine required.
README
Arabic search that works on MySQL / PostgreSQL / SQLite — no Elasticsearch, no Meilisearch.
Arabic text is written many ways for the same word: with or without diacritics, أ/إ/آ vs bare ا, ة vs ه, ى vs ي, and Persian/Urdu look-alike letters (ک, ی) that are visually identical to Arabic ones but sit at different Unicode codepoints. Naive LIKE misses all of these. This package normalizes both your stored text and the search term through the same pipeline, so مكة matches مكه, مُحَمَّد matches محمد, and کتاب (Persian keheh) matches كتاب (Arabic kaf).
You declare which columns are searchable once on the model; at query time you pass only the search word.
Article::arabicSearch('اسلام')->paginate(20); // matches إسلام, الإسلام, ...
Installation
composer require ismailelbery/laravel-arabic-search
php artisan vendor:publish --tag=arabic-search-config
php artisan vendor:publish --tag=arabic-search-migrations
Requirements: PHP 8.1+, Laravel 10 / 11 / 12, and ext-mbstring. ext-intl is optional — if present it adds an NFKC pass that folds Arabic presentation forms from PDFs; a built-in map covers the common cases when it is absent.
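The optional-extension fallback described above could look roughly like the sketch below. The function name `sketchFoldPresentationForms` and the two-entry fallback map are illustrative assumptions, not the package's actual code; the only real API used is PHP's `Normalizer` class from ext-intl.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: fold Arabic presentation forms to base letters.
 * Uses ext-intl's NFKC when available, otherwise a tiny hand-written map.
 */
function sketchFoldPresentationForms(string $text): string
{
    if (class_exists(\Normalizer::class)) {
        $folded = \Normalizer::normalize($text, \Normalizer::FORM_KC);
        if ($folded !== false) {
            return $folded;
        }
    }

    // Fallback map (assumed, deliberately incomplete): lam-alef ligatures only.
    return strtr($text, [
        "\u{FEFB}" => "\u{0644}\u{0627}", // isolated lam-alef ligature -> lam + alef
        "\u{FEFC}" => "\u{0644}\u{0627}", // final lam-alef ligature    -> lam + alef
    ]);
}
```

Both branches give the same result for the ligatures in the map, so a test written against this helper does not depend on whether ext-intl is installed.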
Normalization rules
This table is the contract. Each rule is individually toggleable in config/arabic-search.php.
| Rule | Transform | Example | Default |
|---|---|---|---|
| strip_invisibles | remove zero-width & bidi controls (U+200B–200F, U+061C, U+FEFF, …) | احمد → احمد | on |
| unicode_compatibility | NFKC / presentation forms & ligatures | ﻻ → لا | on |
| strip_tashkeel | remove harakat & Quranic marks (U+064B–065F, U+0670, U+06D6–06ED, …) | مُحَمَّد → محمد | on |
| strip_tatweel | remove kashida U+0640 | محـــمد → محمد | on |
| normalize_alef | أ إ آ ٱ ٲ ٵ → ا | إسلام → اسلام | on |
| normalize_yeh | ى (maqsura), ی (Farsi), ے (Urdu) → ي | موسى, موسی → موسي | on |
| normalize_taa_marbuta | ة → ه | مكة → مكه | on |
| normalize_heh | ہ ۀ ە (Urdu/Persian) → ه | ہ → ه | on |
| normalize_kaf | ک ڪ (Persian/Urdu) → ك | کتاب → كتاب | on |
| normalize_waw | ؤ ۆ ۇ ۈ → و | مؤمن → مومن | on |
| normalize_hamza | ؤ → و, ئ → ي | قائم → قايم | on |
| strip_standalone_hamza | ء → (removed) | سماء → سما | off |
| normalize_dad_zah | ظ → ض (tolerant of a common misspelling) | ظل → ضل | off |
| normalize_digits | ٠١٢٣ and ۰۱۲۳ (Persian) → 0123 | ٢٠٢٥ → 2025 | on |
| lowercase_latin | lowercase mixed Latin | HeLLo → hello | on |
| collapse_whitespace | runs of whitespace → single, trim | محمد → محمد | on |
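To make a few of the rules in the table concrete, here is a minimal plain-PHP sketch covering only strip_tashkeel, strip_tatweel, part of the letter folding, normalize_digits and collapse_whitespace. The function name `sketchNormalizeArabic` is hypothetical and the rule coverage is deliberately partial; it illustrates the idea and is not the package's pipeline.

```php
<?php
declare(strict_types=1);

/** Hypothetical, partial re-implementation of a few rules from the table. */
function sketchNormalizeArabic(string $text): string
{
    // strip_tashkeel + strip_tatweel: drop harakat, Quranic marks and kashida.
    $text = preg_replace(
        '/[\x{064B}-\x{065F}\x{0670}\x{06D6}-\x{06ED}\x{0640}]/u',
        '',
        $text
    ) ?? $text;

    // Letter folding: map each variant to one canonical codepoint.
    $map = [
        "\u{0623}" => "\u{0627}", // alef with hamza above -> bare alef
        "\u{0625}" => "\u{0627}", // alef with hamza below -> bare alef
        "\u{0622}" => "\u{0627}", // alef with madda       -> bare alef
        "\u{0649}" => "\u{064A}", // alef maqsura          -> yeh
        "\u{06CC}" => "\u{064A}", // Farsi yeh             -> yeh
        "\u{0629}" => "\u{0647}", // taa marbuta           -> heh
        "\u{06A9}" => "\u{0643}", // keheh                 -> kaf
    ];

    // normalize_digits: U+0660..0669 encode as bytes D9 A0..A9 in UTF-8,
    // and U+06F0..06F9 (Persian) as DB B0..B9.
    for ($i = 0; $i <= 9; $i++) {
        $map["\xD9" . chr(0xA0 + $i)] = (string) $i;
        $map["\xDB" . chr(0xB0 + $i)] = (string) $i;
    }

    $text = strtr($text, $map);

    // collapse_whitespace: runs of whitespace -> single space, then trim.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}
```

Because the same function would be applied to both the stored text and the search term, it does not matter that the folded form (for example مكه) is not the "correct" spelling; it only has to be consistent.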
Design decision — recall over precision. Normalization is intentionally lossy: مكة and مكه will match, by design. Precision is recovered by relevance ordering (exact > prefix > contains), not by being conservative here.
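The exact > prefix > contains ordering can be illustrated in memory as follows. This is only a sketch under assumptions: the helper names are hypothetical, both inputs are assumed to be already normalized, and the package presumably performs the ordering in SQL rather than in PHP.

```php
<?php
declare(strict_types=1);

/** Rank a normalized value against a normalized term: 3 exact, 2 prefix, 1 contains, 0 none. */
function sketchRelevance(string $value, string $term): int
{
    if ($value === $term) {
        return 3;
    }
    if (str_starts_with($value, $term)) {
        return 2;
    }
    if (str_contains($value, $term)) {
        return 1;
    }
    return 0;
}

/**
 * Keep only matching values and order them best-first.
 *
 * @param list<string> $values already-normalized candidate strings
 * @return list<string>
 */
function sketchRankedSearch(array $values, string $term): array
{
    $hits = array_filter(
        $values,
        static fn (string $v): bool => sketchRelevance($v, $term) > 0
    );

    // Higher score first; usort also reindexes the array.
    usort(
        $hits,
        static fn (string $a, string $b): int =>
            sketchRelevance($b, $term) <=> sketchRelevance($a, $term)
    );

    return $hits;
}
```

The lossy normalization widens the set of hits, and the ranking step is what pushes the closest spellings to the top of that wider set.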
Two rules are off by default because they are lossy across genuinely different words, not just spelling variants of one letter — enable them only if you want that tolerance:
- strip_standalone_hamza: merges سماء / سما.
- normalize_dad_zah: folds ظ→ض, so ظلّ (shade) and ضلّ (to go astray) collide. Turn it on when your users frequently confuse the two letters. The toggle applies to both search paths (shadow column and whereArabicVariants) so they stay consistent.
Enable in config/arabic-search.php:
'rules' => [
    'normalize_dad_zah' => true,
],
Changing it changes the normalizer version — run arabic-search:rebuild afterwards for shadow-column tables.
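One way to picture why a rule change forces a rebuild: if the version is derived from the rule set, any toggle produces a new version, and stored shadow values built under the old one can no longer be trusted. The sketch below is an assumption about how such a fingerprint could be computed; `sketchRulesVersion` is a hypothetical name and the package's real versioning scheme may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical fingerprint of a rule configuration.
 *
 * @param array<string, bool> $rules rule name => enabled
 */
function sketchRulesVersion(array $rules): string
{
    // Sort by key so the order in the config file does not affect the result.
    ksort($rules);

    return substr(hash('sha256', json_encode($rules, JSON_THROW_ON_ERROR)), 0, 12);
}

$before = sketchRulesVersion(['strip_tashkeel' => true, 'normalize_dad_zah' => false]);
$after  = sketchRulesVersion(['strip_tashkeel' => true, 'normalize_dad_zah' => true]);

// A differing fingerprint signals that stored shadow columns are out of date.
$needsRebuild = $before !== $after;
```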
Debug any term end-to-end:
php artisan arabic-search:inspect "مُحَمَّدٌ ٢٠٢٥"
Setup on a model
- Add the trait and list your searchable columns:
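use Illuminate\Database\Eloquent\Model;
use IsmailElbery\ArabicSearch\Concerns\HasArabicSearch;

class Article extends Model
{
    use HasArabicSearch;

    // Columns that are searched through their *_normalized shadow columns.
    protected array $arabicSearchable = ['title', 'body'];
}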
- Add the shadow columns. Edit the published migration (or write your own using the macro):
Schema::table('articles', function (Blueprint $table) {
    $table->arabicNormalized(['title', 'body']); // adds title_normalized, body_normalized
});
- Backfill existing rows:
php artisan arabic-search:rebuild "App\Models\Article"
That's it. New/updated rows keep their shadow columns in sync automatically on save.
How it works
You never search the original column. The package maintains a normalized shadow column next to it (title → title_normalized). On save, an observer normalizes the source into the shadow column; on search, the term is normalized with the same pipeline and matched against the shadow column. Because both sides run identical PHP normalization, they are guaranteed to agree — there is no SQL-vs-app drift.
articles
├── title "مُحَمَّدٌ رسولُ الله" ← original, shown to the user
└── title_normalized "محمد رسول الله" ← searched against
⚠️ Bulk writes bypass model events.
Model::query()->update(), insert(), upsert() and raw SQL do not fire the observer, so the shadow columns go stale. Run arabic-search:rebuild afterwards.
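The shadow-column lifecycle, including the stale-data problem from the warning above, can be mimicked with a toy in-memory table. Everything here is hypothetical (class name, method names, and a stand-in normalizer that strips harakat only); it imitates the behaviour described, not the package's observer or command.

```php
<?php
declare(strict_types=1);

/** Toy in-memory "table" that mimics the save-time observer. Not the package's code. */
final class SketchShadowTable
{
    /** @var array<int, array{title: string, title_normalized: string}> */
    public array $rows = [];

    /** Stand-in normalizer: strips harakat only (the real pipeline does far more). */
    public static function normalize(string $text): string
    {
        return preg_replace('/[\x{064B}-\x{065F}]/u', '', $text) ?? $text;
    }

    /** Like saving a model: the shadow value is refreshed on every write. */
    public function save(int $id, string $title): void
    {
        $this->rows[$id] = [
            'title' => $title,
            'title_normalized' => self::normalize($title),
        ];
    }

    /** Like a bulk update: no observer runs, so the shadow value goes stale. */
    public function bulkUpdateTitle(int $id, string $title): void
    {
        $this->rows[$id]['title'] = $title;
    }

    /**
     * Search only the shadow values, normalizing the term the same way.
     *
     * @return list<int> matching row ids
     */
    public function search(string $term): array
    {
        $needle = self::normalize($term);

        return array_keys(array_filter(
            $this->rows,
            static fn (array $r): bool => str_contains($r['title_normalized'], $needle)
        ));
    }

    /** Like arabic-search:rebuild: recompute every shadow value from its source. */
    public function rebuild(): void
    {
        foreach ($this->rows as $id => $row) {
            $this->rows[$id]['title_normalized'] = self::normalize($row['title']);
        }
    }
}
```

After `bulkUpdateTitle()`, a search for the new title finds nothing and a search for the old title still hits, which is exactly the stale state that a rebuild repairs.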
Standalone normalizer (no model needed)
use IsmailElbery\ArabicSearch\Facades\ArabicSearch;

ArabicSearch::normalize('مُحَمَّدٌ'); // "محمد"
ArabicSearch::tokenize('بسم الله الرحمن'); // ["بسم","الله","الرحمن"]
Searching an existing table with no shadow column
Have a legacy users table you can't (or don't want to) alter? Use variant
expansion — it matches every orthographic spelling directly against the raw
column, no _normalized column and no rebuild needed:
User::whereArabicVariants('name', 'اسلام')->paginate();
DB::table('users')->whereArabicVariants('name', 'اسلام')->get();

// Two (or more) columns, OR-ed together:
User::whereArabicVariants(['first_name', 'last_name'], 'اسلام')->get();

// Composes with other conditions:
DB::table('docs')->where('pinned', true)
    ->orWhereArabicVariants('title', 'اسلام')->get();
Searching اسلام matches stored اسلام, إسلام, أسلام, آسلام, الإسلام,
and diacritized/kashida spellings like إِسْلَام and اســلام — while correctly
not matching a different word like اسلم. It works on MySQL, PostgreSQL and
SQLite (a PCRE-backed REGEXP function is registered automatically for SQLite).
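Variant expansion can be pictured as turning each letter of the normalized term into a character class and allowing diacritics or kashida between letters. The sketch below builds such a pattern for PHP's PCRE functions; `sketchVariantPattern` is a hypothetical name, the class table covers only four letters, and the regex the package actually generates for each database is not shown in this README.

```php
<?php
declare(strict_types=1);

/** Hypothetical: build a PCRE pattern matching orthographic variants of a normalized term. */
function sketchVariantPattern(string $term): string
{
    // Canonical letter => class of spellings that normalize to it (partial, assumed).
    $classes = [
        "\u{0627}" => "[\u{0627}\u{0623}\u{0625}\u{0622}\u{0671}]", // alef family
        "\u{064A}" => "[\u{064A}\u{0649}\u{06CC}]",                 // yeh family
        "\u{0647}" => "[\u{0647}\u{0629}]",                         // heh / taa marbuta
        "\u{0643}" => "[\u{0643}\u{06A9}]",                         // kaf / keheh
    ];

    // Optional harakat, superscript alef and kashida after every letter.
    $filler = "[\u{064B}-\u{065F}\u{0670}\u{0640}]*";

    // Split into Unicode characters without needing ext-mbstring.
    $chars = preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $parts = [];
    foreach ($chars as $ch) {
        $parts[] = ($classes[$ch] ?? preg_quote($ch, '/')) . $filler;
    }

    return '/' . implode('', $parts) . '/u';
}

// Usage: the pattern is unanchored, so it also matches inside longer words.
$pattern = sketchVariantPattern("\u{0627}\u{0633}\u{0644}\u{0627}\u{0645}");
$matches = preg_match($pattern, "\u{0627}\u{0644}\u{0625}\u{0633}\u{0644}\u{0627}\u{0645}") === 1;
```

Because the pattern still requires every letter of the term in order, a different word that merely shares most letters (such as اسلم, which lacks the second alef) does not match.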
When to use which:
| | Shadow column (HasArabicSearch) | Variant expansion (whereArabicVariants) |
|---|---|---|
| Schema change | adds _normalized column | none |
| Backfill | arabic-search:rebuild | none |
| Matching | LIKE on the normalized column | regex on the raw column |
| Uses an index | no in v1 (LIKE infix); fulltext planned | no (regex full-scan) |
| Best for | tables you own | legacy/read-only tables, small–medium |
Configuration highlights
| Key | Meaning |
|---|---|
| term_logic | and: all tokens must match (default); or: any token may match |
| order_by_relevance | exact > prefix > contains ordering (default true) |
| min_token_length | tokens shorter than this are dropped (default 2) |
| column_suffix | shadow-column suffix (default _normalized) |
| match_mode | reserved. v1 always uses like; index-backed fulltext is on the roadmap and not yet wired |
Changing any rule changes the normalizer version (ArabicSearch::version()); rerun arabic-search:rebuild so stored data matches.
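How min_token_length and term_logic interact can be shown with a small in-memory sketch. The helper names are hypothetical and the inputs are assumed to be already normalized; the package applies the equivalent logic when it builds its query.

```php
<?php
declare(strict_types=1);

/**
 * Split on whitespace and drop tokens shorter than $minLen characters.
 *
 * @return list<string>
 */
function sketchTokenize(string $normalized, int $minLen = 2): array
{
    $tokens = preg_split('/\s+/u', trim($normalized), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return array_values(array_filter(
        $tokens,
        // Count Unicode characters, not bytes (avoids needing ext-mbstring here).
        static fn (string $t): bool => preg_match_all('/./us', $t) >= $minLen
    ));
}

/**
 * Apply term_logic: 'and' needs every token present, 'or' needs at least one.
 *
 * @param list<string> $tokens
 */
function sketchMatchesTokens(string $haystack, array $tokens, string $logic = 'and'): bool
{
    if ($tokens === []) {
        return false;
    }

    $hits = array_filter(
        $tokens,
        static fn (string $t): bool => str_contains($haystack, $t)
    );

    return $logic === 'or' ? count($hits) > 0 : count($hits) === count($tokens);
}
```

With the default minimum of 2, a single-letter token such as the conjunction و is dropped before matching, so it cannot make an `and` search fail or an `or` search match almost everything.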
What this does NOT do (yet)
Naming the limits earns more trust than hiding them:
- No morphological / root analysis. Searching كتب will not automatically find مكتوب / كاتب. (Root-based matching is a v2 maybe.)
- No stemming. Light prefix/suffix stripping (ال، و، ب، ون، ين) is planned for v1.1, opt-in.
- No synonym/fuzzy/Levenshtein matching.
- LIKE infix matches can't use an index, which is fine for small/medium tables. Index-backed fulltext matching is planned but not in v1; for very large datasets today, reach for a dedicated engine.
Use this vs. Meilisearch/Typesense: reach for this when you want correct Arabic matching on the database you already have, with zero extra infrastructure. Reach for a dedicated engine when you need typo-tolerance, faceting, or sub-10ms search over millions of rows.
Testing
composer install
vendor/bin/phpunit
The suite leads with an input → expected-output table (NormalizationTest) plus idempotency checks and an integration SearchTest against in-memory SQLite.
License
MIT.
Other information
- License: MIT
- Last updated: 2026-07-03