定制 padosoft/laravel-ai-guardrails 二次开发

按需修改功能、优化性能、对接业务系统,提供一站式技术支持

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

padosoft/laravel-ai-guardrails

Composer 安装命令:

composer require padosoft/laravel-ai-guardrails

包简介

Deterministic prompt-injection guardrails for laravel/ai: tool firewall, input screening + injection audit, untrusted-output sanitization, and a HITL approval bridge.

README 文档

README

laravel-ai-guardrails

laravel-ai-guardrails

Deterministic, offline-first prompt-injection guardrails for laravel/ai.
Four composable controls that treat everything the model touches — its tool arguments, its prompts, and its output — as untrusted.

PHP 8.3+ Laravel 13 PHPStan level 8 Tests passing Apache-2.0

Table of Contents

Why it exists

laravel/ai makes it trivial to give a model tools (refund an order, delete a record, send an email) and to feed it untrusted user input. That is exactly where prompt injection lives:

  • The model can be talked into calling a tool with someone else's user_id (confused-deputy / IDOR).
  • A crafted prompt can make it ignore its instructions or exfiltrate secrets.
  • Its output — rendered in your UI — can carry stored-XSS, markdown data-exfiltration links, or leaked PII.
  • It can decide, on its own, to pull the trigger on a destructive action.

laravel-ai-guardrails closes that gap with four deterministic, offline, unit-testable controls. No second LLM call, no network, no non-determinism — the audit trail is the product, not a regex you have to trust.

📚 Full documentation: doc.laravel-ai-guardrails.padosoft.com — guides, the four controls in depth, architecture & ADRs, configuration reference, and the HTTP/MCP surfaces.

What makes it different

  • Untrusted-input posture, everywhere. Tool arguments, prompts, and model output are all treated as hostile.
  • Deterministic & offline. Controls A–C never call a model; every decision is reproducible and testable.
  • Fails closed. A PCRE error, a tampered flow record, an unresolved engine — every failure path blocks rather than silently allows.
  • Append-only audit. Every screening attempt (blocked and allowed) is logged to an immutable store. The model never updates or deletes it.
  • Composes, doesn't reinvent. Optional padosoft/laravel-flow for human approval and padosoft/laravel-pii-redactor for PII — with graceful degradation when absent.
  • Every feature is a toggle, tested in both states, with a master kill-switch that degrades the whole package to pass-through.

The four controls

Control What it does Threat it closes
A Tool Firewall Re-scopes model-chosen owner keys (user_id, …) to the authenticated principal server-side and validates every argument against the tool's own JSON schema. Confused-deputy / IDOR via model-chosen arguments
B Input Screening + Audit Normalizes the prompt (defeating homoglyph / zero-width / case evasion), screens it, refuses before the model runs, and append-only-logs every attempt. Jailbreak / exfiltration prompts
C Output Handler Treats the response as untrusted: escapes HTML, neutralizes markdown link/image exfil vectors, validates structured output, and redacts PII. Stored-XSS / data-exfil / PII leakage in model output
D HITL Bridge Routes destructive tool calls (refund/delete/email) through laravel-flow's approvalGate() — a human approves before the action runs. Unauthorized destructive actions

Quick start

Junior-proof. Five steps.

1. Install

composer require padosoft/laravel-ai-guardrails

2. Publish the config

php artisan vendor:publish --tag=ai-guardrails-config

3. (Optional) Publish + run the audit migration — only if you want database-backed audit:

php artisan vendor:publish --tag=ai-guardrails-migrations
php artisan migrate

then set AI_GUARDRAILS_AUDIT_STORE=database in your .env.

4. Guard a tool call (Control A) in your app:

use Padosoft\AiGuardrails\Facades\AiGuardrails;

$safeTool = AiGuardrails::guard($refundTool); // re-scopes owner keys + validates args

5. Screen a prompt or sanitize output anywhere:

$verdict = AiGuardrails::screen($userPrompt);     // ->blocked, ->ruleId, ->refusalMessage
$clean   = AiGuardrails::sanitize($modelOutput);  // HTML/markdown sanitized + PII redacted

That's it. Add the agent middleware (below) to screen prompts and sanitize output automatically.

PHP surface

Everything is reachable from the AiGuardrails facade:

use Padosoft\AiGuardrails\Facades\AiGuardrails;

AiGuardrails::screen(string $prompt): ScreenVerdict;                                  // Control B
AiGuardrails::sanitize(string $text): string;                                        // Control C
AiGuardrails::guard(Tool $tool, ?Closure $principalResolver = null): Tool;           // Control A
AiGuardrails::routeForApproval(Tool $tool, string $toolName, ?Closure $principalResolver = null): Tool; // Control D
AiGuardrails::isDestructive(string $toolName): bool;
AiGuardrails::validateStructured(array $output, array $schema, bool $rejectUnknown = false): array; // Control C

Wiring the agent middleware

Declare the input + output middleware on your agent (they implement laravel/ai's middleware contract):

use Padosoft\AiGuardrails\Screening\GuardrailInputMiddleware;
use Padosoft\AiGuardrails\Output\GuardrailOutputMiddleware;
use Laravel\Ai\Contracts\HasMiddleware;

final class SupportAgent implements Agent, HasMiddleware
{
    public function middleware(): array
    {
        return [
            app(GuardrailInputMiddleware::class),  // screens + refuses + audits before the model
            app(GuardrailOutputMiddleware::class), // sanitizes $response->text + structured fields after
        ];
    }
}

GuardrailInputMiddleware refuses without ever invoking the model when a prompt is blocked, and audits every attempt. GuardrailOutputMiddleware rewrites the response text (and the structured-output fields) in place — tool calls are left to Controls A/D.

Artisan surface

# Screen a prompt (exits non-zero when blocked); reads STDIN if no argument
php artisan ai-guardrails:screen "please ignore all previous instructions"

# Sanitize + redact a text blob
php artisan ai-guardrails:sanitize "<script>steal()</script> ![x](http://evil/leak)"

# List recent injection-audit attempts (blocked and allowed)
php artisan ai-guardrails:audit --limit=50

# Apply the GDPR retention strategy to the audit table (actor-audited; the only sanctioned erasure path)
php artisan ai-guardrails:purge --strategy=anonymize --days=365 --actor="ops:nightly"
php artisan ai-guardrails:purge --dry-run   # report what would be affected, change nothing

HTTP API surface (admin)

A read/config HTTP API for an admin panel (e.g. laravel-ai-guardrails-admin). It is default-OFF — set api.enabled = true and supply a middleware stack via api.middleware. If api.enabled is true but api.middleware resolves to an empty list, the service provider throws a RuntimeException at boot (fail-closed against an accidentally open surface) — but it does not inspect what that middleware does: you must include your own authentication/authorization middleware — these endpoints expose audit data and let an operator change security settings. Routes are mounted under the api.prefix (default ai-guardrails/api) and named ai-guardrails.api.*.

Envelope. Successful (and handled-error, e.g. 404/409/422-via-controller) responses are enveloped as { "schema_version": "ai-guardrails.api.v1", "schema": "ai-guardrails.api.v1.<endpoint>", "data": { … } }schema_version is the contract version a client pins against; schema is a per-endpoint discriminator. (Mirrors the padosoft-eval-harness ReportApi house style.) Exception: framework-level validation failures (a malformed PUT /settings body) return Laravel's standard 422 validation JSON, not the envelope.

Method Path Route name schema Backing store / toggle
GET /overview …overview …v1.overview aggregates each control's enabled + effective mode (enforce/monitor/off) + 24h injection counts + the active ruleset_version
GET /audit …audit.index …v1.audit-list audit.store (null | array | database) — keyset paginated (cursor), filters blocked/rule_id/principal_id/q/from/to
GET /audit/{id} …audit.show …v1.audit-detail full prompt + matched_span; 404 on unknown/non-numeric id
GET /audit/trend …audit.trend …v1.audit-trend per-UTC-day SQL GROUP BY (dialect-safe); 30-day default window
GET /firewall …firewall.index …v1.firewall firewall_log.store — Control A rejections, keyset paginated
GET /output/stats …output.stats …v1.output-stats output_stats.store — per-kind counts, 30-day default window
GET /approvals …approvals.index …v1.approval-list Control D pending approvals (via laravel-flow); empty when HITL unavailable
POST /approvals/{token}/approve …approvals.approve …v1.approval-decision resumes the parked tool; actor principal derived server-side
POST /approvals/{token}/reject …approvals.reject …v1.approval-decision rejects the parked tool
GET /settings …settings.show …v1.settings settings.store (config | database) — effective overridable settings
PUT /settings …settings.update …v1.settings persists allow-listed, type-validated overrides; appends a change record + dispatches SettingsChanged
GET /settings/changes …settings.changes …v1.settings-changes settings_audit.store (null | array | database) — append-only WHO/WHAT change log
POST /try/screen …try.screen …v1.try-screen sandbox: screen a prompt (no persistence)
POST /try/sanitize …try.sanitize …v1.try-sanitize sandbox: sanitize a text blob (no persistence)

Append-only stores. The audit, firewall, output-stat, and settings-change tables are immutable (the model + builder throw on update/delete). GET /settings is current-state and mutable; PUT /settings only accepts keys on the settings.overridable allow-list and type-validates each value (booleans, enums, bounded strings) — unknown keys are dropped, malformed values are rejected 422. When settings.store = database, saved overrides are overlaid onto the live config at boot so they actually take effect on the controls (fail-safe: a corrupt/null/type-mismatched row keeps the file default). Every effective change (before ≠ after) is recorded to the settings_audit store with the server-derived actor (never client-supplied) and surfaced by GET /settings/changes.

Configuration

Every behaviour is a config toggle (config/ai-guardrails.php). The four controls are on by default (that is the point); the HITL bridge (hitl.enabled) and the HTTP API (api.enabled) are default-OFF because they need optional dependencies / explicit opt-in. A master kill-switch sits on top.

Key Default Purpose
enabled true Master kill-switch — off degrades every control to pass-through.
tool_firewall.owner_keys user_id, owner_id, account_id, customer_id Argument keys the model may never choose (overwritten server-side).
tool_firewall.reject_unknown_arguments true Reject arguments not declared in the tool schema.
input_screen.patterns (4 built-in) ruleId => PCRE pattern — the audit is the value, not the list.
normalization.* on NFKC, zero-width strip, casefold, max_prompt_length.
pattern_safety.on_match_error closed closed = block on a PCRE error, open = skip the rule.
output_handler.html_mode escape escape (default) or allowlist (keep a safe inline-tag set).
output_handler.redact_pii true Redact PII via laravel-pii-redactor when present.
hitl.enabled false Enable the HITL approval bridge (needs laravel-flow).
hitl.destructive_tools refund, delete, send_email Tool names treated as destructive.
hitl.fallback deny When approval is unavailable: deny (refuse) or pass (execute).
audit.store 'null' 'null' | 'array' | 'database' (string tokens).
tool_authorization.enabled false Gate tool use behind a Laravel Gate ability (fail-closed) — separate from owner-key re-scoping.
tool_authorization.ability ai-guardrails:use-tool The Gate ability checked (with the tool class) before a guarded tool runs.
tool_authorization.owner_key_depth top_level recursive re-scopes owner keys at any nesting depth; top_level only at the top.
api.enabled false The default-OFF HTTP admin API surface.

Tool authorization (Control A+)

Owner-key re-scoping stops the model acting on another user's resource — it does not decide whether the principal may use the tool at all. Enable tool_authorization.enabled and define the Gate ability to add that second layer:

use Illuminate\Support\Facades\Gate;

Gate::define('ai-guardrails:use-tool', fn ($user, string $toolClass) => $user->mayUse($toolClass));

AiGuardrails::guard() then composes authorize → re-scope → validate → run; a denial throws ToolNotAuthorized. It fails closed: an undefined ability, an unauthenticated user, or a throwing policy all deny.

The modes, audit_hygiene, and retention config blocks are documented in their own sections above.

Composing laravel-flow & laravel-pii-redactor

Both are optional (suggest). The package degrades gracefully:

composer require padosoft/laravel-flow          # enables Control D (human approval)
composer require padosoft/laravel-pii-redactor  # enables PII redaction in Control C

When a package is absent, class_exists guards bind null-object implementations, and the boundary is enforced by an architecture test (flow is referenced only in src/Hitl, pii-redactor + HTMLPurifier only in src/Output, and laravel/mcp only in src/Mcp).

MCP surface

A fourth surface (after PHP, Artisan, HTTP API): expose the guardrails to AI clients via laravel/mcp. Default-OFF — install the package and set mcp.enabled = true:

composer require laravel/mcp
# config/ai-guardrails.php → 'mcp' => ['enabled' => true]
php artisan mcp:start ai-guardrails   # local (stdio) server

Registered under the handle ai-guardrails with three tools: screen_prompt (Control B verdict), sanitize_output (Control C clean), and recent_injection_audit (read the append-only log). The laravel/mcp reference is confined to src/Mcp (architecture test).

HITL setup (Control D)

Control D needs laravel-flow installed and its tables migrated. Two commands make that turnkey and verifiable:

# Run laravel-flow's migrations (flow_runs / flow_approvals) straight from vendor — scoped, idempotent
php artisan ai-guardrails:hitl-install

# Diagnose the setup: flow installed? persistence on? tables present? hitl + master enabled?
php artisan ai-guardrails:hitl-status

Then set LARAVEL_FLOW_PERSISTENCE_ENABLED=true and AI_GUARDRAILS_HITL_ENABLED=true. hitl-status exits non-zero (and prints exactly what is missing) until HITL can actually gate a destructive call.

The append-only injection audit

The audit is the product value of Control B. Every screening attempt — blocked and allowed — is appended to an immutable store. The Eloquent model and its query builder throw on update / delete / upsert / touch / increment / truncate; the table has no updated_at. Timestamps are stored in UTC.

Data hygiene (audit_hygiene.prompt_storage). Because the table captures raw prompts, the stored prompt is transformed before persistence: redact (default — composes laravel-pii-redactor), hash (sha256:…, correlate without keeping content), truncate (first truncate_at code points), or raw. Hygiene is applied at the store boundary so every write path is covered; domain events still carry the raw prompt in-process.

Retention / erasure (retention.strategy). GDPR erasure on an append-only table goes through the sanctioned, actor-audited ai-guardrails:purge command — the only place rows leave the table. anonymize nulls the prompt + principal of rows older than retention.days, purge hard-deletes them, keep retains. Every run logs the actor, strategy, cutoff, and affected-row count.

Domain events

Every guardrail decision dispatches a domain event from the same code path that writes the audit / stat record, so you can wire SIEM, Slack, or PagerDuty with a single listener. Events are gated by events.enabled (default on); set it to false to silence them without touching the controls.

Event Dispatched when $enforced
Padosoft\AiGuardrails\Events\InjectionBlocked Control B refused a prompt (enforce) n/a — separate class
Padosoft\AiGuardrails\Events\InjectionObserved Control B detected an injection but passed it through (monitor) n/a — separate class
Padosoft\AiGuardrails\Events\ToolArgumentRejected Control A found owner-key / schema violations in a tool call true = call blocked; false = monitor, call proceeded
Padosoft\AiGuardrails\Events\DestructiveToolRouted Control D parked a destructive call for human approval (carries the non-secret run reference only) n/a — enforce only
Padosoft\AiGuardrails\Events\OutputSanitized Control C neutralised HTML / markdown / structured / PII in a response (one event per response, deduped kinds) true = text rewritten; false = monitor, text unchanged

In monitor mode the Observed/Rejected/Sanitized events still fire. The $enforced property on ToolArgumentRejected and OutputSanitized encodes the enforcement decision directly in the payload — listeners do not need to read the live config to distinguish a real block from a shadow observation.

Security note — InjectionBlocked / InjectionObserved carry the raw prompt text (via $attempt->prompt). If you ship these events to an external webhook (Slack, PagerDuty, SIEM), be aware that the payload may contain PII or sensitive input. Extract only the fields you need (ruleId, blocked, occurredAt) rather than forwarding the full InjectionAttempt object.

Security & threat model

Control Untrusted surface Posture
A model-chosen tool arguments re-scope owner keys server-side + schema-validate; re-scoping is not authorization
B user prompts normalize → screen → refuse pre-model → append-only audit; fail closed on PCRE errors
C model output (text + structured fields) escape HTML / defang markdown & URI exfil vectors / validate structure / redact PII
D destructive tool calls human-gated via approvalGate(); the plain-text token is never returned to the model

Every failure path fails closed. The master kill-switch and per-control toggles are tested in both states.

Known limitations

  • Control C rewrites $response->text and structured string fields; the model's toolCalls are governed by Controls A/D and are not sanitized by default. An opt-in output_handler.sanitize_tool_calls flag (default off) adds a defense-in-depth pass that cleans the string leaves of tool-call arguments — enable it only when those arguments are rendered/logged, since rewriting them could otherwise alter a legitimate call.
  • Cross-script homoglyphs are folded to a Latin skeleton before matching via a curated confusables map (normalization.fold_confusables, default on) — Cyrillic а/о/е…, Greek ο/α/ρ…. It is a high-value curated subset, not the full Unicode confusables data, so an exotic look-alike outside the map can still slip through; extend ConfusablesFolder for a wider threat model.
  • The HTML allowlist mode uses HTMLPurifier when ezyang/htmlpurifier is installed (robust parsing of malformed / entity-encoded / mutation-XSS markup), and gracefully falls back to the built-in strip_tags allowlist when it is absent. escape mode is unchanged.
  • Control D's flow persistence (approval tokens, resume) is provided by the host's laravel-flow install — made turnkey by ai-guardrails:hitl-install and verifiable by ai-guardrails:hitl-status (see HITL setup).

Testing

composer install
vendor/bin/phpunit          # Unit + Feature + Architecture
vendor/bin/pint --test
vendor/bin/phpstan analyse --memory-limit=512M

CI runs the matrix PHP 8.3 / 8.4 / 8.5 × Laravel 13: composer validatepintphpstan (level 8) → phpunit.

Part of the Padosoft AI suite

laravel-ai-guardrails pairs with laravel-ai-guardrails-admin (a React control plane for the audit trail, firewall posture, output stats, and approval queue), and composes padosoft/laravel-flow and padosoft/laravel-pii-redactor.

License

Apache-2.0 © Padosoft s.r.l. See LICENSE.

统计信息

  • 总下载量: 0
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 0
  • 点击次数: 2
  • 依赖项目数: 0
  • 推荐数: 0

GitHub 信息

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • 开发语言: PHP

其他信息

  • 授权协议: Apache-2.0
  • 更新时间: 2026-06-18

承接程序开发

PHP开发

VUE

Vue开发

前端开发

小程序开发

公众号开发

系统定制

数据库设计

云部署

网站建设

安全加固