glueful/import-export
Latest stable version: v1.0.0
Composer install command:
composer require glueful/import-export
Package description
Import and export engine for Glueful apps.
README
Overview
Import Export is a general import/export engine for Glueful applications. It owns the machinery every bulk data flow needs -- jobs, deterministic batches, queue dispatch, claiming, progress roll-up, row errors, reports, retries, and management APIs -- while knowing nothing about what the records mean.
Domain meaning lives in adapters. Your app (a CMS, a commerce back office, a CRM)
implements ImporterInterface / ExporterInterface for its own record types and
registers them through service tags. The engine never parses your content model, never
validates your prices, and never decides what a "post" is; it just runs the job safely.
- Engine (this package): job/batch/file/error/report persistence, queue-backed batch processing, atomic batch claiming with stale-lock reclaim, never-throw queue jobs, explicit engine-owned retry, error caps, lifecycle events, HTTP + CLI management, streaming file helpers, ZIP-slip protection.
- Adapters (your code): what a record is, how to read it from a source, how to write it to your domain, what counts as a row error.
Features
- Importer and exporter registries collected through service tags (plain-string and object tag forms).
- Job tracking: jobs, batches, files, capped row errors, and reports.
- Queue-backed processing with deterministic, adapter-planned batches.
- Conditional-UPDATE batch claiming with stale-lock reclaim.
- Never-throw queue jobs: adapter exceptions mark work failed instead of triggering queue auto-redelivery.
- Explicit, engine-owned retry restricted to adapters that declare themselves retryable.
- Dry-run and commit import modes.
- Seven lifecycle events (job created/started/completed/failed/cancelled, batch completed/failed).
- Streaming CSV, JSON, NDJSON, and ZIP bundle readers/writers.
- ZIP-slip protection via PathGuard (hostile-archive tested).
- Fail-closed permission gating on every HTTP route.
- HTTP management API and CLI commands.
- Failed-record export service and tmp-file retention cleanup.
Installation
Install via Composer:
composer require glueful/import-export
# Rebuild the extensions cache after adding new packages
php glueful extensions:cache
Composer discovers packages of type glueful-extension, but installing does not
auto-enable them. Enable the provider and run migrations:
php glueful extensions:enable import-export
php glueful extensions:cache
php glueful migrate:run
Local Development Installation
Register the extension as a Composer path repository in your app's composer.json,
then require and enable it:
"repositories": [
  {
    "type": "path",
    "url": "extensions/import-export",
    "options": { "symlink": true }
  }
]
composer require glueful/import-export:@dev
php glueful extensions:enable import-export
php glueful migrate:run
Verify Installation
php glueful extensions:list
php glueful extensions:info import-export
php glueful extensions:diagnose
Post-install checklist:
- Run migrations (five import_export_* tables).
- Register at least one importer or exporter adapter.
- Confirm the adapter appears in GET /import-export/adapters.
- Confirm queue workers are running for the configured queue.
Writing an Adapter
Importer Contract
Implement Glueful\Extensions\ImportExport\Contracts\ImporterInterface:
| Method | Responsibility |
|---|---|
| key(): string | Stable machine key (used in API calls and job rows). |
| label(): string | Human-readable label for adapter listings. |
| supports(ImportSource $source): bool | Whether this source (disk, path, MIME type, metadata) can be imported. |
| plan(ImportSource $source, ImportOptions $options): ImportPlan | Inspect the source and return totalRecords, a deterministic list of ImportBatch windows (uuid, sequence, offset, limit), and whether the adapter is retryable. |
| process(ImportBatch $batch, ImportContext $context): ImportBatchResult | Handle one claimed batch window and return processed/failed counts plus row errors. |
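To make "deterministic windows" concrete, here is a minimal sketch of the arithmetic a plan() implementation might use. It is not the package's code: planWindows() and its array shape are hypothetical, and the real ImportBatch also carries a uuid. Starting the sequence at 0 and trimming the last window to the remaining records are both assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Split $totalRecords into fixed-size windows. The same inputs always give
 * the same list, which is what makes the plan deterministic.
 * Hypothetical helper: the array shape only mirrors the documented
 * sequence/offset/limit fields.
 *
 * @return list<array{sequence: int, offset: int, limit: int}>
 */
function planWindows(int $totalRecords, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }

    $windows = [];
    $sequence = 0;
    for ($offset = 0; $offset < $totalRecords; $offset += $batchSize) {
        $windows[] = [
            'sequence' => $sequence++,
            'offset'   => $offset,
            // Assumption: the last window is trimmed to the records that remain.
            'limit'    => min($batchSize, $totalRecords - $offset),
        ];
    }

    return $windows;
}
```

With 1200 records and a batch size of 500 this yields three windows at offsets 0, 500, and 1000, the last one with a limit of 200.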
Exporter Contract
Implement Glueful\Extensions\ImportExport\Contracts\ExporterInterface:
| Method | Responsibility |
|---|---|
| key(): string / label(): string | As above. |
| plan(ExportOptions $options): ExportPlan | Return totalRecords, deterministic ExportBatch windows, and retryability. format, filters, and options are delivered here. |
| process(ExportBatch $batch, ExportContext $context): ExportBatchResult | Handle one claimed batch and return counts, errors, and optionally a resultPath (recorded as a result file on the job). |
What process() Actually Receives
Be aware of what survives the queue round-trip. The engine persists only the batch
window (uuid, sequence, offset, limit) and the job row. At process time:
- ImportContext carries jobUuid, mode (dry_run/commit), and actorUuid. Its options array is currently always empty -- ImportOptions::options reaches plan() only.
- ExportContext carries jobUuid and actorUuid; its format is currently fixed to ndjson regardless of the requested format, and ExportOptions::filters/options reach plan() only.
- ImportBatch/ExportBatch metadata from your plan is not persisted and is empty at process time.
Adapters that need plan-time options, filters, or formats during process() must carry
them themselves (for example, encode them in a sidecar file, a domain table, or derive
them from the source again).
Registration via Service Tags
Tag your adapter services with import_export.importer or import_export.exporter in
your extension's (or app provider's) services() definition. Both tag forms work:
public static function services(): array
{
    return [
        // Plain-string tag form
        App\Imports\ProductsImporter::class => [
            'class' => App\Imports\ProductsImporter::class,
            'shared' => true,
            'autowire' => true,
            'tags' => ['import_export.importer'],
        ],
        // Object tag form with priority
        App\Exports\ProductsExporter::class => [
            'class' => App\Exports\ProductsExporter::class,
            'shared' => true,
            'autowire' => true,
            'tags' => [
                ['name' => 'import_export.exporter', 'priority' => 10],
            ],
        ],
    ];
}
Adapter keys must be unique; the registry rejects duplicate keys at construction.
Adapters should not create jobs, mutate import_export_* tables directly, dispatch
queue jobs, or decide global retry behavior -- that is the engine's job.
Retryability and Idempotency Contract
Retry is explicit and engine-owned. To opt in, implement
Glueful\Extensions\ImportExport\Contracts\RetryableAdapterInterface and return true
from retryable().
The contract: retry re-delivers the whole batch window. When a job is retried, every
failed batch is reset to pending and pushed again in full -- including records that may
already have been applied before the batch failed midway. Retryable adapters therefore
MUST apply records idempotently: upsert by a stable source key (an external id, a slug,
a checksum), or detect and skip already-applied records.
If your adapter cannot make process() idempotent per batch, do not implement the
retry capability; the engine will refuse explicit retries for it.
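One way to meet the idempotency requirement is an upsert keyed on a stable source identifier. The sketch below uses an in-memory array as a stand-in for a domain table; applyBatch() and the external_id field are hypothetical names, not part of the engine.

```php
<?php
declare(strict_types=1);

/**
 * Apply one batch window by upserting on a stable source key.
 * $store stands in for a domain table keyed by external id (an assumption
 * of this sketch; a real adapter would issue a database upsert).
 *
 * @param array<string, array<string, mixed>> $store
 * @param list<array<string, mixed>> $records
 */
function applyBatch(array &$store, array $records): int
{
    $applied = 0;
    foreach ($records as $record) {
        $key = (string) $record['external_id'];
        // Insert or overwrite: replaying the same record still leaves one row.
        $store[$key] = $record;
        $applied++;
    }

    return $applied;
}
```

Because the write is keyed on external_id, re-delivering the whole window after a mid-batch failure overwrites the rows that were already applied instead of duplicating them.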
A Lemma Adapter Sketch
As a motivating example, a CMS like Lemma would ship its own adapter set in its own package -- the engine stays domain-blind:
- a WordPress importer (key e.g. lemma.wordpress) that plans batches over a WXR archive and upserts posts by source GUID (retryable),
- a Markdown bundle importer/exporter (lemma.markdown) over a ZIP of front-mattered files, keyed by path,
- a CSV content exporter (lemma.csv) for spreadsheet round-trips.
Those adapters, their keys, and their mappings belong to Lemma; this package only runs them.
Job Lifecycle
Statuses
pending -> planning -> queued -> running -> completed | failed | cancelled, with
failed -> queued reachable only through explicit retry. Transitions are validated;
invalid transitions are rejected (HTTP 422 on cancel).
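The status flow can be read as a small transition table. The following sketch encodes only the transitions written above; whether cancel is also accepted from earlier states is not stated here, so it is left out. TRANSITIONS and canTransition() are hypothetical names, not the engine's API.

```php
<?php
declare(strict_types=1);

// Allowed next statuses per current status, taken from the flow above.
// Hypothetical constant; the engine's internal representation may differ.
const TRANSITIONS = [
    'pending'   => ['planning'],
    'planning'  => ['queued'],
    'queued'    => ['running'],
    'running'   => ['completed', 'failed', 'cancelled'],
    'failed'    => ['queued'], // reachable only through explicit retry
    'completed' => [],
    'cancelled' => [],
];

/** Return true when moving a job from $from to $to is a listed transition. */
function canTransition(string $from, string $to): bool
{
    return in_array($to, TRANSITIONS[$from] ?? [], true);
}
```

A caller would check canTransition() before writing the new status and reject the request otherwise, which is where a 422 on an invalid cancel would come from.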
Creation
createImport() verifies supports(), calls the adapter's plan(), persists the job
(+ a source file row for imports), persists one batch row per planned batch, pushes
one queue job per batch onto the configured queue, and dispatches
ImportExportJobCreated. Imports default to dry_run mode; exports always run in
commit mode.
Dry-Run vs Commit
The import mode is persisted on the job and delivered to process() through
ImportContext::mode. In dry_run, adapters must validate and count but not write
domain data; row errors are recorded either way, so a dry run doubles as a validation
report.
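As an illustration of the dry-run rule, the sketch below validates every row in both modes and calls the domain write only in commit mode. processRows(), the title check, and the result array shape are hypothetical; a real adapter would return an ImportBatchResult.

```php
<?php
declare(strict_types=1);

/**
 * Validate and count rows; write only when $mode is 'commit'.
 *
 * @param list<array<string, mixed>> $rows
 * @param callable(array<string, mixed>): void $write domain write (hypothetical)
 * @return array{processed: int, failed: int, errors: list<array{row: int, message: string}>}
 */
function processRows(array $rows, string $mode, callable $write): array
{
    $result = ['processed' => 0, 'failed' => 0, 'errors' => []];

    foreach ($rows as $index => $row) {
        // Validation runs in both modes, so a dry run reports the same row errors.
        if (trim((string) ($row['title'] ?? '')) === '') {
            $result['failed']++;
            $result['errors'][] = ['row' => $index, 'message' => 'title is required'];
            continue;
        }

        if ($mode === 'commit') {
            $write($row);
        }
        $result['processed']++;
    }

    return $result;
}
```

Running the same rows first with dry_run and then with commit gives identical counts and errors; only the second run touches domain data.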
Batch Claiming and Stale-Lock Reclaim
A worker claims a batch with a single conditional UPDATE that flips it to running,
always sets a fresh locked_at, increments attempts, and stamps started_at. The
claim succeeds for a pending batch, or for a running batch whose locked_at is
older than the stale window (currently fixed at 15 minutes) -- so a batch orphaned by a
crashed worker is reclaimed instead of stuck. Losing claimants exit cleanly.
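The claim described above can be sketched as one UPDATE whose WHERE clause decides the winner. This is an illustration under assumptions, not the engine's query: the table name batches, the integer unix-timestamp columns, and claimBatch() itself are hypothetical. Only the column names status, locked_at, started_at, and attempts come from the description.

```php
<?php
declare(strict_types=1);

/**
 * Try to claim one batch. Returns true only for the caller whose UPDATE
 * actually changed the row.
 * Assumptions: table `batches`, timestamps stored as unix seconds.
 */
function claimBatch(PDO $db, string $uuid, int $now, int $staleSeconds = 900): bool
{
    $stmt = $db->prepare(
        "UPDATE batches
            SET status = 'running',
                locked_at = :locked_at,
                started_at = :started_at,
                attempts = attempts + 1
          WHERE uuid = :uuid
            AND (status = 'pending'
                 OR (status = 'running' AND locked_at < :stale_before))"
    );
    $stmt->bindValue(':locked_at', $now, PDO::PARAM_INT);
    $stmt->bindValue(':started_at', $now, PDO::PARAM_INT);
    $stmt->bindValue(':uuid', $uuid, PDO::PARAM_STR);
    // A running batch is reclaimable once its lock is older than the stale window.
    $stmt->bindValue(':stale_before', $now - $staleSeconds, PDO::PARAM_INT);
    $stmt->execute();

    // One changed row means this caller won; zero means another worker holds it.
    return $stmt->rowCount() === 1;
}
```

Because the check and the write happen in a single statement, two workers racing for the same pending batch cannot both see a changed row; the loser gets false and can exit without touching the batch.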
Never-Throw Queue Jobs
ProcessImportBatchJob / ProcessExportBatchJob run with getMaxAttempts() = 1 and
shouldRetry() = false, and handle() never lets an exception escape. An adapter
exception inside a claimed batch marks the batch failed, records an adapter_exception
row error, dispatches ImportExportBatchFailed, rolls the job up, and returns cleanly.
Queue auto-redelivery is deliberately not the retry policy, because re-delivering a
half-applied batch to a non-idempotent adapter would duplicate records.
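The never-throw behaviour amounts to catching every Throwable around the adapter call and turning it into recorded state. The sketch below shows that shape only; runBatchSafely() and its result array are hypothetical, and the real jobs additionally record a row error, dispatch the event, and roll the job up.

```php
<?php
declare(strict_types=1);

/**
 * Run an adapter callback for a claimed batch without letting anything escape.
 *
 * @param callable(): array{processed: int, failed: int} $process
 * @return array{status: string, processed: int, failed: int, error: ?string}
 */
function runBatchSafely(callable $process): array
{
    try {
        $counts = $process();

        return [
            'status'    => $counts['failed'] === 0 ? 'completed' : 'failed',
            'processed' => $counts['processed'],
            'failed'    => $counts['failed'],
            'error'     => null,
        ];
    } catch (\Throwable $e) {
        // The exception becomes data (an adapter_exception error) and is never
        // rethrown, so the queue has no failure to redeliver.
        return [
            'status'    => 'failed',
            'processed' => 0,
            'failed'    => 0,
            'error'     => 'adapter_exception: ' . $e->getMessage(),
        ];
    }
}
```

From the queue's point of view the job always returns normally; whether the work succeeded is visible only in the stored batch state.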
Roll-Up and Completion
After each batch finishes, the engine sums batch counters into the job. When no batch is
left pending/running, the job transitions to completed (no failed records) or
failed, dispatching ImportExportJobCompleted / ImportExportJobFailed.
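A rough sketch of the roll-up rule follows. rollUp() and the batch array shape are hypothetical. Treating a batch in failed status as a job failure even when its failed-record count is zero (for example after an adapter exception) is an assumption drawn from the event descriptions, not a documented rule.

```php
<?php
declare(strict_types=1);

/**
 * Sum batch counters and decide the job's terminal status, if any.
 * A null status means work remains and the job status is left alone.
 *
 * @param list<array{status: string, processed: int, failed: int}> $batches
 * @return array{processed: int, failed: int, status: ?string}
 */
function rollUp(array $batches): array
{
    $processed = 0;
    $failed = 0;
    $open = false;
    $anyFailedBatch = false;

    foreach ($batches as $batch) {
        $processed += $batch['processed'];
        $failed += $batch['failed'];
        $open = $open || in_array($batch['status'], ['pending', 'running'], true);
        $anyFailedBatch = $anyFailedBatch || $batch['status'] === 'failed';
    }

    if ($open) {
        $status = null;
    } else {
        // Assumption: a failed batch fails the job even with zero failed records.
        $status = ($failed > 0 || $anyFailedBatch) ? 'failed' : 'completed';
    }

    return ['processed' => $processed, 'failed' => $failed, 'status' => $status];
}
```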
Cancellation
Cancel transitions the job to cancelled and dispatches ImportExportJobCancelled.
Cancellation is observed at batch boundaries: queued batches check job status before
claiming and exit; a batch already in flight finishes its current run.
Retry
POST /import-export/jobs/{uuid}/retry, import-export:retry, or RetryService::retry() resets each
failed batch (pending, locks and timestamps cleared) and re-queues it, then moves the
job back to queued. Retry is refused unless the adapter implements
RetryableAdapterInterface and reports retryable() === true.
HTTP API
Routes are mounted under /import-export when routes_enabled is true. All routes
require auth plus the listed permission (fail-closed).
| Method | Path | Permission | Description |
|---|---|---|---|
| GET | /import-export/adapters | import_export.view | List registered importer/exporter adapters. |
| POST | /import-export/imports | import_export.run_import | Create + queue an import job (adapter, path required; disk, mime_type, metadata, mode, batch_size, options). |
| POST | /import-export/exports | import_export.run_export | Create + queue an export job (adapter required; format, batch_size, filters, options). |
| GET | /import-export/jobs | import_export.view | List jobs; query params type, status, limit (1-200, default 50). |
| GET | /import-export/jobs/{uuid} | import_export.view | One job with its batches. |
| GET | /import-export/jobs/{uuid}/errors | import_export.view | Stored row errors for a job. |
| GET | /import-export/jobs/{uuid}/report | import_export.view | Latest report (built on demand if absent). |
| POST | /import-export/jobs/{uuid}/cancel | import_export.cancel | Cancel a job (422 on invalid transition). |
| POST | /import-export/jobs/{uuid}/retry | import_export.retry | Re-queue failed batches of a retryable job. |
CLI
| Command | Description |
|---|---|
| import:run --adapter= --path= [--disk=uploads] [--mime-type=] [--mode=dry_run] [--batch-size=500] [--actor=] [--options=JSON] | Create and queue an import job. |
| export:run --adapter= [--format=ndjson] [--batch-size=500] [--actor=] [--filters=JSON] [--options=JSON] | Create and queue an export job. |
| import:list [--status=] [--limit=50] | List import jobs. |
| export:list [--status=] [--limit=50] | List export jobs. |
| import-export:status <job-uuid> | Show job status and batches. |
| import-export:retry <job-uuid> | Retry failed batches (retryable adapters only). |
| import-export:cancel <job-uuid> | Cancel a job. |
| import-export:cleanup [--days=30] | Delete tmp-role files for terminal jobs older than the cutoff. |
Service API
Use ImportExportService directly when another service owns the workflow:
use Glueful\Extensions\ImportExport\Services\ImportExportService;
use Glueful\Extensions\ImportExport\Support\ImportOptions;
use Glueful\Extensions\ImportExport\Support\ImportSource;

// $imports is assumed to be an ImportExportService instance (for example,
// injected or resolved from the container); $userUuid is the acting user's UUID.
$job = $imports->createImport(
    'products',
    new ImportSource('uploads', 'imports/products.csv', 'text/csv'),
    new ImportOptions(mode: 'dry_run', batchSize: 500, actorUuid: $userUuid)
);
Exports use createExport() with ExportOptions.
Permissions
The HTTP API is guarded by the extension-owned import_export_permission route
middleware, which resolves the framework PermissionManager and calls can() with the
import_export resource. The guard fails closed: no authenticated user, no available
permission manager, or a denial all return HTTP 403.
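Fail-closed here means that every path other than an explicit grant ends in 403. The sketch below models that decision with plain values. guardStatus() is hypothetical, the callable stands in for the permission manager (whose real can() signature is not shown in this README), and 200 simply means the request is allowed to continue.

```php
<?php
declare(strict_types=1);

/**
 * Decide the HTTP outcome of the permission guard.
 *
 * @param ?string   $userUuid null when nobody is authenticated
 * @param ?callable $can      stand-in for the permission manager, null when unavailable;
 *                            called as $can($userUuid, $permission) and expected to return bool
 */
function guardStatus(?string $userUuid, ?callable $can, string $permission): int
{
    // No user or no permission manager: deny rather than let the request through.
    if ($userUuid === null || $can === null) {
        return 403;
    }

    // Only an explicit true grants access; anything else is a denial.
    return $can($userUuid, $permission) === true ? 200 : 403;
}
```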
Permission slugs (registered in the framework permission catalog):
- import_export.view
- import_export.run_import
- import_export.run_export
- import_export.cancel
- import_export.retry
Events
All events extend the framework BaseEvent. Payload fields in parentheses.
| Event | Dispatched when |
|---|---|
| ImportExportJobCreated (jobUuid, type, adapter) | A job and its batches are queued. |
| ImportExportJobStarted (jobUuid, type, adapter) | The first batch claim moves the job to running. |
| ImportExportBatchCompleted (jobUuid, batchUuid, type, adapter) | A batch finishes with zero failed records. |
| ImportExportBatchFailed (jobUuid, batchUuid, type, adapter, reason) | A batch finishes with failed records, or an adapter exception fails a claimed batch. |
| ImportExportJobCompleted (jobUuid, type, adapter) | All batches finished with no failures. |
| ImportExportJobFailed (jobUuid, type, adapter, reason) | All batches finished and at least one failed. |
| ImportExportJobCancelled (jobUuid, type, adapter) | A job is cancelled via HTTP or CLI. |
Reports, Failed Records, and Retention
- Reports: GET /import-export/jobs/{uuid}/report returns the latest stored report or builds one from job state: type, adapter, status, total/processed/failed records, error_overflow_count, and the stored error count.
- Error caps: stored row errors are capped per severity (first N stored, currently 1000); past the cap the engine increments the job's error_overflow_count instead of inserting rows.
- Failed-record export: FailedRecordExporter writes a job's stored row errors to a CSV or NDJSON file. It is a service-level capability -- there is no HTTP route or CLI command for it yet, and nothing populates the report row's failed_records_* columns automatically.
- Retention: RetentionCleaner (via import-export:cleanup) deletes files recorded with the tmp role for terminal (completed/failed/cancelled) jobs older than the cutoff, treating stored paths as local filesystem paths. Source and result files are never deleted, and job/batch/error/report rows are not pruned.
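The error-cap rule can be pictured as a per-severity counter with an overflow tally. The class below is a hypothetical in-memory model of that behaviour, not the engine's storage code; CappedErrorLog and its members are invented for illustration.

```php
<?php
declare(strict_types=1);

/** In-memory model of "store the first N errors per severity, count the rest". */
final class CappedErrorLog
{
    /** @var array<string, list<string>> stored messages per severity */
    public array $stored = [];

    /** Number of errors dropped because their severity was already at the cap. */
    public int $overflowCount = 0;

    public function __construct(private int $capPerSeverity = 1000)
    {
    }

    public function record(string $severity, string $message): void
    {
        if (count($this->stored[$severity] ?? []) >= $this->capPerSeverity) {
            // Past the cap: count the error, do not store it.
            $this->overflowCount++;
            return;
        }

        $this->stored[$severity][] = $message;
    }
}
```

With a cap of 2, a third error of the same severity raises the overflow count to 1 while a first warning is still stored, because each severity has its own budget.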
Configuration
Configuration is loaded from config/import_export.php and merged under the
import_export key.
Several keys are reserved: they are declared (and their defaults match today's hardcoded runtime values) but are not yet read by the runtime paths, so changing them currently has no effect.
| Key | Default | Status | Purpose |
|---|---|---|---|
| enabled | true | Reserved | Extension-level enable flag (not currently consulted). |
| routes_enabled | true | Wired | Set to false for service/CLI-only installs. |
| queue | import-export | Wired | Queue name used for batch jobs. |
| source_disk | uploads | Reserved | HTTP/CLI default the source disk to the literal uploads. |
| result_disk | uploads | Reserved | Result file rows currently record the job row's disk (effectively local). |
| tmp_disk / tmp_path | local / import-export/tmp | Reserved | Retention treats stored tmp paths as local filesystem paths. |
| batch_size | 500 | Reserved | Creation paths default to 500; override per job via batch_size / --batch-size. |
| max_file_size | 52428800 | Reserved | No engine-side size enforcement yet; validate in supports()/plan(). |
| retention_days | 30 | Reserved | import-export:cleanup --days defaults to 30 independent of config. |
| error_cap_per_severity | 1000 | Reserved | Runtime cap is currently fixed at 1000 per severity. |
| stale_lock_minutes | 15 | Reserved | Stale-lock reclaim window is currently fixed at 15 minutes. |
Security
Archive Safety (ZIP-Slip)
ZIP bundle extraction routes every entry name through PathGuard, which rejects:
- absolute paths and backslash/UNC-style paths,
- parent-directory traversal that escapes the extraction root,
- empty or dot-only paths,
- Windows drive-letter paths.
After normalization, a realpath containment check verifies the resolved target
directory is still under the extraction root. Hostile archives are covered by tests.
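The lexical part of these checks can be sketched as a single function over the entry name. This is an illustration of the listed rules, not PathGuard's actual code: isSafeEntryName() is a hypothetical name, and the realpath containment check mentioned above would still run afterwards on the resolved directory.

```php
<?php
declare(strict_types=1);

/**
 * Return true when a ZIP entry name is lexically safe to join onto an
 * extraction root. Hypothetical helper covering only the name-based rules.
 */
function isSafeEntryName(string $name): bool
{
    if ($name === '') {
        return false; // empty path
    }
    if (str_contains($name, '\\')) {
        return false; // backslash and UNC-style paths
    }
    if ($name[0] === '/') {
        return false; // absolute path
    }
    if (preg_match('/^[A-Za-z]:/', $name) === 1) {
        return false; // Windows drive letter
    }

    // Walk the segments and track how deep below the root the path sits.
    $depth = 0;
    foreach (explode('/', $name) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            if (--$depth < 0) {
                return false; // traversal escapes the extraction root
            }
            continue;
        }
        $depth++;
    }

    // Depth 0 means the name resolves to the root itself (dot-only paths).
    return $depth > 0;
}
```

Note that a name such as a/../b.txt passes, since it never leaves the root; only traversal that escapes is rejected, matching the wording of the list above.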
Permission Gating
Every HTTP route runs the fail-closed permission middleware described above; there is no unauthenticated or ungated route in this extension.
Adapter Trust Boundary
Adapters run inside the application process. They should validate source structure, enforce domain permissions before writing records, and avoid shelling out to user-controlled paths. The engine records errors and progress, but it does not validate domain-specific fields, content models, prices, users, or publishing rules.
Error Data
Stored row errors may contain excerpts or identifiers from imported data. Adapters should avoid putting secrets, access tokens, or full sensitive records into error contexts.
Requirements
- PHP 8.3 or higher
- Glueful 1.55.0 or higher
- A configured queue worker for asynchronous batch processing
License
MIT -- licensed consistently with the Glueful framework.
Support
For issues, feature requests, or questions, please create an issue in the repository.
Other information
- License: MIT
- Last updated: 2026-06-11