survos/folio-bundle
Latest stable version: 2.8.1
Composer install command:
composer require survos/folio-bundle
Package description
Portable SQLite folios for normalized museum dataset rows.
README
Folio stores normalized/enriched dataset JSONL as portable SQLite archive files. It is the database, archive, and browsing layer for data that has already been normalized by dataset/import tooling.
harvest and md produce normalized JSONL. folio:ingest turns that JSONL into a standalone folio SQLite file. Consumers such as zm can use the Symfony/DataContracts stack when present, while Python/R/SQLite users can query the archive directly.
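Because a folio is an ordinary SQLite file, a Python consumer needs only the standard library to see what an archive contains. The following is a minimal sketch, not part of the bundle: the helper name list_objects is hypothetical, and the in-memory database (with assumed column types and a made-up dto_document view definition) is only a stand-in for a real folio file.

```python
import sqlite3


def list_objects(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Group the names of tables and views found in a SQLite database."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY type, name"
    ).fetchall()
    found: dict[str, list[str]] = {"table": [], "view": []}
    for kind, name in rows:
        found[kind].append(name)
    return found


# Stand-in for a real folio: replace ":memory:" with the path to a folio file.
conn = sqlite3.connect(":memory:")
# Assumed column types; only the column names come from the README.
conn.execute(
    "CREATE TABLE item "
    "(local_id TEXT, label TEXT, dto_type TEXT, dto_data TEXT, extras TEXT)"
)
# Hypothetical view body, used only so the listing has a view to report.
conn.execute("CREATE VIEW dto_document AS SELECT local_id, label FROM item")

objects = list_objects(conn)
print(objects)
```

Against a real folio the same call should list the metadata tables and generated views alongside item, without any PHP code on the consuming side.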
Required: survos/field-bundle, survos/data-contracts.
Suggested for ingest/write workflows: survos/jsonl-bundle, survos/import-bundle.
See docs/configuration.md for the required multi-connection Doctrine setup.
See docs/archive-metadata.md for the standalone archive metadata contract.
See docs/presentation-layer.md for the proposal to use folios as narrative institutional presentation packages.
Archive Contract
A folio file stores canonical rows in item and self-describing metadata alongside them:
- schema_table and schema_property describe observed DTO types and fields in this archive.
- schema_property.stats stores field profile output from survos/jsonl-bundle's profiler.
- docs stores generated JSON/Markdown documentation for humans, report writers, and AI agents.
- Generated dto_* SQLite views project JSON fields into query-friendly columns.
- term_set and term store standalone controlled vocabularies and facets.
The metadata snapshot describes actual observed data, not the entire DTO contract universe. DTO classes from survos/data-contracts annotate observed fields with labels/descriptions when available, but consumers do not need PHP code to understand an archived folio.
Search and Publication Notes
- folio:ingest loads rows, snapshots observed schema/docs/views, and rebuilds the SQLite FTS5 table item_fts.
- Existing folios can rebuild search with bin/console folio:fts:rebuild <provider/dataset> --query="search terms".
- folio:archive refreshes archive metadata before packaging.
- FTS tables are derived data. Published archive files may drop item_fts, VACUUM, compress, ship, then rebuild FTS on the consuming side.
- SQLite views and docs are also derived from persisted metadata, but they are intentionally lightweight and useful for standalone consumers.
- Vector search is intentionally deferred. When added, start with a hybrid SQLite design: FTS5/BM25 for exact keyword strength, sqlite-vec for semantic retrieval, and Reciprocal Rank Fusion to merge ranks without normalizing incompatible score scales. Reference: https://ceaksan.com/en/hybrid-search-fts5-vector-rrf
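The Reciprocal Rank Fusion step mentioned in the last note can be sketched in a few lines of Python. This is an illustration of the general technique, not code from the bundle: the function name, the sample ID lists, and the constant k = 60 (a commonly used default) are all assumptions. Each result list contributes 1 / (k + rank) to an item's score, so only rank positions matter and the raw BM25 and vector scores never need to be compared.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical result lists: one from a keyword (FTS5/BM25) query,
# one from a vector-similarity query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_ids = [item_id for item_id, _ in fused]
print(fused_ids)
```

Items that appear in both lists ("a" and "c" here) rise above items found by only one retriever, which is the behavior a hybrid design relies on.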
Direct SQLite Examples
select * from schema_table where kind = 'dto';
select * from schema_property where table_id = ? order by position;
select local_id, label, dto_type, dto_data, extras from item limit 20;
select * from dto_document limit 20;
select id, type, audience, body from docs order by position;
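The item query above can be run from Python with the standard sqlite3 module. The sketch below makes several assumptions: the helper name sample_items is hypothetical, dto_data is assumed to hold a JSON document stored as text, and the in-memory table with one made-up row stands in for a real folio file.

```python
import json
import sqlite3


def sample_items(conn: sqlite3.Connection, limit: int = 20) -> list[dict]:
    """Fetch item rows and decode the dto_data column into Python objects."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # lets each row convert to a dict by column name
    cur.execute(
        "SELECT local_id, label, dto_type, dto_data, extras FROM item LIMIT ?",
        (limit,),
    )
    items = []
    for row in cur.fetchall():
        record = dict(row)
        # Assumption: dto_data is JSON text; empty or NULL values become None.
        raw = record["dto_data"]
        record["dto_data"] = json.loads(raw) if raw else None
        items.append(record)
    return items


# Stand-in for a real folio: replace ":memory:" with the path to a folio file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item "
    "(local_id TEXT, label TEXT, dto_type TEXT, dto_data TEXT, extras TEXT)"
)
# Made-up sample row for illustration only.
conn.execute(
    "INSERT INTO item VALUES (?, ?, ?, ?, ?)",
    ("obj-1", "Sample vase", "document", json.dumps({"title": "Sample vase"}), None),
)

items = sample_items(conn)
print(items[0]["local_id"], items[0]["dto_data"])
```

Binding the limit as a parameter mirrors the ? placeholder used in the schema_property query and avoids building SQL strings by hand.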
TODO
- Add fieldSet support to the api-grid spreadsheet view to avoid displaying every DTO field at once.
- Rebuild views/docs on restore, not only FTS, if the archive was packaged without them.
Statistics
- Total downloads: 73
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Clicks: 1
- Dependents: 0
- Suggesters: 0
Other information
- License: MIT
- Last updated: 2026-05-15