discoverygarden/islandora_hocr

Latest stable version: v1.4.0

Composer install command:

composer require discoverygarden/islandora_hocr

Package description

No description provided.

README

Introduction

Adds the hOCR derivative functionality.

Installation

Install as usual, see this for further information.

This module contains a migration facilitating the creation of a media use term for use in common Islandora configurations. Enabling the module will expose the islandora_hocr_media_uses migration to generate a media use term of the URI https://discoverygarden.ca/use#hocr.

# Flow might be something like:
drush en islandora_hocr
drush migrate:import islandora_hocr_media_uses

Configuration

Configuration is performed via environment variables.

Variable | Default | Description
ISLANDORA_HOCR_SNIPPETS | 20 | Number of snippets per document to include in the response.
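
As a rough illustration of how an environment-driven setting with a default can be resolved, the following is a minimal sketch. The function name islandora_hocr_example_snippet_count is hypothetical and is not part of the module; the fallback to 20 mirrors the documented default, and treating non-numeric or non-positive values as "unset" is an assumption of this sketch rather than documented behaviour.

```php
<?php

/**
 * Hypothetical helper (not part of the module): resolve the snippet count.
 *
 * Falls back to the documented default of 20 when ISLANDORA_HOCR_SNIPPETS is
 * unset or is not a positive integer (the validation rule is an assumption).
 */
function islandora_hocr_example_snippet_count(int $default = 20): int {
  // getenv() returns FALSE when the variable is not set.
  $raw = getenv('ISLANDORA_HOCR_SNIPPETS');
  if ($raw === FALSE) {
    return $default;
  }
  // FILTER_VALIDATE_INT returns FALSE for anything that is not an integer
  // within the given range.
  $value = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
  return $value === FALSE ? $default : $value;
}

echo islandora_hocr_example_snippet_count(), PHP_EOL;
```

Setting the variable in the environment of the PHP process (for example in the web server or container configuration) would then change the value returned.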

Derivatives

An action must be created and configured to generate an hOCR derivative. The action must also be triggered by a context in order for the derivative to be made. Refer to the official Islandora docs for more information.

Solr

We expect to make use of the Solr OCR Highlighting Plugin. The particulars of its installation are ultimately up to the environment into which it is being installed.

We have a single environment variable to allow the path of the library on the Solr instance to be specified, such that we can add its path to the configset for Solr:

  • SOLR_HOCR_PLUGIN_PATH: A path resolvable by Solr to the directory containing the OCR Highlighting Plugin JAR.
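
To make the role of this variable more concrete, here is a minimal sketch of turning the path into a solrconfig.xml lib directive. The function name islandora_hocr_example_lib_directive is hypothetical, and how the module actually writes the path into the configset is not shown in this README; only the lib element with dir and regex attributes is standard Solr configuration syntax.

```php
<?php

/**
 * Hypothetical helper (not part of the module): build a <lib> directive.
 *
 * Returns NULL when SOLR_HOCR_PLUGIN_PATH is unset or empty, in which case
 * no directive would be added (an assumption of this sketch).
 */
function islandora_hocr_example_lib_directive(): ?string {
  $path = getenv('SOLR_HOCR_PLUGIN_PATH');
  if ($path === FALSE || $path === '') {
    return NULL;
  }
  // Escape the path so it is safe inside an XML attribute value.
  $dir = htmlspecialchars($path, ENT_QUOTES | ENT_XML1, 'UTF-8');
  // Load every JAR found in the given directory.
  return sprintf('<lib dir="%s" regex=".*\.jar" />', $dir);
}

var_dump(islandora_hocr_example_lib_directive());
```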

There are a couple of config entities included:

  • the islandora_hocr field type to perform tokenization
  • the "Select w/ HOCR highlighting" /select_ocr request handler.

HOCR Indexing

For node entities, we have added the ability to index HOCR from related media, making use of the Solr OCR Highlighting Plugin.

As an example, you might add the islandora_hocr_field:content property to be indexed in Solr via the Search API Solr config, as islandora_hocr_field, as a Fulltext ("islandora_hocr") field.

As something of an aside, islandora_hocr_field:uri is presently prototypical. The Solr OCR Highlighting Plugin has another character filter that resolves paths to the contents of the files; however, when components communicate over the network, such access might not always be possible, particularly should access control enter into the equation. As such, we presently expect the full page-level OCR document to be pushed for each page.

Usage

Assuming indexing is configured as above with an islandora_hocr_field, you might programmatically perform a Search API query with something like:

$index = \Drupal\search_api\Entity\Index::load('default_solr_index');
$query = $index->query();

// The search term(s).
$query->keys('bravo');

// Additional conditions, as desired.
$query->addCondition('type', 'islandora_object');

// Activate our highlighting behaviour.
$query->setOption('islandora_hocr_properties', [
  'islandora_hocr_field' => [],
]);

// Perform the query.
$results = $query->execute();

// Get the additionally-populated property info, so we can identify which
// fields from the highlighted results correspond to which property.
$info = $results->getQuery()->getOption('islandora_hocr_properties');

// This should be an associative array mapping language codes to Solr fields,
// which can then be found in the $highlights below.
$language_fields = $info['islandora_hocr_field']['language_fields'];

// Process the results.
foreach ($results as $result) {
  // Highlighting info can be acquired from the items. The format here is the
  // same as the format from https://dbmdz.github.io/solr-ocrhighlighting/0.8.3/query/#response-format
  // for the given item/document.
  $highlights = $result->getExtraData('islandora_hocr_highlights');
}
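
The highlight data attached to each result item can then be matched against the language-to-field map. The sketch below assumes the per-item data is an associative array keyed by Solr field name, each entry holding a snippets list and a numTotal count, as described in the linked response-format documentation. The function name islandora_hocr_example_snippets, the Solr field name example_hocr_field_en, and the sample values are hypothetical and made up for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of the module): collect snippet text.
 *
 * @param array $highlights
 *   One item's highlights: Solr field name => ['snippets' => [...], ...].
 * @param array $language_fields
 *   Language code => Solr field name.
 *
 * @return array
 *   Language code => list of snippet strings (with highlight markup).
 */
function islandora_hocr_example_snippets(array $highlights, array $language_fields): array {
  $found = [];
  foreach ($language_fields as $langcode => $solr_field) {
    // Fields without any highlights are simply skipped.
    foreach ($highlights[$solr_field]['snippets'] ?? [] as $snippet) {
      if (isset($snippet['text'])) {
        $found[$langcode][] = $snippet['text'];
      }
    }
  }
  return $found;
}

// Made-up sample data standing in for real query output.
$language_fields = ['en' => 'example_hocr_field_en'];
$highlights = [
  'example_hocr_field_en' => [
    'snippets' => [
      ['text' => 'alpha <em>bravo</em> charlie', 'score' => 1.0],
    ],
    'numTotal' => 1,
  ],
];

print_r(islandora_hocr_example_snippets($highlights, $language_fields));
```

Each snippet in the plugin's response also carries page and region coordinates, which a viewer could use to draw highlight boxes; this sketch only extracts the text.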

Troubleshooting/Issues

Having problems or solved one? Contact discoverygarden.

Known issues

  • Solr Cloud Package (in)compatibility: The path to the library could be omitted; however, the conditional inclusion of prefixes in the config entities is problematic.

Maintainers/Sponsors

Current maintainers:

Sponsor:

License

GPLv3

Statistics

  • Total downloads: 12.72k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Views: 3
  • Dependents: 2
  • Suggesters: 1

GitHub information

  • Stars: 0
  • Watchers: 6
  • Forks: 4
  • Language: PHP

Other information

  • License: GPL-3.0-or-later
  • Last updated: 2026-01-04