lecodeurdudimanche/document-data-extractor

Composer install command:

composer require lecodeurdudimanche/document-data-extractor

Package description

A simple library to extract data from documents with a known structure

README

A simple PHP library to automate data extraction from documents with known formats.

Requirements

This library uses Tesseract to read text from documents and Imagick to manipulate the images.

It relies on Ghostscript (gs) to convert PDF files to images.
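The README does not show how the PDF conversion is invoked. As a rough illustration of what such a step can look like, here is a minimal sketch that only builds a Ghostscript command rendering each PDF page to a PNG file. The helper `buildGhostscriptCommand` is hypothetical and is not the package's internal code; the `gs` flags themselves are standard Ghostscript options.

```php
<?php
// Hypothetical helper, NOT part of document-data-extractor: builds a
// Ghostscript command that renders every page of a PDF to a PNG file.
function buildGhostscriptCommand(string $pdfPath, string $outputPattern, int $dpi = 300): string
{
    $args = [
        'gs',
        '-dNOPAUSE',                       // do not pause between pages
        '-dBATCH',                         // exit once the input has been processed
        '-dSAFER',                         // restrict file operations from inside the PDF
        '-sDEVICE=png16m',                 // 24-bit RGB PNG output
        '-r' . $dpi,                       // rendering resolution in dots per inch
        '-sOutputFile=' . $outputPattern,  // %d is replaced by the page number
        $pdfPath,
    ];
    // Quote every argument so paths with spaces survive the shell.
    // Note: on Windows escapeshellarg() replaces '%', which would break %d;
    // this sketch assumes a Unix-like system.
    return implode(' ', array_map('escapeshellarg', $args));
}

$cmd = buildGhostscriptCommand('/path/to/document.pdf', '/tmp/page-%d.png');
// To actually run it (requires gs on the PATH):
// exec($cmd, $output, $status);
```

A higher `$dpi` gives Tesseract more pixels per character at the cost of larger images; 300 DPI is a common choice for OCR.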

Installation

Install the required PHP extension, php-imagick, along with the Tesseract and Ghostscript binaries. On Ubuntu:

sudo apt install php-imagick tesseract-ocr ghostscript

Then install the package via Composer:

composer require lecodeurdudimanche/document-data-extractor
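Before running the extractor, it can help to confirm that the requirements listed above are actually available. The following is a minimal pre-flight sketch, assuming a Unix-like system where `tesseract` and `gs` are found on the `PATH`; `findExecutable` is a hypothetical helper, not part of the package.

```php
<?php
// Hypothetical pre-flight check (not part of the package): reports whether the
// PHP extension and command-line tools the library relies on are available.
function findExecutable(string $name): ?string
{
    foreach (explode(PATH_SEPARATOR, (string) getenv('PATH')) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate; // first match wins, as in a shell lookup
        }
    }
    return null;
}

$report = [
    'php-imagick' => extension_loaded('imagick'),
    'tesseract'   => findExecutable('tesseract') !== null,
    'gs'          => findExecutable('gs') !== null,
];
foreach ($report as $requirement => $ok) {
    echo str_pad($requirement, 12), $ok ? 'found' : 'MISSING', PHP_EOL;
}
```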

Usage

First, define which data you want to extract and where it is located on the image:

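    // Extractor, ROI and Configuration are classes from this package;
    // import them with `use` statements for the package's namespace.
    $extractor = new Extractor();
    $regionsOfInterest = [
        // The company name is in the rectangle whose top-left corner is (700, 180) and whose size is 1080 x 160
        (new ROI('Name of the company'))->setRect(700, 180, 1080, 160),
        // The second argument sets the type reported for the extracted value
        (new ROI('Total', 'integer'))->setRect(1980, 1572, 58, 52),
    ];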
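The `setRect()` arguments are a top-left corner followed by a width and height. The sketch below turns them into edge coordinates and checks that a region fits inside a page, assuming the values are pixels measured from the image's top-left corner (the README does not state the unit explicitly). Both helper functions are hypothetical.

```php
<?php
// Hypothetical helper (not part of the package): converts the
// (x, y, width, height) arguments used with setRect() into edge coordinates,
// assuming pixel values with the origin at the image's top-left corner.
function roiEdges(int $x, int $y, int $width, int $height): array
{
    return [
        'left'   => $x,
        'top'    => $y,
        'right'  => $x + $width,   // exclusive right edge
        'bottom' => $y + $height,  // exclusive bottom edge
    ];
}

// Returns true when the whole rectangle lies inside an image of the given size.
function roiFitsImage(array $edges, int $imageWidth, int $imageHeight): bool
{
    return $edges['left'] >= 0 && $edges['top'] >= 0
        && $edges['right'] <= $imageWidth && $edges['bottom'] <= $imageHeight;
}

$company = roiEdges(700, 180, 1080, 160);  // right = 1780, bottom = 340
$total   = roiEdges(1980, 1572, 58, 52);   // right = 2038, bottom = 1624
```

An A4 page scanned at 300 DPI is 2480 x 3508 pixels, so both example regions fit on it; the same coordinates would fall outside a page scanned at a lower resolution, which is why the coordinates only make sense for documents rendered at a consistent size.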

Next, you can add options that are forwarded to Tesseract to get more precise results:

    $tesseractConfiguration = [
        'psm' => 8, // Page segmentation mode 8: treat the image as a single word
        'tessdataDir' => '/usr/share/tessdata' // Other tesseract options ...
    ];
    $config = Configuration::fromArray(compact('regionsOfInterest', 'tesseractConfiguration'));
    $extractor->setConfig($config);
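How the package passes these options to Tesseract is not documented here. One plausible mapping, shown purely as an assumption, is to turn each camelCase key into the matching Tesseract command-line flag (`psm` to `--psm`, `tessdataDir` to `--tessdata-dir`). The function `tesseractFlags` is hypothetical; it does not handle short flags such as `-l` for the language.

```php
<?php
// Hypothetical mapping (not taken from the package): converts a
// tesseractConfiguration-style array into tesseract command-line flags.
function tesseractFlags(array $options): array
{
    $flags = [];
    foreach ($options as $name => $value) {
        // camelCase -> kebab-case, e.g. tessdataDir -> tessdata-dir
        $flags[] = '--' . strtolower(preg_replace('/(?<!^)[A-Z]/', '-$0', $name));
        $flags[] = (string) $value;
    }
    return $flags;
}

$flags = tesseractFlags(['psm' => 8, 'tessdataDir' => '/usr/share/tessdata']);
// ['--psm', '8', '--tessdata-dir', '/usr/share/tessdata']
// A full invocation on one cropped region would then look like:
//   tesseract region.png stdout --psm 8 --tessdata-dir /usr/share/tessdata
```

Mode 8 suits the small "Total" box; a multi-word field such as the company name may read better with a line-oriented mode (for example 7, a single text line).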

Then set the document you want to extract data from:

    $extractor->loadImage('/path/to/image.png'); // or
    $extractor->loadPDF('/path/to/document.pdf'); // or
    $extractor->setImage($imageData); // could be an Imagick or GD image or raw image data
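`setImage()` accepts three kinds of input. The sketch below shows one way such inputs can be told apart in PHP; it is an assumption about the general technique, not the package's actual logic. It relies on the fact that `instanceof` simply returns false when a class (such as `Imagick` without the extension) is not defined.

```php
<?php
// Hypothetical sketch of distinguishing the inputs accepted by setImage();
// the package's real implementation may differ.
function imageInputKind($image): string
{
    if ($image instanceof \Imagick) {   // false when the imagick extension is absent
        return 'imagick';
    }
    if ($image instanceof \GdImage) {   // GD images are objects from PHP 8.0 on
        return 'gd';
    }
    if (is_resource($image) && get_resource_type($image) === 'gd') {
        return 'gd';                    // PHP 7 GD images are resources
    }
    if (is_string($image) && $image !== '') {
        return 'raw';                   // e.g. the bytes of a PNG file
    }
    throw new InvalidArgumentException('Unsupported image input');
}

// Usage: raw bytes read from disk are classified as 'raw'.
// imageInputKind(file_get_contents('/path/to/image.png'));
```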

Finally, call the run() method to extract the data:

    $data = $extractor->run();
    /*
    * $data = [
    * ['label' => 'Name of the company', 'type' => 'text', 'data' => 'Company Limited'],
    * ['label' => 'Total', 'type' => 'integer', 'data' => '55']
    * ];
    */
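`run()` returns a list of fields whose `data` values are strings, even for the `integer` type. A minimal post-processing sketch, assuming the result shape shown in the example above, turns that list into a label-to-value map and converts integer fields. `resultsByLabel` is hypothetical.

```php
<?php
// Hypothetical post-processing step (not part of the package): turns the
// list returned by run() into a label => value map, converting 'integer' fields.
function resultsByLabel(array $data): array
{
    $values = [];
    foreach ($data as $field) {
        $value = $field['data'];
        if ($field['type'] === 'integer') {
            // OCR output is a string; an unreadable value becomes null
            // instead of a silently wrong number.
            $parsed = filter_var(trim($value), FILTER_VALIDATE_INT);
            $value = $parsed === false ? null : $parsed;
        }
        $values[$field['label']] = $value;
    }
    return $values;
}

$data = [
    ['label' => 'Name of the company', 'type' => 'text', 'data' => 'Company Limited'],
    ['label' => 'Total', 'type' => 'integer', 'data' => '55'],
];
$values = resultsByLabel($data);
// ['Name of the company' => 'Company Limited', 'Total' => 55]
```

Returning null for a misread such as `5S` makes OCR errors visible to the caller rather than hiding them behind a guessed number.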

You can save and load a Configuration object with the toFile and fromFile methods. The file format is pretty-printed JSON.
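To show what a pretty-printed JSON round trip looks like in plain PHP, here is a minimal sketch. The keys are assumptions borrowed from the `Configuration::fromArray()` call above; the file actually written by `toFile()` may be structured differently.

```php
<?php
// Hypothetical illustration of a pretty-printed JSON round trip. The keys are
// assumptions; the real file written by toFile() may be shaped differently.
$config = [
    'tesseractConfiguration' => ['psm' => 8, 'tessdataDir' => '/usr/share/tessdata'],
];
$path = sys_get_temp_dir() . '/extractor-config.json';

// JSON_PRETTY_PRINT adds newlines and 4-space indentation;
// JSON_UNESCAPED_SLASHES keeps paths readable ("/usr/..." rather than "\/usr\/...").
file_put_contents($path, json_encode($config, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));

// Decode back into associative arrays (second argument true).
$loaded = json_decode(file_get_contents($path), true);
```

Because the file is indented, it can be edited by hand, for example to adjust a region's coordinates, without touching the PHP code.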

Statistics

  • Total downloads: 3
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Clicks: 2
  • Dependents: 0
  • Recommendations: 0

GitHub information

  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2019-11-13
