2023:Detect Signature (Value Extractor): Difference between revisions

From Grooper Wiki
* [[Media:2023 Wiki Detect-Signature Batch.zip]]

Revision as of 13:16, 8 May 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Detect Signature is a Value Extractor that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

Glossary

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Detect Signature: Detect Signature is a Value Extractor that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Extractor Type:

OCR: OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized so that it can be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects, such as Content Models, Batch Processes, and profile objects, are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Read Zone: Read Zone is a Value Extractor that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.

Value Reader: Value Reader nodes define a single data extraction operation. Each Value Reader executes a single Value Extractor configuration. The Value Extractor determines the logic for returning data from a text-based document or page. (Example: Pattern Match is a Value Extractor that returns data using regular expressions).

  • Value Readers can be used on their own or in conjunction with Data Types for more complex data extraction and collation.

About

Detect Signature is an Extractor Type specifically designed to detect if a signature is present or not. It's very similar to the Read Zone extractor in that you use one of the four Location options (Fixed Region, Relative Region, Shape Region or Text Region) to draw an extraction zone on a geographic region of the page.

However, rather than returning the OCR or native text data within the zone (as Read Zone does), an OMR-style extraction is performed. Think about a signature line. If you drew a box around where you expect someone to sign, the box would be empty if the document was not signed. If it was signed, some of the box would be filled in, regardless of what the signature looks like.

The same basic concept applies for the Detect Signature extractor. Detect Signature counts the black pixels in the extraction zone and calculates what percentage of the zone they fill. If that percentage falls above a certain threshold, the extractor returns a value of "Signed". If it falls below, it returns a value of "Not Signed".
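
The fill-percentage idea can be illustrated with a minimal Python sketch. This is not Grooper's implementation. The function names, the grayscale cutoff of 128, and the 2% default threshold are all hypothetical values chosen for illustration, and the sketch assumes the zone lies entirely inside the image.

```python
from typing import List, Tuple

# Hypothetical zone format: (left, top, right, bottom) in pixels,
# with right and bottom exclusive. Not a Grooper data structure.
Zone = Tuple[int, int, int, int]


def fill_percentage(image: List[List[int]], zone: Zone, black_cutoff: int = 128) -> float:
    """Return the percentage of pixels in the zone that count as black.

    `image` is a grid of grayscale values (0 = black, 255 = white).
    A pixel counts as black when its value is below `black_cutoff`
    (an assumed cutoff, not a documented Grooper setting).
    """
    left, top, right, bottom = zone
    total = (right - left) * (bottom - top)
    if total <= 0:
        raise ValueError("zone must have positive area")
    black = sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if pixel < black_cutoff
    )
    return 100.0 * black / total


def detect_signature(image: List[List[int]], zone: Zone, threshold_percent: float = 2.0) -> str:
    """Return "Signed" when the zone's fill percentage is above the threshold."""
    if fill_percentage(image, zone) > threshold_percent:
        return "Signed"
    return "Not Signed"
```

For example, a blank zone has a fill percentage of 0 and yields "Not Signed", while a zone with pen strokes covering more than the threshold percentage yields "Signed". In practice the threshold has to be high enough to ignore the printed signature line and scanner noise inside the zone.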

BE AWARE: Detect Signature's setup is similar to the Read Zone extractor in that it uses the same Location properties the Read Zone extractor uses to draw an extraction zone.

  • This article presumes you are familiar with the Read Zone extractor and its setup.
  • If you are not familiar with Read Zone, you may find it helpful to review the Read Zone article prior to following the tutorial in this article.

How To

In this example, a Value Reader is configured to return whether or not the "Senior Cow Representative Signature" is present on this Application for Cow Ownership form, using Detect Signature.