Layout Data (Concept)

STUB

This article is a stub. It contains minimal information on the topic and should be expanded.

Layout Data refers to visual information such as line locations, OMR checkbox locations and states, barcode values, and detected shapes captured by certain image processing commands. This data is stored in a "Grooper.LayoutData.json" file attached to a Batch Folder or Batch Page object. Grooper extractors and other functionality that rely on this data can recall it later.

About

The following IP Commands create layout data:

Line Detection and Line Removal
Box Detection and Box Removal
Barcode Detection and Barcode Removal
Shape Detection and Shape Removal

Any execution of an IP Profile with one or more of these commands will collect and store layout data. IP Profiles can be executed in one of two ways:

The Image Processing activity.
The Recognize activity.

The Image Processing activity permanently alters a document's image. An IP Profile executed during Recognize temporarily alters the document's image to clean it up before OCR, then reverts to the original image.

However, in either case, the layout data is collected and stored as a "LayoutData.json" file.

⚠

In most cases, these activities are applied at the page level, storing the "LayoutData.json" file on the page object. However, in cases where Recognize is run on the folder level, that file will be stored on the folder object.

If this information is used during data extraction and, for whatever reason, layout data was collected at both the folder level and the page level, Grooper will always prioritize the layout data on the folder level. If the layout data on the folder differs from the layout data on the pages, Grooper will ignore the pages' layout data and use what is on the folder.
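The precedence rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not Grooper's implementation. The function name `effective_layout_data` and the use of plain dictionaries to stand in for parsed layout data are assumptions made for the example.

```python
from typing import Optional


def effective_layout_data(folder_layout: Optional[dict],
                          page_layout: Optional[dict]) -> Optional[dict]:
    """Pick the layout data to use under the folder-first rule.

    Hypothetical helper: folder-level data, when present, wins outright.
    Page-level data is not merged in; it is used only when the folder has none.
    """
    if folder_layout is not None:
        return folder_layout
    return page_layout


# Illustrative inputs standing in for parsed LayoutData.json content.
folder = {"HorizontalLines": [{"X1": 0.1867, "X2": 8.1533, "Y1": 0.2233, "Y2": 0.2233}]}
page = {"HorizontalLines": []}

chosen = effective_layout_data(folder, page)    # folder data is used
fallback = effective_layout_data(None, page)    # page data is used
```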

Use Cases

WIP

This section needs expansion.

Extractor Logic
- Tab Marking
- Post-Processing
  - OCR Reader (Result Post Processor)
  - OMR Reader (Result Post Processor)
- Key-Value Pair (Collation Provider)
- Snap-to-Lines
Extraction Techniques
- Table Extraction
  - Infer Grid (Table Extract Method)
  - Header-Value (Table Extract Method)
- Barcode Reading
  - Find Barcode

Examples in LayoutData.json

Once captured by the Image Processing or Recognize activities, Layout Data is saved as a .json file named "LayoutData.json" in the Grooper Repository as a companion file to the processed Batch Folder or Batch Page. To view this file, select the Batch Folder or Batch Page object in the Node Tree, open the "Advanced" tab, and examine the Files for that object. Currently, four kinds of information can be stored in this file, though this is likely to grow over time: Lines, OMR Checkboxes, Barcodes, and Shapes.

Lines

Lines listed in the LayoutData.json file are broken out into Horizontal Lines and Vertical Lines. For each line, X/Y coordinates are listed for the start and stop points, noted as "ptA" and "ptB". In the example below, these endpoints appear as the (X1, Y1) and (X2, Y2) pairs. The X/Y coordinates are measured in inches, relative to the top-left corner of the image, which is point 0,0. Below is an example of a Horizontal Line in the LayoutData.json file:

 "HorizontalLines": [
   {
     "X1": 0.1867,
     "X2": 8.1533,
     "Y1": 0.2233,
     "Y2": 0.2233
   }
 ]
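
Because the coordinates are in inches, a consumer of this data usually needs to convert them before working with the image itself. The following Python sketch reads a line entry, computes its length, and converts its start point to pixels. It is a minimal example, not part of Grooper: the 300 DPI resolution and the helper names are assumptions, and the sample fragment is wrapped in braces so that it parses as a complete JSON document.

```python
import json
import math

# Sample fragment wrapped in braces so it parses as a complete JSON document.
sample = """
{
  "HorizontalLines": [
    {"X1": 0.1867, "X2": 8.1533, "Y1": 0.2233, "Y2": 0.2233}
  ]
}
"""

DPI = 300  # assumed scan resolution, used only for the inch-to-pixel conversion


def line_length_inches(line: dict) -> float:
    """Euclidean distance between the (X1, Y1) and (X2, Y2) endpoints."""
    return math.hypot(line["X2"] - line["X1"], line["Y2"] - line["Y1"])


def to_pixels(inches: float, dpi: int = DPI) -> int:
    """Convert a measurement in inches to the nearest whole pixel."""
    return round(inches * dpi)


layout = json.loads(sample)
first = layout["HorizontalLines"][0]
length = line_length_inches(first)
start_px = (to_pixels(first["X1"]), to_pixels(first["Y1"]))

print(f"length: {length:.4f} in, start pixel: {start_px}")
```

The origin stays at the top-left corner after conversion, so the pixel coordinates can be used directly with image libraries that follow the same convention.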

OMR Checkboxes

Each OMR checkbox identified will be stored in the LayoutData.json file with X/Y coordinates for the bounding rectangle and an IsChecked boolean flag. The X/Y coordinates denote the location of the top-left and bottom-right corners of the corresponding OMR checkbox. The IsChecked flag indicates whether the corresponding OMR checkbox is filled in (checked). Below is an example of an OMR checkbox in the LayoutData.json file:

 "OmrBoxes": [
   {
     "Bounds": {
       "X1": 1.6467,
       "X2": 1.7567,
       "Y1": 1.3533,
       "Y2": 1.4633
     },
     "IsChecked": false
   }
 ]
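
As a rough illustration of how this structure might be consumed, the Python sketch below lists the checked boxes and computes each box's size from its bounds. The helper names are hypothetical, and the second entry in the sample is made up for the example so that there is one checked box to find. The fragment is wrapped in braces so that it parses as a complete JSON document.

```python
import json

# The first entry is the example above; the second is invented for illustration.
sample = """
{
  "OmrBoxes": [
    {"Bounds": {"X1": 1.6467, "X2": 1.7567, "Y1": 1.3533, "Y2": 1.4633}, "IsChecked": false},
    {"Bounds": {"X1": 1.6467, "X2": 1.7567, "Y1": 1.6533, "Y2": 1.7633}, "IsChecked": true}
  ]
}
"""


def box_size(bounds: dict) -> tuple:
    """Return (width, height) in inches from top-left and bottom-right corners."""
    return (bounds["X2"] - bounds["X1"], bounds["Y2"] - bounds["Y1"])


def checked_boxes(layout: dict) -> list:
    """Return only the entries whose IsChecked flag is true."""
    return [box for box in layout.get("OmrBoxes", []) if box["IsChecked"]]


layout = json.loads(sample)
checked = checked_boxes(layout)
width, height = box_size(layout["OmrBoxes"][0]["Bounds"])

print(f"{len(checked)} of {len(layout['OmrBoxes'])} boxes checked")
```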

Barcodes

Each Barcode identified will be stored in the LayoutData.json file with information regarding the barcode type (symbology), X/Y coordinates for the bounding rectangle, checksum validation flag, confidence score, orientation, and value. The BarcodeType is an integer that represents a barcode symbology. For example, a BarcodeType of 8 indicates that the Barcode uses the Code 128 symbology. The ChecksumIsValid flag is set to true if the barcode contains a valid checksum. The Orientation indicates the read direction of the barcode. For example, an Orientation of 1 indicates the barcode was read in the East orientation. The Value is the value read from the barcode. Below is an example of a Barcode in the LayoutData.json file:

 "Barcodes": [
   {
     "BarcodeType": 8,
     "Bounds": {
       "X1": 1.8467,
       "X2": 4.5167,
       "Y1": 1.0833,
       "Y2": 1.9333
     },
     "ChecksumIsValid": false,
     "Confidence": 1,
     "Orientation": 1,
     "Value": "Dummy Value Read from Barcode"
   }
 ]
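
The integer codes can be turned into readable labels when inspecting this data. The Python sketch below does so for the barcode entry above. It is an illustration only: the lookup tables contain just the two codes this article explains (8 for Code 128, 1 for East), every other code is reported as unknown, and the helper names and the 0.9 confidence threshold are assumptions. The threshold also assumes confidence is expressed on a 0 to 1 scale, which the sample value suggests but does not confirm.

```python
import json

# Only the two codes explained in this article are mapped.
# Any other integer is reported as unknown rather than guessed.
KNOWN_BARCODE_TYPES = {8: "Code 128"}
KNOWN_ORIENTATIONS = {1: "East"}

# Sample fragment wrapped in braces so it parses as a complete JSON document.
sample = """
{
  "Barcodes": [
    {
      "BarcodeType": 8,
      "Bounds": {"X1": 1.8467, "X2": 4.5167, "Y1": 1.0833, "Y2": 1.9333},
      "ChecksumIsValid": false,
      "Confidence": 1,
      "Orientation": 1,
      "Value": "Dummy Value Read from Barcode"
    }
  ]
}
"""


def describe(barcode: dict) -> str:
    """Build a one-line summary using the known code-to-label mappings."""
    symbology = KNOWN_BARCODE_TYPES.get(barcode["BarcodeType"], "unknown symbology")
    orientation = KNOWN_ORIENTATIONS.get(barcode["Orientation"], "unknown orientation")
    return f"{symbology} ({orientation}): {barcode['Value']}"


def confident_barcodes(layout: dict, min_confidence: float = 0.9) -> list:
    """Keep barcodes at or above an assumed confidence threshold."""
    return [b for b in layout.get("Barcodes", []) if b["Confidence"] >= min_confidence]


layout = json.loads(sample)
for barcode in confident_barcodes(layout):
    print(describe(barcode))
```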

Shapes

WIP

This section needs expansion.