2023:Scope (Property)

From Grooper Wiki

Revision as of 12:59, 10 May 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Activities can be scoped to different levels in a Batch:

  • Batch - One task will be created for the entire batch.
  • Folder - One task will be created for each folder at a specific level within the batch.
  • Page - One task will be created for each page.
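
To make the three options concrete, here is a minimal Python sketch that counts how many tasks each Scope would produce. It assumes a hypothetical tree model: the `Node` class and the `folders_at_level`, `all_pages`, and `tasks_for_scope` functions are illustrative only and are not part of any Grooper API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Batch, Batch Folder, or Batch Page."""
    name: str
    is_page: bool = False
    children: list[Node] = field(default_factory=list)


def folders_at_level(node: Node, level: int, depth: int = 0) -> list[Node]:
    """Return the Batch Folders exactly `level` folder levels below `node`."""
    found: list[Node] = []
    for child in node.children:
        if child.is_page:
            continue  # pages never count as folder levels
        if depth + 1 == level:
            found.append(child)
        else:
            found.extend(folders_at_level(child, level, depth + 1))
    return found


def all_pages(node: Node) -> list[Node]:
    """Return every Batch Page below `node`, however deeply nested."""
    pages: list[Node] = []
    for child in node.children:
        if child.is_page:
            pages.append(child)
        else:
            pages.extend(all_pages(child))
    return pages


def tasks_for_scope(batch: Node, scope: str, folder_level: int = 1) -> list[str]:
    """One task per scoped node: the whole Batch, each folder at a level, or each page."""
    if scope == "Batch":
        return [batch.name]
    if scope == "Folder":
        return [f.name for f in folders_at_level(batch, folder_level)]
    if scope == "Page":
        return [p.name for p in all_pages(batch)]
    raise ValueError(f"unknown scope: {scope}")


batch = Node("Batch", children=[
    Node("Doc 1", children=[Node("p1", is_page=True), Node("p2", is_page=True)]),
    Node("Doc 2", children=[Node("p3", is_page=True)]),
])
for scope in ("Batch", "Folder", "Page"):
    print(scope, len(tasks_for_scope(batch, scope)))
# Batch 1
# Folder 2
# Page 3
```

In this model, Folder Level is counted from the Batch root, so `folder_level=1` selects the Batch's direct child folders.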

Glossary

Activity Processing:

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page: Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) when images are scanned into a Batch using the Scan Viewer, or (2) when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a "Review" activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Classify: Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Document Type.

  • Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
  • Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Clip Frames: Clip Frames is a specialized Activity for processing microfiche in Grooper. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Correct: Correct is an Activity that performs spell correction. It can correct a Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data, or otherwise enhance text data.

Detect Frames: Detect Frames is a specialized Activity for processing microfiche in Grooper. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export Behavior: An Export Behavior defines the parameters for exporting classified Batch Folder content from Grooper to other systems. This includes where they are exported to (what content management system, file system, database, etc.), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML, etc.), folder pathing, file naming, and data mappings (for Data Export and CMIS Export).

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Image Processing: Image Processing is an Activity that enhances Batch Page images and optimizes them for better OCR text recognition and data extraction results.

Initialize Card: Initialize Card is a specialized Activity for processing microfiche in Grooper. It prepares and configures microfiche card images for further processing.

Lexicon: Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.

Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Render: Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Scope: The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Send Mail: Send Mail is an Activity that automates email notifications from Grooper based on events and conditions set by a Batch Process. Optionally, documents in the Batch may be attached to the generated email.

Separate: Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export, and other Activities that need to run on the folder (i.e. document) level.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

Service: Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

Split Pages: Multi-page PDF and TIF files come into Grooper as files attached to single Batch Folders. Split Pages is an Activity that creates child Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.

Split: Split is a Collation Provider option for Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.

Undo Separation: Undo Separation is a Separation Provider. Instead of putting loose Batch Pages into Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

XML Transform: XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

About

An important thing to understand about Scope is that, for nearly every activity, you are telling Grooper what you want to apply the activity to. For example, with Recognize, do you want to affect Pages or Folders? (Keep in mind, however, that in nearly every scenario it is considered best practice to scope Recognize to Page.) With Extract, do you want to extract data from folders at level 1, level 2, and so on?

Batch Scope

Setting the Scope property to Batch is done occasionally, when the entire contents of a Batch are to be affected as a whole. Separate and Review are activities that commonly use this configuration.

Folder Scope

The most common setting for the Scope property is Folder. Using this setting exposes the Folder Level property, which is set to an integer like 1, 2, or 3 etc. Understanding Batch hierarchy as it relates to the Folder Level property is important.

Setting the Folder Level property to something other than the default of 1 is most common with the Classify, Extract, and Export activities.

Consider a Batch with a Batch Folder at "Folder Level 1" that has two child Batch Folders at "Folder Level 2". The document at "Folder Level 1" is a packet made up of its two "sub" documents. For this example, consider the document at "Folder Level 1" to be a "Mortgage Packet" document consisting of a "Closing Disclosure" document and a "Universal Residential Loan Application" (U.R.L.A.) document.


Consider this model:


For this example, a Batch Process would have two Batch Process Steps configured with the Extract activity. The Scope property on each will be set to Folder. However, one will have the Folder Level property set to 1, and the other will have it set to 2.

The extract step set to "folder level 1" will target the "Mortgage Packet" document and will collect the following information:


The extract step set to "folder level 2" will target both the "Closing Disclosure" and the "U.R.L.A." and will collect the following information:


For this example, the Batch Process will also have two Batch Process Steps configured with the Export activity. The Scope property on each will be set to Folder. However, one will have the Folder Level property set to 1, and the other will have it set to 2.

The export step set to "folder level 1" will target the "Mortgage Packet" document and will leverage an Export Behavior on the Content Model that will export the whole "Mortgage Packet" and its contents to a content management system and apply an index field in that system using the "Borrower Name" field.

The export step set to "folder level 2" will target both the "Closing Disclosure" and the "U.R.L.A." and will leverage an Export Behavior set on each Document Type to send their respective data to tables in a database.
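
Which folders each of these steps targets can be sketched as follows. This is a hypothetical Python model, assuming Folder Level is counted from the Batch root; `Folder` and `folders_at_level` are illustrative names, not Grooper objects, and pages are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical Batch Folder; pages are omitted for brevity."""
    name: str
    children: list[Folder] = field(default_factory=list)


def folders_at_level(top_level: list[Folder], level: int) -> list[Folder]:
    """Step down from the Batch's "Folder Level 1" folders to the requested level."""
    current = top_level
    for _ in range(level - 1):
        current = [child for folder in current for child in folder.children]
    return current


batch_folders = [
    Folder("Mortgage Packet", children=[
        Folder("Closing Disclosure"),
        Folder("U.R.L.A."),
    ]),
]

for activity, level in [("Extract", 1), ("Extract", 2), ("Export", 1), ("Export", 2)]:
    targets = [f.name for f in folders_at_level(batch_folders, level)]
    print(f"{activity} step, Folder Level {level}: {targets}")
# Extract step, Folder Level 1: ['Mortgage Packet']
# Extract step, Folder Level 2: ['Closing Disclosure', 'U.R.L.A.']
# Export step, Folder Level 1: ['Mortgage Packet']
# Export step, Folder Level 2: ['Closing Disclosure', 'U.R.L.A.']
```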

Page Scope

The Page Scope is unique in that all Batch Page objects are considered a single scope. You may have a Batch with numerous Folder Levels, and each level may have child Batch Pages. However, if an activity's Scope is Page, it doesn't matter at what level in the Batch folder hierarchy a Batch Page exists; every page will be targeted.
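
A minimal sketch of Page scope, assuming a hypothetical tree model (the `Node` class and `page_scope_targets` function are illustrative, not Grooper's API). Pages are collected depth-first, so folder depth has no effect on whether a page is targeted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Batch, Batch Folder, or Batch Page."""
    name: str
    is_page: bool = False
    children: list[Node] = field(default_factory=list)


def page_scope_targets(node: Node) -> list[str]:
    """Collect every page, ignoring how many folder levels sit above it."""
    targets: list[str] = []
    for child in node.children:
        if child.is_page:
            targets.append(child.name)
        else:
            targets.extend(page_scope_targets(child))
    return targets


batch = Node("Batch", children=[
    Node("loose page", is_page=True),
    Node("Level 1 folder", children=[
        Node("level 1 page", is_page=True),
        Node("Level 2 folder", children=[Node("level 2 page", is_page=True)]),
    ]),
])
print(page_scope_targets(batch))
# ['loose page', 'level 1 page', 'level 2 page']
```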

Scoping Separate

Scoping for Separate and Review is a bit unique as well. For these activities, Scope defines not what you are affecting, but where you are applying the activity.

You might think "I want to separate the pages of this batch into individual folders", and assume the Scope would be Page. This would be an incorrect assumption. With Separate you don't scope it to pages, you Scope it to either Batch or Folder. The separation may be affecting the pages, but the activity itself is pointed at the container of the pages, not the pages themselves.

Setting the Scope to Batch is typical when pages have been physically scanned and they exist at the root of a Batch. This would separate the Batch Page objects of the Batch into Batch Folders.

Setting the Scope to Folder and the Folder Level to 1 (assuming there are documents at "Folder Level 1", and the Split Pages activity has been performed) is typical of digital documents that have been imported. This would separate the Batch Page objects of the "Folder Level 1" Batch Folders into Batch Folders that would exist at "Folder Level 2".
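
The sketch below models the idea that Separate is pointed at a container rather than at the pages. It is hypothetical: the `starts_document` flag stands in for whatever signal a real Separation Provider uses, and the `Node` class and `separate` function are illustrative, not Grooper's API. Running it against a "Folder Level 1" folder produces new folders at "Folder Level 2".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Batch, Batch Folder, or Batch Page."""
    name: str
    is_page: bool = False
    starts_document: bool = False  # hypothetical separation signal
    children: list[Node] = field(default_factory=list)


def separate(container: Node) -> None:
    """Group the container's direct child pages into new child folders.

    A new folder begins at the first page and at every page flagged `starts_document`.
    """
    pages = [c for c in container.children if c.is_page]
    others = [c for c in container.children if not c.is_page]
    new_folders: list[Node] = []
    for page in pages:
        if page.starts_document or not new_folders:
            new_folders.append(Node(f"{container.name} / Document {len(new_folders) + 1}"))
        new_folders[-1].children.append(page)
    container.children = others + new_folders


pdf_folder = Node("Imported PDF", children=[
    Node("p1", is_page=True, starts_document=True),
    Node("p2", is_page=True),
    Node("p3", is_page=True, starts_document=True),
])
batch = Node("Batch", children=[pdf_folder])

# Scope: Folder, Folder Level: 1 -> Separate runs once per "Folder Level 1" folder.
for folder in [c for c in batch.children if not c.is_page]:
    separate(folder)

for doc in pdf_folder.children:
    print(doc.name, [p.name for p in doc.children])
# Imported PDF / Document 1 ['p1', 'p2']
# Imported PDF / Document 2 ['p3']
```

In this model, Batch scope for scanned pages at the root would simply be `separate(batch)`, which creates the new folders at "Folder Level 1" instead.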

Scoping Review

Review is interesting because you want to consider how the work is being done. Again, the Scope in this case is not pointed at what you are reviewing, but rather at the container whose contents you are reviewing. Let's say you have a Batch with 5 folders at level 1.

Assume the following:

  • Batch Process Step
    • Activity: Review
    • Scope: Folder
    • Folder Level: 1

This will create 5 Review Jobs, one for each document at Folder Level 1. Each Job will have a single Task.

Conversely, assume the following:

  • Batch Process Step
    • Activity: Review
    • Scope: Batch

This will create one Review Job containing 5 Tasks to confirm.

This is, of course, a consideration for the end users interacting with Review. Are five users each assigned their own Review Job with a single Task, or is one user completing a single Job with five Tasks?
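
The difference between the two configurations can be sketched as a list of Jobs, each holding its Tasks. This is a simplified, hypothetical model that assumes all reviewed documents sit at Folder Level 1; `review_workload` is an illustrative function, not part of Grooper.

```python
from __future__ import annotations


def review_workload(documents: list[str], scope: str) -> list[list[str]]:
    """Return one inner list per Review Job; each entry is a Task in that Job.

    Hypothetical model: every reviewed document sits at Folder Level 1.
    """
    if scope == "Folder":  # Scope: Folder, Folder Level: 1
        return [[doc] for doc in documents]  # one Job per document
    if scope == "Batch":   # Scope: Batch
        return [list(documents)]             # one Job holding every document
    raise ValueError(f"unsupported scope: {scope}")


docs = [f"Document {i}" for i in range(1, 6)]
for scope in ("Folder", "Batch"):
    jobs = review_workload(docs, scope)
    print(f"{scope}: {len(jobs)} Job(s), Tasks per Job = {[len(job) for job in jobs]}")
# Folder: 5 Job(s), Tasks per Job = [1, 1, 1, 1, 1]
# Batch: 1 Job(s), Tasks per Job = [5]
```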

Scoping "Data View"

The "Data View" of the Review activity is unique. It has a property called Processing Level. This "level" is relative to the Scope property set on the Batch Process Step.

For example, assume a Batch has a document at "Folder Level 1". That document consists of two "sub-documents" that would exist at "Folder Level 2".

Assume the following:

  • Batch Process Step
    • Activity: Review
    • Scope: Folder
    • Folder Level: 1
  • "Data View" of the Review activity
    • Processing Level: Level1

This will create 2 Review Jobs, one for each document at "Folder Level 2". Each Job will have a single Task.

Conversely, assume the following:

  • Batch Process Step
    • Activity: Review
    • Scope: Batch
  • "Data View" of the Review activity
    • Processing Level: Level2

This will create the same number of Jobs and Tasks as the above example.

Finally, assume the following:

  • Batch Process Step
    • Activity: Review
    • Scope: Batch
  • "Data View" of the Review activity
    • Processing Level: Level1

This will create 1 Job for the single document at "Folder Level 1". This single Job will consist of 2 Tasks, one for each document at "Folder Level 2".

Scoping Recognize

Another consideration with Scope is processor efficiency. As mentioned earlier the best practice for the Recognize activity is to set its Scope to Page.

Consider a Batch with just one document at Folder Level 1, and consider also that this document has 1,000 pages. If the Scope were set to Folder and the Folder Level to 1, a Job would be created with one Task. As a result, only a single CPU thread would pick up that single Task, and it would take a very long time to recognize the text on that document's 1,000 pages.

However, if (following best practice) you set the Recognize Scope to Page, a Job will be created with 1,000 Tasks (one Task per page object). Therefore, depending on how you've structured your Activity Processing services, you could have many CPU threads tackling Tasks independently. This would greatly decrease the time required to recognize the text on the document.
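
The efficiency difference can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: every page takes the same (hypothetical) time to recognize, threads pick up Tasks as soon as they are free, and scheduling overhead is ignored. `recognize_wall_time` is an illustrative function, not a Grooper setting.

```python
import math


def recognize_wall_time(pages: int, seconds_per_page: float, scope: str, threads: int) -> float:
    """Estimated elapsed time, assuming every page takes the same time to recognize."""
    if scope == "Folder":
        # One Task holds all pages, so only one thread can work on it.
        return pages * seconds_per_page
    if scope == "Page":
        # One Task per page; free threads keep pulling Tasks until none remain.
        rounds = math.ceil(pages / threads)
        return rounds * seconds_per_page
    raise ValueError(f"unsupported scope: {scope}")


print(recognize_wall_time(1000, 2.0, "Folder", threads=16))  # 2000.0
print(recognize_wall_time(1000, 2.0, "Page", threads=16))    # 126.0
```

With the hypothetical figures of 2 seconds per page and 16 threads, the Folder-scoped step needs about 2,000 seconds, while the Page-scoped step finishes in about 126 seconds (63 rounds, since 1,000 / 16 rounds up to 63).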

Example: Separate > Undo Separation

One example of how scope is used in Grooper is seen below. In this example the Separate activity using the Undo Separation provider was run on a Batch containing multiple folder levels. The activity was run at three scope levels:

  1. Scope: Batch
  2. Scope: Folder
    • Folder Level: 1
  3. Scope: Folder
    • Folder Level: 2

The original Batch with three Batch Folder levels.

Undo Separation ran at the Batch scope.
  • All folders are removed.
Undo Separation ran at Folder > Level 1 scope.
  • All folders below the first level are removed.
Undo Separation ran at Folder > Level 2 scope.
  • All folders below the second level are removed.
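
The three results above can be reproduced with a small, hypothetical tree model. `Node`, `undo_separation`, and `folder_depth` are illustrative names, not Grooper's API; the sketch assumes Undo Separation keeps every page and moves it up to the scoped node in its original order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Batch, Batch Folder, or Batch Page."""
    name: str
    is_page: bool = False
    children: list[Node] = field(default_factory=list)


def flatten_pages(node: Node) -> list[Node]:
    """Collect every page below `node` in document order, dropping the folders."""
    pages: list[Node] = []
    for child in node.children:
        pages.extend([child] if child.is_page else flatten_pages(child))
    return pages


def undo_separation(batch: Node, scope: str, folder_level: int = 1) -> None:
    """Remove every Batch Folder beneath each scoped node, keeping its pages."""
    depth = 0 if scope == "Batch" else folder_level  # Batch scope targets the root
    targets = [batch]
    for _ in range(depth):
        targets = [c for t in targets for c in t.children if not c.is_page]
    for node in targets:
        node.children = flatten_pages(node)


def folder_depth(node: Node) -> int:
    """Number of Batch Folder levels below `node`."""
    return max((1 + folder_depth(c) for c in node.children if not c.is_page), default=0)


def make_batch() -> Node:
    """A Batch with three Batch Folder levels and two pages at the bottom."""
    return Node("Batch", children=[
        Node("Level 1", children=[
            Node("Level 2", children=[
                Node("Level 3", children=[Node("p1", is_page=True), Node("p2", is_page=True)]),
            ]),
        ]),
    ])


for label, scope, level in [("Batch", "Batch", 1), ("Folder > Level 1", "Folder", 1),
                            ("Folder > Level 2", "Folder", 2)]:
    batch = make_batch()
    undo_separation(batch, scope, level)
    print(f"{label}: {folder_depth(batch)} folder level(s) remain")
# Batch: 0 folder level(s) remain
# Folder > Level 1: 1 folder level(s) remain
# Folder > Level 2: 2 folder level(s) remain
```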

Activities by Scope Options

Listed here is every activity in Grooper, organized by the options available for scoping.

Batch, Folder, or Page

Batch and Folder

Folder and Page

Batch Only

Folder Only

Page Only