Data Review

From Grooper Wiki
Graphic depicting the Grooper Data Review.

Data Review is an Attended Activity that allows users to verify the results of an Extract activity.


The Data Review module provides manual validation of Grooper's automated data extraction. Users can review each document and its extracted fields, which are presented according to how they are set up in a Content Model's Data Model. If the extracted data does not match the information on the page, the user can manually enter the correct information.

How To

Configuring the Data Review activity is straightforward. At a minimum, you only need to set the Batch Process Step's Activity Type property to Data Review. However, it is worth familiarizing yourself with a few of the other properties so you can tailor the review experience to your needs.

  1. The first and most essential property to set is the Activity Type property, which should be set to Data Review.
  2. Setting the Allow Completion with Invalid Documents property to True lets the user Complete the attended activity without first validating and correcting every invalid index field.
  3. The User Activity Timeout property allows you to specify the amount of time, in minutes, that a user can be inactive before an automatic logout occurs. If a user has been inactive for this number of minutes, the current Processing Task will automatically close. This keeps report statistics from becoming inflated because of users leaving the application open and unattended. Setting this property to 0 will disable this feature.
Data review 08.png
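The timeout rule described for the User Activity Timeout property can be sketched in a few lines. This is only an illustration of the stated behavior (close after N idle minutes, with 0 disabling the check), not Grooper's implementation. The function name `should_close_task` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def should_close_task(last_activity: datetime, now: datetime, timeout_minutes: int) -> bool:
    """Hypothetical helper: True when the user has been idle long enough to close the task."""
    # A timeout of 0 disables the feature, as described for the property.
    if timeout_minutes <= 0:
        return False
    # Close once the idle time reaches the configured number of minutes.
    return now - last_activity >= timedelta(minutes=timeout_minutes)
```

For example, with a 10 minute timeout, a user idle for 15 minutes would have the task closed, while a timeout of 0 never closes it regardless of idle time.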
  1. Clicking the ellipsis button on the Command Options property will bring up the Command Configuration - Data Review window.
  2. In this window you can choose which object commands are available to the person attending to the Data Review step. This is useful if you want to tailor a specific experience for your user. When selecting a command you can also view and set custom Shortcut Keys.
    • Notice in the screenshot that the Find Next Invalid command is highlighted, and its hotkey is Ctrl+I. This command and its sister command, Next Invalid Field (hotkey Ctrl+N), are two very useful commands for getting through Data Review quickly.
Data review 09.png
  1. Clicking the ellipsis button on the Index Navigator Settings property will bring up the Index Navigator Settings window.
  2. You may choose to use the Flag Invalid Items property if you expect to use the flags in some kind of programmatic or manual decision.
  3. The Processing Level property is important if you have a more robust model with multiple levels of hierarchy in its foldering.
  4. The Auto-Load Next Invalid Document property lets the user get through review quickly by focusing on documents with problematic index fields.
  5. The Content Type Filter property lets you configure the step to only look at specific Content Types. See the Content Type Filter article for more information on this property.
  6. Finally, using the Flag Messages property, you can either make local entries, or point to a Lexicon of tailored messages you want your user to select when manually flagging an item.
Data review 10.png

Version Differences

2.9 Data Review Updates

Content Type Filter

In the Index Navigator Settings when configuring a Data Review activity, a Content Type filter can be applied to restrict included documents. When left blank (the default setting), all documents in scope will be displayed. If a list of content types is provided, only those document types will be displayed during review. This can be useful if only certain Document Types require manual review, so operators do not have to navigate through irrelevant documents. Alternatively, multiple Data Review activities could be configured, each set to display certain Document Types.

Data review 01.png
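The filter rule above (a blank filter shows everything, otherwise only the listed content types are shown) can be sketched as follows. This is a minimal illustration and not Grooper's API. The `Document` class and `documents_for_review` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for a document in the batch.
    name: str
    content_type: str


def documents_for_review(documents, content_type_filter=()):
    """Return the documents an operator would see for a given filter."""
    allowed = set(content_type_filter)
    # A blank filter (the default) means every document in scope is displayed.
    if not allowed:
        return list(documents)
    # Otherwise only documents of the listed content types are displayed.
    return [doc for doc in documents if doc.content_type in allowed]
```

Configuring several Data Review steps, each with a different filter, corresponds to calling this function once per step with a different list of content types.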

Index Navigator Settings

There is now a button in the top right corner of the index panel that will open the Index Navigator Settings.

  • Flag Invalid Items will flag any documents with invalid index data. This adds an extra layer of visibility to documents that should be double-checked before export.
  • Display Parent Folder(s) allows you to view and edit the parent folder of the current document. In some situations, it may be useful for the parent fields to be visible and/or editable.
  • Auto-Load Next Invalid Document will allow you to skip over any documents with no invalid fields. If required fields with extracted values do not need to be reviewed by an operator, this will cut down on Review time significantly.
  • Content Type Filter allows you to restrict the documents to be reviewed based on content type. Similar to the filter that can be applied to the Data Review activity, this allows individual users to choose the document types they are responsible for reviewing.

Data review 02.png
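The behavior of Auto-Load Next Invalid Document can be pictured as a search for the next document that has at least one invalid field. The sketch below assumes a simplified, hypothetical representation in which each document is a dictionary mapping field names to a validity flag. It is not how Grooper stores documents.

```python
def next_invalid_index(documents, current):
    """Return the index of the next document after `current` with an invalid field.

    `documents` is a list of dicts mapping field name -> is_valid (hypothetical shape).
    Returns None when every remaining document is fully valid.
    """
    for index in range(current + 1, len(documents)):
        # A document needs review if any of its fields is flagged invalid.
        if not all(documents[index].values()):
            return index
    return None
```

Passing `current=-1` searches from the start of the batch. Documents whose fields are all valid are skipped, which is where the reduction in review time comes from.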

Tally Stats

Right-clicking in the Index Panel and selecting Tally Stats will bring up a panel listing the stats of your review session, including documents processed, percentage of valid documents, content types, and number of fields populated/extracted. This can be useful for tracking quality of extraction and amount of user interaction needed.

Data review 3.png
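To make the statistics concrete, the following sketch computes the same kinds of figures from a simplified, hypothetical document structure. The dictionary keys and the `tally_stats` function are assumptions made for illustration. The actual Tally Stats panel may calculate or label its numbers differently.

```python
from collections import Counter


def tally_stats(documents):
    """Summarize a review session.

    Each document is a dict (hypothetical shape) with:
      "content_type": str
      "fields": dict of field name -> (value, is_valid)
    """
    total = len(documents)
    # A document counts as valid only when every one of its fields is valid.
    valid_docs = sum(
        all(is_valid for _, is_valid in doc["fields"].values()) for doc in documents
    )
    # A field counts as populated when it holds a non-empty value.
    populated = sum(
        1 for doc in documents for value, _ in doc["fields"].values() if value
    )
    return {
        "documents_processed": total,
        "percent_valid": round(100 * valid_docs / total, 1) if total else 0.0,
        "content_types": dict(Counter(doc["content_type"] for doc in documents)),
        "fields_populated": populated,
    }
```

A low valid percentage for one content type relative to the others would suggest that its extraction setup needs attention.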

Page Navigation

Right-clicking in the Index Panel will also bring up buttons to move to the next or previous page, with the shortcuts Alt+Down and Alt+Up, respectively. This allows users to quickly navigate pages without having to use the buttons in the image panel.

Data review 4.png

Data Annotations

Index Panel settings now include Data Annotation settings. When enabled, the value of the focused field will be displayed just below where the value was found in the Document Viewer. This will reduce eye strain on reviewers, as they won’t have to look back and forth between values in the Document Viewer and indexed Data Fields in the Index Panel.

Data review 05a.png

Auto Focus

Index Panel settings now include an Auto Focus option. When enabled, it automatically moves the focus to the first invalid field of each document. This can be useful if you require every document to be seen by a human, but only invalid fields need to be reviewed.

Data review 06.png

Auto Complete

Although Auto Complete is not technically a Data Review property, it does affect Data Review. Data Fields now include an optional Auto Complete property. When enabled, the most recently entered values in a field will be remembered on subsequent documents. Optionally, an extractor can be set, a Lexicon can be added, or both, to add values to the auto complete list. Users can also be allowed to manually add values to the Lexicon, or all entered values can be added automatically. This is very useful when the same value will be entered often but cannot be extracted. Users can begin typing and quickly choose the correct value from the list.

Data review 07.png
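The idea behind the auto complete list (recently entered values first, followed by Lexicon entries, narrowed as the user types) can be sketched like this. The class `AutoCompleteList`, its method names, and the `max_recent` limit are all hypothetical. Grooper's actual ordering and matching rules are not described here and may differ.

```python
class AutoCompleteList:
    """Hypothetical model of a field's auto complete suggestions."""

    def __init__(self, lexicon=(), max_recent=10):
        self.lexicon = list(lexicon)   # values supplied by a Lexicon
        self.recent = []               # most recently entered values, newest first
        self.max_recent = max_recent   # assumed cap on remembered values

    def remember(self, value):
        # Move a re-entered value to the front instead of storing it twice.
        if value in self.recent:
            self.recent.remove(value)
        self.recent.insert(0, value)
        del self.recent[self.max_recent:]

    def suggest(self, prefix):
        # Case-insensitive prefix match over recent values, then Lexicon values.
        prefix = prefix.casefold()
        seen = set()
        matches = []
        for value in self.recent + self.lexicon:
            key = value.casefold()
            if key.startswith(prefix) and key not in seen:
                seen.add(key)
                matches.append(value)
        return matches
```

In this sketch, a reviewer who types a few characters gets a short list with the values entered on previous documents at the top, which matches the use case of values that recur often but cannot be extracted.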

2.72 Big Document Support

The Data Review activity can now handle large documents significantly more quickly and efficiently.

It can handle data sections containing multiple data fields with thousands of instances. This enables Grooper users to process documents with massive amounts of data.