Data Review is an attended activity, allowing users to verify the results of an Extract activity.
The Data Review module allows users to manually validate Grooper's automated data extraction. Users can review each document and its extracted fields, which are presented according to how they are set up in a Content Model's Data Model. If the extracted data does not match the information on the page, the user can manually enter the correct information.
2.9 Data Review Updates
Content Type Filter
In the Index Navigator Settings when configuring a Data Review activity, a Content Type filter can be applied to restrict included documents. When left blank (the default setting), all documents in scope will be displayed. If a list of content types is provided, only those document types will be displayed during review. This can be useful if only certain Document Types require manual review, so operators do not have to navigate through irrelevant documents. Alternatively, multiple Data Review activities could be configured, each set to display certain Document Types.
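The filter's behavior can be illustrated with a minimal Python sketch. This is not Grooper's API: the `Document` class and the `filter_by_content_type` function are hypothetical names used only to model the rule described above, where an empty filter shows every document in scope.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for a document in a batch.
    name: str
    content_type: str


def filter_by_content_type(documents, allowed_types=()):
    """Return the documents an operator would see during review.

    An empty filter (the default) keeps every document in scope;
    otherwise only documents of the listed content types are kept.
    """
    if not allowed_types:
        return list(documents)
    allowed = set(allowed_types)
    return [doc for doc in documents if doc.content_type in allowed]


batch = [
    Document("doc-1", "Invoice"),
    Document("doc-2", "Receipt"),
    Document("doc-3", "Invoice"),
]
invoices_only = filter_by_content_type(batch, ["Invoice"])
```

Configuring several Data Review activities, each with its own filter, corresponds to calling the function once per list of content types.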
A button in the top right corner of the Index Panel now opens the Index Navigator Settings, which include the following options:
- Flag Invalid Items will flag any documents with invalid index data. This will add an extra layer of visibility to documents which should be double checked before export.
- Display Parent Folder(s) allows you to view and edit the parent folder of the current document. In some situations, it may be useful for the parent fields to be visible and/or editable.
- Auto-Load Next Invalid Document will allow you to skip over any documents with no invalid fields. If required fields with extracted values do not need to be reviewed by an operator, this will cut down on review time significantly.
- Content Type Filter allows you to restrict the documents to be reviewed based on content type. Similar to the filter that can be applied to the Data Review activity, this allows individual users to choose the document types they are responsible for reviewing.
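As a rough illustration of the Auto-Load Next Invalid Document option, the sketch below models navigation that skips fully valid documents. It assumes each document's validity is already known as a boolean; the function name `next_invalid_index` is hypothetical and does not correspond to anything in Grooper itself.

```python
def next_invalid_index(validity, current):
    """Return the index of the next invalid document after `current`.

    `validity` is a list of booleans, one per document, where True means
    the document has no invalid fields. Returns None when every remaining
    document is valid, so there is nothing left to review.
    """
    for index in range(current + 1, len(validity)):
        if not validity[index]:
            return index
    return None


# Documents 0 and 2 are valid; 1 and 3 contain invalid fields.
validity = [True, False, True, False]
first_stop = next_invalid_index(validity, -1)   # start before the first document
second_stop = next_invalid_index(validity, first_stop)
```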
Right-clicking in the Index Panel and selecting Tally Stats will bring up a panel listing the stats of your review session, including documents processed, percentage of valid documents, content types, and number of fields populated/extracted. This can be useful for tracking quality of extraction and amount of user interaction needed.
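The kind of summary Tally Stats presents can be approximated with a short sketch. The record layout (dictionaries with `content_type`, `valid`, and `fields` keys) and the function `tally_stats` are assumptions made for illustration; Grooper's actual calculation may differ.

```python
from collections import Counter


def tally_stats(documents):
    """Summarize a review session.

    Each document is a dict with:
      'content_type': str
      'valid': bool, True when the document has no invalid fields
      'fields': dict mapping field name to its value (None or '' if empty)
    """
    total = len(documents)
    valid = sum(1 for doc in documents if doc["valid"])
    populated = sum(
        1
        for doc in documents
        for value in doc["fields"].values()
        if value not in (None, "")
    )
    return {
        "documents": total,
        "percent_valid": 100.0 * valid / total if total else 0.0,
        "content_types": dict(Counter(doc["content_type"] for doc in documents)),
        "fields_populated": populated,
    }


session = [
    {"content_type": "Invoice", "valid": True,
     "fields": {"Number": "1001", "Total": "25.00"}},
    {"content_type": "Receipt", "valid": False,
     "fields": {"Number": "", "Total": "9.99"}},
]
stats = tally_stats(session)
```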
Right-clicking in the Index Panel will also bring up buttons to move to the next or previous page, which can also be reached with the shortcuts Alt+Down and Alt+Up, respectively. This allows users to quickly navigate pages without having to use the buttons in the image panel.
Index Panel settings now include Data Annotation settings. When enabled, the value of the focused field will be displayed just below where the value was found in the Document Viewer. This will reduce eye strain on reviewers, as they won’t have to look back and forth between values in the Document Viewer and indexed Data Fields in the Index Panel.
Index Panel settings now include an Auto Focus option. When enabled, it automatically moves the focus to the first invalid field of each document. This can be useful if you require every document to be seen by a human, but only invalid fields need to be reviewed.
Data Fields now include an optional Auto Complete property. While this is not technically a Data Review property, its functionality does impact Data Review. When enabled, the most recently entered values in a field will be remembered on subsequent documents. Optionally, an extractor can be set and/or a Lexicon can be added to supply values to the auto complete list. Users can also be allowed to manually add values to the Lexicon, or all entered values can be added automatically. This is very useful when the same value is entered often but cannot be extracted. Users can begin typing and quickly choose the correct value from the list.
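The following sketch models how an auto complete list of this kind might combine recently entered values with Lexicon entries. It is an illustration under stated assumptions, not Grooper's implementation: the class `AutoCompleteList`, the `max_recent` limit, and the case-insensitive prefix matching are all hypothetical choices.

```python
class AutoCompleteList:
    """Suggest values from recent entries first, then from a lexicon."""

    def __init__(self, lexicon=(), max_recent=10):
        self.lexicon = list(lexicon)      # values supplied by a Lexicon
        self.recent = []                  # most recently entered value first
        self.max_recent = max_recent      # assumed cap on remembered values

    def remember(self, value):
        """Record a value the user just entered, moving it to the front."""
        if value in self.recent:
            self.recent.remove(value)
        self.recent.insert(0, value)
        del self.recent[self.max_recent:]

    def suggest(self, typed):
        """Return values starting with the typed text, without duplicates."""
        prefix = typed.casefold()
        seen = set()
        matches = []
        for value in self.recent + self.lexicon:
            if value.casefold().startswith(prefix) and value not in seen:
                seen.add(value)
                matches.append(value)
        return matches


vendor_field = AutoCompleteList(lexicon=["Acme Corp", "Apex Ltd"])
vendor_field.remember("Acme Holdings")
suggestions = vendor_field.suggest("ac")
```

Listing recent entries ahead of Lexicon values reflects the idea that a value typed on the previous document is the most likely candidate for the next one.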