2021:Data Review: Difference between revisions

Revision as of 12:17, 26 March 2020

Data Review is an attended activity, allowing users to verify the results of an Extract activity.

This review module serves as manual validation of Grooper's automated data extraction. Users can review each document and its extracted fields, which are presented according to how they are set up in a Content Model's Data Model. If the extracted data does not match the information on the page, the user can manually enter the correct information.

Version Differences

2.72 Big Document Support

The Data Review activity now handles large documents significantly faster and more efficiently.

It can handle data sections containing multiple data fields with thousands of instances. This enables Grooper users to process documents with massive amounts of data.

2.9 Data Review Updates

Content Type Filter | Index Navigator Settings | Tally Stats | Page Navigation | Rendition Selector

In the Index Navigator Settings when configuring a Data Review activity, a Content Type filter can be applied to restrict included documents. When left blank (the default setting), all documents in scope will be displayed. If a list of content types is provided, only those document types will be displayed during review. This can be useful if only certain Document Types require manual review, so operators do not have to navigate through irrelevant documents. Alternatively, multiple Data Review activities could be configured, each set to display certain Document Types.
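The filter behavior described above can be illustrated with a minimal sketch. This is not Grooper's API: the `Document` class and the `filter_for_review` function are hypothetical names used only to show the "blank means everything, a list means only those types" rule.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a document in a batch; not a Grooper type.
    name: str
    content_type: str


def filter_for_review(
    documents: Iterable[Document],
    content_type_filter: Optional[List[str]] = None,
) -> List[Document]:
    """Return the documents an operator would see during review."""
    # A blank filter (None or an empty list) displays every document in scope.
    if not content_type_filter:
        return list(documents)
    # Otherwise only documents of the listed content types are displayed.
    allowed = set(content_type_filter)
    return [doc for doc in documents if doc.content_type in allowed]


docs = [
    Document("a.pdf", "Invoice"),
    Document("b.pdf", "Receipt"),
    Document("c.pdf", "Invoice"),
]
everything = filter_for_review(docs)
invoices_only = filter_for_review(docs, ["Invoice"])
```

With the sample data, `everything` holds all three documents, while `invoices_only` holds only the two documents typed as Invoice.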

A button in the top-right corner of the Index Panel now opens the Index Navigator Settings.

Flag Invalid Items will flag any documents with invalid index data. This adds an extra layer of visibility to documents that should be double-checked before export.
Display Parent Folder(s) allows you to view and edit the parent folder of the current document. In some situations, it may be useful for the parent fields to be visible and/or editable.
Auto-Load Next Invalid Document will allow you to skip over any documents with no invalid fields. If required fields with extracted values do not need to be reviewed by an operator, this will cut down on Review time significantly.
Content Type Filter allows you to restrict the documents to be reviewed based on content type. Similar to the filter that can be applied to the Data Review activity, this allows individual users to choose the document types they are responsible for reviewing.
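As a rough illustration of how Auto-Load Next Invalid Document skips ahead, the sketch below searches forward for the next document that has at least one invalid field. It is an assumption-laden example, not Grooper's implementation: `next_invalid_index` and the list of boolean flags are hypothetical.

```python
from typing import Optional, Sequence


def next_invalid_index(invalid_flags: Sequence[bool], current: int) -> Optional[int]:
    """Return the index of the next document after `current` that has an
    invalid field, or None if every remaining document is valid."""
    for i in range(current + 1, len(invalid_flags)):
        if invalid_flags[i]:
            return i
    return None


# One flag per document in review order: True means the document
# contains at least one invalid field (hypothetical sample data).
flags = [False, True, False, False, True]
first_stop = next_invalid_index(flags, 0)
second_stop = next_invalid_index(flags, first_stop)
```

Starting from the first document, the operator lands on the second document and then on the fifth; the two valid documents in between are never loaded.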

Right-clicking in the Index Panel and selecting Tally Stats will bring up a panel listing the stats of your review session, including documents processed, percentage of valid documents, content types, and number of fields populated/extracted. This can be useful for tracking quality of extraction and amount of user interaction needed.

Right-clicking in the Index Panel will also bring up buttons to move to the next or previous page, with the shortcuts Alt+Down and Alt+Up, respectively. This allows users to quickly navigate pages without having to use the buttons in the image panel.


The top-right corner of the Document Viewer now includes a drop-down list that allows you to view different renditions of the document. See Document Viewer for more information. This is an extremely useful tool that allows you to check the quality of OCR, especially when troubleshooting bad extraction.

@@ Line 41: / Line 41: @@
 <br/>
 [[File:data_review_4.png|800px|center]]
+</tab>
+</tab>
+<tab name="Rendition Selector" style="margin:25px">
+The top-right corner of the '''Document Viewer''' now includes a drop-down list that allows you to view different renditions of the document. See [[Document Viewer]] for more information. ''This is an extremely useful tool that allows you to check the quality of OCR, especially when troubleshooting bad extraction.''
+<br/>
+[[File:data_review_05.png|800px|center]]
 </tab>
 </tabs>