2023.1:Correct (Activity)

From Grooper Wiki
Revision as of 07:36, 3 July 2024 by Rpatton

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Correct is an Activity that performs spell correction. It can correct a Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data, or otherwise enhance text data.


About

OCR does not always produce perfect results; the OCR Engine can misread characters. You can use Fuzzy Matching to extract the information you want anyway, but the OCR text data attached to the document is left unchanged and still contains those errors. The Correct Batch Process Step changes the text data attached to the document itself, so the digital text is more accurate when you export your files.
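The difference between tolerating an OCR error and correcting it can be sketched in Python. This is a minimal illustration that assumes a simple similarity-ratio approach from the standard library's difflib; it is not Grooper's implementation, and the helper name correct_text and the 0.85 threshold are hypothetical. For each known phrase, the sketch finds the most similar window of text and, if the similarity clears the threshold, rewrites that window so the text itself changes.

```python
from difflib import SequenceMatcher


def correct_text(text: str, lexicon: list[str], threshold: float = 0.85) -> str:
    """Rewrite the closest near-match of each lexicon phrase in `text`.

    Hypothetical helper: scans windows a few characters shorter or longer
    than each phrase and keeps the one with the highest similarity ratio.
    """
    for phrase in lexicon:
        n = len(phrase)
        best = None  # (score, start, end)
        # Allow windows up to 2 characters shorter or longer than the phrase
        # to tolerate dropped or inserted characters.
        for size in range(max(1, n - 2), n + 3):
            for start in range(len(text) - size + 1):
                window = text[start:start + size]
                score = SequenceMatcher(None, window.lower(), phrase.lower()).ratio()
                if best is None or score > best[0]:
                    best = (score, start, start + size)
        # Only rewrite the text when the best window is similar enough.
        if best is not None and best[0] >= threshold:
            _, start, end = best
            text = text[:start] + phrase + text[end:]
    return text


ocr_text = "Employer Signat re"
print(correct_text(ocr_text, ["Employer Signature"]))  # Employer Signature
```

Only the best-scoring occurrence of each phrase is replaced, and the brute-force window scan is slow on long text, so treat this as a conceptual sketch rather than a production spell corrector.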

You can also use the Correct Batch Process Step to remove sections of text from the document's digital text. However, it will not remove that information from the document image or PDF. To do that, add a Redact Batch Process Step.
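Removing text from the digital text only can be illustrated with a short Python sketch. It assumes a regular-expression approach and a hypothetical pattern for US Social Security numbers; neither the pattern nor the helper name remove_text comes from Grooper. The sketch returns a new text string with matches removed or masked, and, just as described above, nothing about a page image would be affected.

```python
import re

# Hypothetical pattern for illustration only:
# US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def remove_text(text: str, pattern: re.Pattern, replacement: str = "") -> str:
    """Return a copy of `text` with every match of `pattern` replaced."""
    return pattern.sub(replacement, text)


page_text = "Employee SSN: 123-45-6789"
# Mask instead of deleting, so the surrounding layout of the text is preserved.
print(remove_text(page_text, SSN_PATTERN, "XXX-XX-XXXX"))  # Employee SSN: XXX-XX-XXXX
```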

How To

Fuzzy Matching

  1. "Employer Signature" is an entry in our List Match.
  2. However, we are not collecting the "Employer Signature" label from the document.
  3. To find out why, click the Renditions icon in the top right corner of the Document Viewer.
  4. Select Text from the drop-down menu.


  5. We are not getting the result because of bad OCR. The text shows that "Employer Signature" was recognized as "Employer Signat re".


  6. If we turn on Fuzzy Matching, we get the result we want. However, the text data remains the same; only the extraction is corrected.
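The behavior in the steps above can be illustrated with a minimal Python sketch of fuzzy matching. It uses difflib.SequenceMatcher as a stand-in for Grooper's own fuzzy matching, and the helper name fuzzy_find and the 0.8 threshold are hypothetical. An exact search for the list entry fails on the OCR text, the fuzzy search still locates it, and the text string itself is never modified.

```python
from difflib import SequenceMatcher


def fuzzy_find(text: str, phrase: str, threshold: float = 0.8):
    """Return (start, end, score) for the best fuzzy match of `phrase`, or None.

    Hypothetical helper: compares windows a few characters shorter or longer
    than the phrase and keeps the highest similarity ratio.
    """
    n = len(phrase)
    best = None  # (start, end, score)
    for size in range(max(1, n - 2), n + 3):
        for start in range(len(text) - size + 1):
            window = text[start:start + size]
            score = SequenceMatcher(None, window.lower(), phrase.lower()).ratio()
            if best is None or score > best[2]:
                best = (start, start + size, score)
    # Report a match only if it is similar enough; otherwise report nothing.
    if best is not None and best[2] >= threshold:
        return best
    return None


ocr_text = "Employer Signat re"
print("Employer Signature" in ocr_text)  # False: an exact match fails
match = fuzzy_find(ocr_text, "Employer Signature")
print(match)     # location and similarity score of the fuzzy match
print(ocr_text)  # the OCR text itself is unchanged
```

Matching only reports where the phrase was found; changing the stored text is a separate operation, which is the job of the Correct activity.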


Adding the Correct Batch Process Step