2023:OCR Profile (Node Type)
WIP: This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.

An OCR Profile defines the settings for performing OCR.
This includes:
- Setting which OCR Engine is used
- Determining whether a temporary IP Profile is used for image cleanup before the OCR engine runs
- Grooper's unique Synthesis settings
- Determining if and how multiple OCR results are pre-processed and re-processed
- Determining if and how results are filtered to discard undesirable results
- Any configurable settings available from the OCR Engine
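To make the list above more concrete, the groups of settings an OCR Profile saves can be pictured as a single configuration record. The sketch below is purely illustrative: the class name `OcrProfileSketch` and every field name are hypothetical, chosen only to mirror the bullet points, and do not correspond to Grooper's actual property names or to any Grooper API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OcrProfileSketch:
    """Hypothetical model of the setting groups an OCR Profile saves."""

    engine: str                              # which OCR engine is used
    ip_profile: Optional[str] = None         # temporary image cleanup before OCR, if any
    synthesis_enabled: bool = False          # whether Synthesis settings are applied
    reprocess_results: bool = False          # whether multiple OCR results are re-processed
    result_filters: List[str] = field(default_factory=list)       # rules for discarding results
    engine_settings: Dict[str, str] = field(default_factory=dict)  # engine-specific options

    def summary(self) -> str:
        """Describe the profile in one line for quick comparison."""
        cleanup = self.ip_profile or "none"
        synthesis = "on" if self.synthesis_enabled else "off"
        return (
            f"engine={self.engine}, cleanup={cleanup}, "
            f"synthesis={synthesis}, filters={len(self.result_filters)}"
        )


# A simple profile sets only the engine; a complex one layers on the rest.
simple = OcrProfileSketch(engine="ExampleEngine")
complex_profile = OcrProfileSketch(
    engine="ExampleEngine",
    ip_profile="Example Cleanup",
    synthesis_enabled=True,
    result_filters=["drop low confidence"],
)
print(simple.summary())
print(complex_profile.summary())
```

The point of the sketch is only that a profile bundles independent groups of settings, most of which can be left at their defaults.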
About
At first glance, an OCR Profile may look like a wall of properties, and in some ways, it is. OCR Profiles are a way to save a collection of properties that determine how OCR results are obtained. Let's break these properties down, using a configured OCR Profile as an example.
The OCR Testing Tab
When you select an OCR Profile in the Node Tree, it also provides an "OCR Testing" tab for verifying the profile's results. This tab opens a testing module where you can select documents from a Test Batch, OCR individual pages, and view extra diagnostic information that will help you fine-tune your property settings.
Once OCR is finished, the OCR results appear in the "Layout View" tab at the bottom of the screen.
Use Cases
OCR Profiles are required to obtain machine-readable text from any image-based content. Depending on the image quality or source document quality, the OCR Profile may be relatively simple, perhaps only setting the OCR engine to be used, or more complex, taking advantage of temporary image processing, Grooper's Synthesis suite, or Result Filtering settings.
The only time you won't use an OCR Profile to obtain machine-readable text is when every document you process has full native text. These are digital documents, such as a PDF created with encoded text already present, whose text can be extracted by the Native Text Extraction functionality of the Recognize activity.
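The rule above can be sketched as a small decision function. This is a simplified illustration, not how Grooper makes the decision internally: `PageSketch`, `page_needs_ocr`, and `document_needs_ocr` are hypothetical names, and the sketch assumes a page either has usable native text or has none, ignoring pages that mix encoded text with image-only regions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageSketch:
    """Hypothetical page: native_text is whatever text is already encoded in the file."""

    native_text: str = ""


def page_needs_ocr(page: PageSketch) -> bool:
    """A page with no encoded text can only yield text through OCR."""
    return not page.native_text.strip()


def document_needs_ocr(pages: List[PageSketch]) -> bool:
    """OCR can be skipped only if every page already has native text."""
    return any(page_needs_ocr(page) for page in pages)


digital_pdf = [PageSketch("Invoice 1001"), PageSketch("Total due")]
scanned_pdf = [PageSketch(""), PageSketch("")]
mixed_pdf = [PageSketch("Cover letter"), PageSketch("")]

for name, doc in [("digital", digital_pdf), ("scanned", scanned_pdf), ("mixed", mixed_pdf)]:
    print(name, "needs OCR:", document_needs_ocr(doc))
```

Under these assumptions, a single image-only page is enough to require an OCR Profile for the document.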
How To
Create an OCR Profile
Add a New OCR Profile to the Node Tree
Creating an OCR Profile is fairly straightforward. OCR Profiles may be created and stored in a Content Model's local resources folder or in the OCR Profiles folder in the Node Tree (which is found in the Global Resources folder). The OCR Profiles folder is the most common place to create one.
Configure the OCR Profile
Configure the rest of the OCR Profile's properties according to your documents' needs. General information about these properties can be found in the About section of this article.
Execute an OCR Profile
Now that you have made and configured an OCR Profile, how do you execute it? OCR results are obtained by the Recognize activity. This activity will perform OCR on documents based on the settings in an OCR Profile. You will run this activity in one of two ways in Grooper:
- Manually, or "ad hoc," while testing and configuring within Grooper Design Studio.
- As a step in a Batch Process.
Anywhere you can access a Batch Viewer in Grooper, you can manually execute activities on a page, a folder, or an entire batch. Manual execution is typical when building and testing your solution design in Grooper Design Studio.
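The idea of running an activity at different scopes (a page, a folder, or an entire batch) can be pictured as applying one function to every page beneath the selected node. The sketch below is an assumption-laden illustration only: `NodeSketch`, `pages_in_scope`, `run_on_scope`, and `fake_recognize` are hypothetical names, and nothing here reflects Grooper's actual object model or how the Recognize activity is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class NodeSketch:
    """Hypothetical batch tree node; kind is "batch", "folder", or "page"."""

    name: str
    kind: str
    children: List["NodeSketch"] = field(default_factory=list)


def pages_in_scope(node: NodeSketch) -> Iterator[NodeSketch]:
    """Yield every page at or below the selected node, depth first."""
    if node.kind == "page":
        yield node
    for child in node.children:
        yield from pages_in_scope(child)


def run_on_scope(node: NodeSketch, activity: Callable[[NodeSketch], str]) -> List[str]:
    """Apply the activity to each page in the selected scope."""
    return [activity(page) for page in pages_in_scope(node)]


def fake_recognize(page: NodeSketch) -> str:
    """Stand-in for an OCR step; it only reports which page it visited."""
    return f"recognized {page.name}"


page1 = NodeSketch("page 1", "page")
page2 = NodeSketch("page 2", "page")
page3 = NodeSketch("page 3", "page")
folder_a = NodeSketch("folder A", "folder", [page1, page2])
folder_b = NodeSketch("folder B", "folder", [page3])
batch = NodeSketch("test batch", "batch", [folder_a, folder_b])

print(run_on_scope(page1, fake_recognize))     # one page
print(run_on_scope(folder_a, fake_recognize))  # every page in one folder
print(run_on_scope(batch, fake_recognize))     # every page in the batch
```

Selecting a wider scope changes only which pages are visited; the activity and its settings stay the same.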













