Main Page: Difference between revisions

From Grooper Wiki
No edit summary
Line 33: Line 33:
|-style="background-color:#d8f3f1" valign="top"
|-style="background-color:#d8f3f1" valign="top"
|
|
[[file:simpletable.png|thumb|300px|Data in an Excel spreadsheet is an example of tabular data.]]
<blockquote>
<blockquote>
<span style="font-size:14pt">'''[[OCR]]'''</span>
<span style="font-size:14pt">'''[[OCR]]'''</span>
Line 39: Line 38:
OCR stands for Optical Character Recognition. It allows text from paper documents to be digitized to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text. This conversion allows Grooper to search text characters from the image, providing the capability to separate images into documents, classify them and extract data from them.


In Grooper, tabular data can be extracted using the [[Row Match (Table Extract Method)|Row Match]], [[Header-Value (Table Extract Method)|Header-Value]], or [[Infer Grid (Table Extract Method)|Infer Grid]] table extraction methods.
The quick explanation of OCR is it analyzes pixels on an image and translates those pixels into text. Most importantly, it translates pixels into machine readable text. Grooper can be described as a document modeling platform. You use the platform to model how pages are separated out into documents, how one document gets put into one category or another, and how extractable data is structured on the document. Once you have this model of what a document is, how it fits into a larger document set, and where the data is on it, you can use it to programmatically process any document that fits the model.
 
In order to do any of that, you have to be able to read the text on the page. How do you know an invoice is an invoice? A simple way could be locating the word "invoice" (or other text associated with the invoice). You, as a human, do this by looking at the ink on a page (or pixels for a digital document) and reading the word "invoice". Grooper does this by using a Data Extractor (and regular expression) to read the machine readable text for the page. OCR is how each page gets that machine readable text in order to model the document set and process it.
|Did you know we have a wiki?



Revision as of 09:17, 26 February 2020

Getting Started

Some kind of general intro paragraph about what Grooper is/does.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Introduction to Grooper
Install and Setup
Third article?


Featured Article Did you know?

OCR

OCR stands for Optical Character Recognition. It digitizes text from paper documents so that other software applications can search or edit it. OCR converts typed or printed text in digital images of physical documents into machine-readable, encoded text. This conversion lets Grooper search the text characters on an image, which makes it possible to separate images into documents, classify them, and extract data from them.

In short, OCR analyzes the pixels of an image and translates them into text. Most importantly, it translates them into machine-readable text. Grooper can be described as a document modeling platform. You use the platform to model how pages are separated into documents, how a document is assigned to one category or another, and how extractable data is structured on the document. Once you have this model of what a document is, how it fits into a larger document set, and where its data is located, you can use it to programmatically process any document that fits the model.

To do any of that, you have to be able to read the text on the page. How do you know an invoice is an invoice? A simple way is to locate the word "invoice" (or other text associated with invoices). You, as a human, do this by looking at the ink on a page (or the pixels of a digital document) and reading the word "invoice". Grooper does this by using a Data Extractor (and a regular expression) to read the page's machine-readable text. OCR is how each page gets the machine-readable text needed to model the document set and process it.
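
The idea of recognizing an invoice by its text can be sketched outside Grooper with a plain regular expression. The Python below is only an illustration under stated assumptions: the function name `looks_like_invoice`, the pattern, and the sample strings are hypothetical, it does not use Grooper's Data Extractor, and it assumes OCR has already produced the text for each page.

```python
import re

# Hypothetical pattern: the whole word "invoice", in any letter case.
# This stands in for the kind of regular expression a data extractor might use.
INVOICE_PATTERN = re.compile(r"\binvoice\b", re.IGNORECASE)


def looks_like_invoice(page_text: str) -> bool:
    """Return True if the machine-readable text of a page contains "invoice"."""
    return INVOICE_PATTERN.search(page_text) is not None


if __name__ == "__main__":
    # Made-up OCR output for two pages (not real Grooper data).
    sample_pages = {
        "page 1": "ACME Corp\nINVOICE No. 1001\nTotal due: 42.00",
        "page 2": "Packing slip\nItems shipped: 3",
    }
    for name, text in sample_pages.items():
        label = "invoice" if looks_like_invoice(text) else "other"
        print(f"{name}: {label}")
```

The pattern only works because the input is already text: without OCR, a scanned page is just pixels and there is nothing for a regular expression to match.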

Did you know we have a wiki?

You're using it!


New in 2.8 Featured Use Case

New Microfiche Processing capabilities including

Two additional batch activities

  • Recognize - Combines the old OCR and PDF Extract activities.
  • Generate PDF - Generates PDF content from processed documents, including native PDF elements (such as signature widgets).

Two additional IP commands

New extraction methods available to data fields

Simpler and expanded Database Lookup capabilities.

Expression-based Field Mapping between data elements and their locations in external storage platforms, which makes it easier to format data and export batch processing metadata.

Use case. Use case. Here a use case. There a use case. Everywhere a use case.


Uuuuuuuuuuuuuuuuuuuuuuuuuuuse case.


Other Resources

Getting started (MediaWiki)

MediaWiki has been installed.

Consult the User's Guide for information on using the wiki software.