2021:Export (Activity)
The Export activity exports processed document content to an external storage platform.
Export is an Unattended Activity, typically added as one of the last steps (if not the last step) of a Batch Process. It allows Grooper users to deliver processed Batch content to an external system. Whether that means exporting Batch Folders as PDF files to a Windows folder, exporting extracted Data Model fields to a SQL database, exporting to a content management system, or some combination of multiple exports to multiple systems, the Export activity handles how document Batch Folders in a Batch ultimately leave Grooper after they have been classified and had their data extracted.
How documents are exported (what gets exported, where it goes, and what format the exported content takes) is controlled by Export Behaviors. An Export Behavior is a set of properties that controls how Batch Folder content is exported based on its Document Type classification. Export Behaviors can be configured locally, as part of the Export activity's own property configuration, or for a particular Content Type, by configuring the Behaviors property of a Content Model and/or its descendant Content Categories or Document Types.
About
So you've ingested some documents into a Batch. You've obtained their full text data with the Recognize activity, either through OCR or extracting their native embedded text. You've classified these documents, assigning the Batch Folders a Document Type from a Content Model during the Classify activity. You've collected the data you want from these documents during the Extract activity. Now what?
You need to get these documents and that data out of Grooper!
Enter the Export activity. Grooper is designed to be a document processing platform. It is a powerful tool to model document sets and their data (according to a Content Model) and put unprocessed pages or files through a step-by-step list of processing instructions (according to a Batch Process) to ultimately organize them and collect information from them. However, Grooper is not designed to be a content management system or a storage platform. Once your documents are organized and Grooper has extracted the data you want from them, you generally want to put those files and data in an external endpoint, such as a file system, a database, a true content management system, or some combination thereof.
The Export activity's job is to get document content out of Grooper, according to your specifications. Using one or more Export Behavior definitions, you can control how processed document content is exported, how it is indexed and in what storage location, what data goes where, what file format certain content should take, and more.
Just What Is "Document Content"?
We're going to talk a lot about "document content" throughout this article. Ultimately, the Export activity controls what content is exported and how it is exported. So, what do we mean by "document content"?
In terms of its content, you can break up a document processed by Grooper into (at least) three meaningful components:
- The document's image
- The document's full text
- The document's extracted data
Each of these kinds of content is a layer of the whole document. Grooper's job is to take source material (scanned pages or imported files), derive the content you desire (such as extracting Data Elements from a Data Model), and, using the Export activity, recombine this content into a deliverable file or data set sent to one or more storage endpoints.
Image Content
The document's image is simply what the viewer physically sees when viewing the document. Whether scanned pages or a digital file, like a PDF, this content comprises the pixels on the screen you're looking at when reading a document. This content can be altered in a Batch Process by the Image Processing activity, which is a typical part of processing scanned documents to clean up the image before OCR. Upon Export, Grooper can build a new file from these images, or just export whatever image content was originally imported.
Full Text Content
A good deal of document processing automation requires machine readable text to parse words, phrases and other text data. Grooper obtains a document's full text data through the Recognize activity, OCRing images or extracting embedded digital text. These results can then be embedded into the exported file as another part of its content during Export.
Extracted Data Content
Last but not least, the Extract activity in a Batch Process will collect information from the document, according to its classified Document Type and Data Model. This may be simple indexing data, even just the Document Type assigned during the Classify activity. This may be every meaningful data point on the document, obtained from a Data Model with hundreds of extracted Data Elements. Regardless, this needs to be stored somewhere and somehow, such as in a SQL database, content management system, or as a separate data file, like an XML or CSV file.
How this content is merged into new files, which storage platform it goes to, and how extracted data drives indexing are all controlled by the Export activity's Export Behavior configuration.
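To make the three layers concrete, here is a minimal Python sketch of a document as a container of image, full text, and extracted data content. This is purely illustrative and is not how Grooper stores anything: the `DocumentContent` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentContent:
    """Hypothetical model of one processed document (not a Grooper type)."""
    image_pages: List[bytes]                  # image content: one entry per page image
    full_text: Optional[str] = None           # full text content: OCR or embedded text
    extracted_data: Dict[str, str] = field(default_factory=dict)  # extracted field values

    def layers_present(self) -> List[str]:
        """Return the names of the content layers that are non-empty."""
        layers = []
        if self.image_pages:
            layers.append("image")
        if self.full_text:
            layers.append("full_text")
        if self.extracted_data:
            layers.append("extracted_data")
        return layers


# Made-up sample document with all three layers populated.
doc = DocumentContent(
    image_pages=[b"page-1-pixels"],
    full_text="Invoice 1001",
    extracted_data={"Invoice Number": "1001"},
)
print(doc.layers_present())  # ['image', 'full_text', 'extracted_data']
```

An export step could inspect which layers are present before deciding what to build, for example embedding text in a file only when the full text layer exists.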
Export Behaviors
The Export activity exports documents according to an Export Behavior. This is a set of export property configurations based on the Content Type (i.e., the Document Type of a Content Model) assigned to a document Batch Folder during document classification. Once a Batch Folder is assigned a Document Type, you have something you can point to in order to control the flow of traffic out of Grooper. For documents "A", build a PDF file and put them in folder "A" in a file system. For documents "B", put them in folder "B" and export their data to a database while you're at it. For documents "C", you might do something entirely different. Or, you might perform essentially the same export for all Document Types in a Content Model. Export Behavior configurations are how you tell Grooper what to do for one Document Type or another upon export.
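The idea of routing documents by Document Type can be sketched as a simple lookup from type to a list of export steps. The Python below is a conceptual illustration only, not Grooper's configuration model or API: `EXPORT_BEHAVIORS`, `export_pdf_to_folder`, `export_data_to_database`, and the folder and table names are all hypothetical, and raising an error for an unconfigured type is an assumption made for the sketch.

```python
from typing import Callable, Dict, List

ExportStep = Callable[[str], str]


def export_pdf_to_folder(folder: str) -> ExportStep:
    """Hypothetical step: describe writing a PDF for the document to a folder."""
    def step(doc_name: str) -> str:
        return f"write {doc_name}.pdf to {folder}"
    return step


def export_data_to_database(table: str) -> ExportStep:
    """Hypothetical step: describe inserting the document's data into a table."""
    def step(doc_name: str) -> str:
        return f"insert data for {doc_name} into {table}"
    return step


# One list of export steps per Document Type (all names are made up).
EXPORT_BEHAVIORS: Dict[str, List[ExportStep]] = {
    "A": [export_pdf_to_folder("folder A")],
    "B": [export_pdf_to_folder("folder B"), export_data_to_database("table_b")],
}


def export_document(doc_name: str, document_type: str) -> List[str]:
    """Run every export step configured for the document's type, in order."""
    steps = EXPORT_BEHAVIORS.get(document_type)
    if steps is None:
        # Assumption for this sketch: an unconfigured type is treated as an error.
        raise LookupError(f"no export behavior for Document Type {document_type!r}")
    return [step(doc_name) for step in steps]


print(export_document("doc1", "A"))
print(export_document("doc2", "B"))
```

Documents of type "A" get a single file export, while documents of type "B" get a file export plus a data export, mirroring the examples in the paragraph above.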


