Export Format

Export Formats define what file types are exported by an Export Behavior's Export Definition. They are used to generate output files from document content (including files attached to the Batch Folder, child pages, and extracted data).

Export Formats can be used to generate a variety of file types including:

PDFs and TIFFs from document images
JSON, XML, CSV and TXT metadata files from extracted Data Model data
ZIP archives of file attachments/images from descendant Batch Folder/Batch Pages

Export Formats summarized

There are currently ten (10) Export Formats. They can be divided into 3 categories:

Merge Formats
Metadata Formats
Other Formats

Merge Formats

These formats generate an output file by merging content from a Batch Folder's children (and sometimes content stored on the Batch Folder itself) into a single file, such as a multipage PDF or TIF.

Merge Formats are used by both the Export activity and the Merge activity.
- When added to an Export Definition in an Export Behavior, Export will build the file and export it to the folder destination configured in the Export Definition.
- When configured for the Merge activity, Merge will build the file and attach it to the Batch Folder.
XML Format is an outlier in this category. It does not generate the file by merging content from the Batch Folder's children. Instead, it generates an XML file from extracted Data Model data stored on the Batch Folder. XML Format is in the Merge Format category simply because the Merge activity can use it.

PDF Format

This will output a PDF file from the Batch Folder content (its child Batch Pages and, in some configurations, other content). It can also embed full text data obtained from the Recognize activity.

TIF Format

This will output a multipage TIF file from the images of the Batch Folder's child Batch Pages.

XML Format

This will output extracted Data Model values to an XML file and format it according to an XSD schema file.

XML Format is designed to be used with Data Models generated by the XML Schema Importer using an XSD schema file.
XML Format differs from the XML Metadata format. XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.

ZIP Format

This will output a ZIP file containing the file attachments for all descendant nodes.

Metadata Formats

These formats generate an output file containing metadata extracted from a Grooper document. The Metadata Formats build various text-based files from the Batch Folder's extracted Data Model and its fields.

When noted below, certain formats can only output single-instance Data Fields and their values (not Data Table values or values in multi-instance Data Sections).

Delimited Metadata

This will output extracted Data Model values to a value-delimited text file. You can choose the file extension and delimiter in its configuration. You can configure this to make comma separated value (CSV) files.

Only single instance Data Fields are output.

Simple Metadata

This will output extracted Data Model values to a text file. This file formats Data Fields and their values as simple "key-value pairs".

Only single instance Data Fields are output.

JSON Metadata

This will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full":

Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.

XML Metadata

This will output extracted Data Model values to an XML file.

The XML includes additional information collected for each "data instance", including a value's page location data.
The XML Metadata format differs from the "XML Format". XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.

Other Formats

There are 2 Export Formats that do not fit into the other categories:

Attached File
Text Format

Attached File

This will output a Batch Folder's main "attachment file" or an attached file by name.

For files that were imported from a digital source, the attachment file is the file attached to the Batch Folder when it was created on import.
This option can also output any file attached to a Batch Folder by referencing a filename. This is how Grooper exports files generated by activities such as XML Transform, Text Transform, Merge or custom scripted activities.
If the Batch Folder has no attachment, this option will generate an image version of the document from all child Batch Pages in the folder.

Text Format

This will output full text content only, generated from OCR data, as a text file.
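For quick reference, the ten formats and three categories summarized above can be expressed as a lookup table. This is a minimal Python sketch; EXPORT_FORMATS, category_of and usable_by_merge are hypothetical names for illustration only and are not part of any Grooper API.

```python
# Hypothetical lookup table of the Export Formats summarized above.
# The strings are the display names used in this article, not Grooper type names.
EXPORT_FORMATS = {
    "Merge Formats": ["PDF Format", "TIF Format", "XML Format", "ZIP Format"],
    "Metadata Formats": ["Delimited Metadata", "Simple Metadata",
                         "JSON Metadata", "XML Metadata"],
    "Other Formats": ["Attached File", "Text Format"],
}

# Formats noted above as outputting only single-instance Data Fields.
SINGLE_INSTANCE_ONLY = {"Delimited Metadata", "Simple Metadata"}


def category_of(format_name: str) -> str:
    """Return the category a format belongs to (KeyError if unknown)."""
    for category, names in EXPORT_FORMATS.items():
        if format_name in names:
            return category
    raise KeyError(format_name)


def usable_by_merge(format_name: str) -> bool:
    """Merge Formats can be used by the Merge activity as well as by Export."""
    return category_of(format_name) == "Merge Formats"


total = sum(len(names) for names in EXPORT_FORMATS.values())
print(total)  # 10
```

For example, usable_by_merge("XML Format") is True even though XML Format does not merge child content, matching the outlier note above.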

How to add Export Formats

Export Formats can be configured for any of the Export Definitions that export files (all Export Definitions except Data Export).

From an Export Behavior's Export Definitions editor, select the Export Definition you wish to configure.
Find the "Export Formats" property.
- For some Export Definitions (like CMIS Export), you will need to configure some required properties before the Export Formats property appears.
Open the Export Formats editor (press the "..." button).
To add an Export Format, press the "Add" button.
Select the Export Format you wish to use from the dropdown list.
If necessary, you can add multiple Export Formats by pressing the Add button again.

Export Formats detailed

Below we will briefly describe each Export Format to give you a better idea of the files they create.

Merge Formats

PDF Format

The PDF Format will output a PDF file from the Batch Folder content. This will be either:

From a PDF file attached to the Batch Folder on import.
Or more commonly, the Batch Folder's child Batch Pages. The PDF's pages are generated from each child Batch Page.

The PDF Format can also embed text data into image-based pages. This is how Grooper creates text searchable PDFs using OCR text obtained from the Recognize activity.

To do this, under PDF Format's "Build Options", turn "Searchable" to "True".

Another important property to note is the "Always Build" property.

This property will force the PDF file to be generated, even if there is a PDF file attached to the Batch Folder.
Enabling this property is important when using the Split Pages activity on imported PDF files. Turning "Always Build" to "True" will stitch together a new PDF file from the processed Batch Pages, rather than exporting the PDF file attached to the Batch Folder.
Enabling this property is important when using the PDF Data Mapping behavior. It ensures PDF Data Mapping always builds and exports a new PDF file, rather than exporting an imported PDF file attached to the Batch Folder.
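The decision the Always Build property controls can be sketched as follows. This is a simplified Python illustration under stated assumptions: BatchFolderStub and pdf_source are hypothetical names, and a ".pdf" filename is used as a stand-in for "a PDF file is attached to the Batch Folder". It is not Grooper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BatchFolderStub:
    # Hypothetical stand-in for a Batch Folder: the file attached on import
    # (if any) and the child Batch Pages after processing.
    attachment_name: Optional[str] = None
    pages: List[str] = field(default_factory=list)


def pdf_source(folder: BatchFolderStub, always_build: bool) -> Tuple[str, object]:
    """Return where the exported PDF would come from."""
    has_pdf = (folder.attachment_name or "").lower().endswith(".pdf")
    if has_pdf and not always_build:
        # Reuse the PDF file already attached to the Batch Folder.
        return ("attached", folder.attachment_name)
    # Build a new PDF with one page per child Batch Page.
    return ("built", len(folder.pages))
```

With a folder holding "scan.pdf" and three processed pages, pdf_source(folder, False) returns ("attached", "scan.pdf"), while pdf_source(folder, True) returns ("built", 3).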

Example Output

This will export a PDF file, according to the PDF Format's property grid settings.

As you can see here, the exported file has text-behind generated by the Searchable property. We can select this text with our cursor.

BE AWARE!! Text data can ONLY be embedded behind images (either image formats like JPEG or TIF or single-image PDF pages). Enabling the Searchable property will only embed searchable text behind image-based pages.
- Text data obtained from native-text PDF pages CANNOT be embedded into a native-text PDF page.
- You would have to convert the native-text PDF page into an image first to embed the text data. This is atypical, but necessary when dealing with poorly formed or corrupt PDFs.
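The rule above (text-behind only on image-based pages) can be illustrated with a short Python sketch. PageStub, its kind values and pages_to_embed are hypothetical; the sketch only models which pages would receive text when Searchable is True.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStub:
    # Hypothetical page model: "image" covers JPEG/TIF images and
    # single-image PDF pages; "native_text" is a native-text PDF page.
    kind: str


def pages_to_embed(pages: List[PageStub], searchable: bool) -> List[int]:
    """Return the indexes of pages that would get text-behind."""
    if not searchable:
        return []
    # Native-text PDF pages are skipped; they would need to be converted
    # to images first before text could be embedded behind them.
    return [i for i, page in enumerate(pages) if page.kind == "image"]
```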

TIF Format

The TIF Format will output image content only as a TIF (Tagged Image File Format) file.

TIF is a format used to store high quality raster graphics for graphic design or publishing.
Keep in mind this is an image only format. If you want text-behind embedded in your files, you must use the PDF Format.

XML Format

"XML Format" will output extracted Data Model values to an XML file.

XML Format and XML Metadata both generate an XML file from a document's Data Model. The difference is in how the XML's schema is formatted.

XML Metadata uses a set schema developed by Grooper.
XML Format allows users to select an XSD schema file.

XML Format lets users format the data Grooper collects according to whatever schema they want, without having to transform Grooper's XML schema with the XML Transform activity or with XML transformation tools outside of Grooper.

ZIP Format

The ZIP Export Format enables you to export multiple documents as a ZIP file. A single ZIP file will be generated containing the file attachments for all descendant Batch Folders.
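Conceptually, the result is one archive with an entry per descendant attachment. The Python sketch below uses the standard zipfile module to show that idea; build_zip is a hypothetical helper, and the entry names inside the archive are an assumption.

```python
import io
import zipfile
from typing import Dict


def build_zip(attachments: Dict[str, bytes]) -> bytes:
    """Pack each descendant Batch Folder's attachment into one ZIP archive.

    `attachments` maps an entry name (hypothetical naming) to that folder's
    attachment bytes. The archive is built in memory and returned as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for entry_name, data in attachments.items():
            zf.writestr(entry_name, data)
    return buffer.getvalue()
```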

Metadata Formats

Simple Metadata

The Simple Metadata format will output extracted Data Field values to a text file.

This file formats Data Fields and their values as simple "key-value pairs".
The Data Field's name is the key. Its value extracted from the document is the value.
Each key and value are separated by a delimiter, which is "=" by default.
Ex: fieldName=fieldValue
Only single instance Data Fields are output.
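As a rough illustration of that layout, here is a Python sketch that renders single-instance Data Fields as key-value lines. simple_metadata is a hypothetical helper, the field names are made up, and the one-pair-per-line layout with "\n" line endings is an assumption.

```python
from typing import Dict


def simple_metadata(fields: Dict[str, str], delimiter: str = "=") -> str:
    """Render single-instance Data Fields as one key-value pair per line."""
    # The Data Field name is the key; the extracted value is the value.
    return "\n".join(f"{name}{delimiter}{value}" for name, value in fields.items())


print(simple_metadata({"InvoiceNumber": "12345", "Vendor": "Acme"}))
# InvoiceNumber=12345
# Vendor=Acme
```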

Example Output

As you can see here, Data Fields are exported to a text file as a simple list of key-value pairs.

Data Field names are on the left.
The Delimiter character (an equals sign by default) is in the middle.
Extracted values are on the right.

Delimited Metadata

The Delimited Metadata format outputs extracted Data Field values to a character delimited text file.

This formats Data Field values as a delimiter-separated value array (e.g. value1,value2,value3).
Use the "Text Extension" property to choose a file extension. TXT is the default.
Use the "Delimiter" property to define the character. This is a comma (,) by default.
Use the "Delimiter Escape" property to replace a delimiter in the Data Field's value with a different character (Ex: swap a comma for a semicolon).
Use the "Include Header" property to include a header row in the file. The header row is populated with the Data Fields' names.
Only single instance Data Fields are output.
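The interaction of the Delimiter, Delimiter Escape and Include Header properties can be sketched in Python as follows. delimited_metadata is hypothetical, its parameters only mirror the property names above, and the semicolon default for the escape character is taken from the example above rather than from Grooper's actual default.

```python
from typing import Dict


def delimited_metadata(fields: Dict[str, str], delimiter: str = ",",
                       delimiter_escape: str = ";",
                       include_header: bool = False) -> str:
    """Render single-instance Data Fields as one delimited row."""
    lines = []
    if include_header:
        # Header row: the Data Field names.
        lines.append(delimiter.join(fields.keys()))
    # Replace any delimiter inside a value so it cannot split the row.
    values = [value.replace(delimiter, delimiter_escape) for value in fields.values()]
    lines.append(delimiter.join(values))
    return "\n".join(lines)


print(delimited_metadata({"Vendor": "Acme, Inc.", "Total": "10.00"}, include_header=True))
# Vendor,Total
# Acme; Inc.,10.00
```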

Example Output

Extracted Data Field values are exported as a comma-delimited list of values in a text file.
Note that because we enabled Include Header, a header row of our Data Field names is output as well.

JSON Metadata

The "JSON Metadata" format will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full":

Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
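The difference between the two layouts can be sketched with Python's json module. The input structure (value, confidence and page keys) is a made-up stand-in and does not reproduce the real contents of "Grooper.DocumentData.json"; json_metadata is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Made-up extraction result; keys are illustrative only.
extracted: Dict[str, Dict[str, Any]] = {
    "InvoiceNumber": {"value": "12345", "confidence": 0.98, "page": 1},
    "Vendor": {"value": "Acme", "confidence": 0.91, "page": 1},
}


def json_metadata(data: Dict[str, Dict[str, Any]], layout: str = "Simple") -> str:
    """Serialize extracted data using a "Full" or "Simple" layout."""
    if layout == "Full":
        # Full: keep every detail collected for each field.
        return json.dumps(data, indent=2)
    if layout == "Simple":
        # Simple: field names mapped to values only.
        return json.dumps({name: info["value"] for name, info in data.items()}, indent=2)
    raise ValueError(f"unknown layout: {layout}")


print(json_metadata(extracted))
```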

XML Metadata

The XML Metadata format will output extracted Data Model values to an XML file. The XML uses an XML schema developed by Grooper to detail information about extracted "data instances", including values, page location information, confidence scores, and more.

Document-level information (including the Batch Folder's classified Document Type) is found in the <Document> tag.
Data Field values and information are found in the <Field> tags.
Extracted table values are found in <TableCell> tags as children of <TableRow> and <Table> parent tags.
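To show how the tag hierarchy nests, the Python sketch below reads a heavily simplified XML sample with the standard xml.etree.ElementTree module. Only the tag names come from the description above; the DocumentType and Name attributes and the sample values are assumptions, and real XML Metadata output carries much more detail.

```python
import xml.etree.ElementTree as ET

# Heavily simplified, hypothetical sample. Tag names follow the description
# above; the attributes and values are assumptions.
SAMPLE = """
<Document DocumentType="Invoice">
  <Field Name="InvoiceNumber">12345</Field>
  <Table Name="LineItems">
    <TableRow>
      <TableCell Name="Description">Widget</TableCell>
      <TableCell Name="Amount">10.00</TableCell>
    </TableRow>
  </Table>
</Document>
"""

root = ET.fromstring(SAMPLE.strip())

# Document-level information lives on the <Document> element.
document_type = root.get("DocumentType")

# Data Field values live in <Field> elements.
fields = {f.get("Name"): f.text for f in root.iter("Field")}

# Table values: <Table> > <TableRow> > <TableCell>.
rows = [[cell.text for cell in row.iter("TableCell")] for row in root.iter("TableRow")]

print(document_type, fields, rows)
```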

Example Output

Document-level information (including the Batch Folder's classified Document Type) is found in the <Document> tag.
Data Field values and information are found in the <Field> tags.
Extracted table values are found in <TableCell> tags as children of <TableRow> and <Table> parent tags.

FYI: XML data can be reformatted with XSLT style sheets using the XML Transform activity.

Other Formats

Attached File

Several files are attached to a Batch Folder throughout its lifecycle in Grooper.

When a file is imported by an Import Provider, the file is attached to the Batch Folder created in Grooper. This is the Batch Folder's "attachment file".
When Extract runs, a "Grooper.DocumentData.json" file is attached to the Batch Folder.
When layout data is collected, a "Grooper.LayoutData.json" file is attached to the Batch Folder.
When activities like XML Transform, Text Transform, or Merge generate files, they are attached to the Batch Folder.
- Be aware, Merge can be configured to replace a Batch Folder's attachment file or attach an additional file.

When configuring the Attached File format, there is only one configurable property: Filename

Leaving this property blank will export the Batch Folder's attachment file.
- If the Batch Folder has no attachment, Grooper will generate an image version of the document from all child Batch Pages in the folder.
Use the Filename property to reference a file attached to a Batch Folder by name. The referenced file will be exported instead of the attachment file.
- This is how Grooper exports custom generated files from activities such as XML Transform, Text Transform, Merge, or custom scripted activities.
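The Filename rules above can be summarized as a small decision function. This Python sketch is hypothetical (FolderFilesStub and attached_file_export are illustrative names), and raising an error for a missing named file is an assumption rather than documented Grooper behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FolderFilesStub:
    # Hypothetical Batch Folder model: the main attachment file, other
    # files attached by name, and the number of child Batch Pages.
    attachment: Optional[str] = None
    attached_files: Dict[str, bytes] = field(default_factory=dict)
    page_count: int = 0


def attached_file_export(folder: FolderFilesStub, filename: str = "") -> Tuple[str, object]:
    """Decide what the Attached File format would export."""
    if filename:
        # Filename set: export the attached file with that name.
        if filename not in folder.attached_files:
            raise FileNotFoundError(filename)  # assumed behavior
        return ("file", filename)
    if folder.attachment:
        # Filename blank: export the main attachment file.
        return ("attachment", folder.attachment)
    # No attachment: an image version is generated from the child Batch Pages.
    return ("rendered", folder.page_count)
```

For example, passing "Grooper.DocumentData.json" as the filename would export the extraction results file rather than the imported document.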

More on the Attached File format

When using the "Attached File" format, be aware of two common scenarios involving a Batch Folder's main "attachment file".

Common scenario 1: In Batch Processes where a Merge step runs before Export, this will export the file generated by Merge.

This is typically ideal.
In typical configurations, the Merge activity replaces the Batch Folder's main attachment file (if present) with the PDF or TIF it generates. So, no further Export Format configuration is required to export a Grooper generated PDF or TIF document.

Common scenario 2: In Batch Processes where a Batch is created by an Import Job and a Merge step is not present, this will export whatever file was imported at the start of the Batch Process.

This is not always ideal.
When an Import Job imports a file into Grooper, a Batch Folder is created and the file is attached to it. This is the Batch Folder's main attachment file.
- If you simply want to export the same file you imported to the export destination, no further Export Format configuration is required.
- However, if you want to export a new file generated by Grooper you will need to (1) delete the Attached File format and (2) add one of your choosing (most typically a PDF Format).

Text Format

The Text Format will output full text content only, generated from OCR data, as a text (TXT) file.

This is the same data you would see in a Batch Folder's "Text Rendition" in the Document Viewer.
Text data generated by the Recognize activity is used to create the file. Specifically, the "Grooper.Characters.txt" file is used to build it.
- Technically, the text file can be generated from either OCR data or native text data.

Example Output

Upon export, this will generate a text file from the Batch Folder's raw OCR text data, generated from the Recognize activity.