2023.1:XML Schema Integration (Functionality)

From Grooper Wiki
Revision as of 10:28, 27 August 2024 by Randallkinard.

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.

You may download and import the file(s) below into your own Grooper environment (version 2023.1). There is a Batch with the example document(s) discussed in this tutorial, as well as a Project configured according to its instructions.

Given the proprietary nature of SharePoint and Database connections, the connection objects and their configurations cannot be shared.
Please upload the Project to your Grooper environment before uploading the Batch. This will allow the documents within the Batch to maintain their classification status.

**The XML Schemas and Files.zip file is not a Grooper ZIP file, therefore it is not meant to be uploaded into Grooper. Instead, unzip the contents of this file normally.**

About

The XML format has been around since 1996 and, from a business perspective, can play a large role in your ability to organize data. This article will focus chiefly on XML Schemas and how Grooper can leverage them to define Data Model structures and control how data is exported out of Grooper.

How To

XML Schemas

Grooper's XML schema integration was born from our EDI schema integration. For more information, see the "FYI: EDI Integration - The Precursor to XML Schema Integration" section of this article. EDI schemas are incredibly useful if you are working with that specific style of template. However, each EDI schema is industry specific, so it is of little use outside its specific use case.

XML schemas, on the other hand, are ubiquitous across businesses and industries and can be leveraged in a similar way. They are a way of outlining a hierarchical data structure in an easily transportable file format.

Follow the instructions below to integrate XML schemas into Grooper's processing.

Adding .xsd Resource File

Just about any file can be added as a Resource File object in Grooper. For XML schemas, use the .xsd file format.

  • Drag files from a File Explorer window into a Project in Grooper to create a Resource File object.

Creating Data Elements from .xsd Resource File

We can now leverage the added Resource File object as a reference to create "Data Elements" from the schema.

  1. Right-click the "Simple Schema Ex" Data Model and select Import Schema.
  2. In the "Import Schema" dialog click the drop-down for the Source property and choose XML Schema Importer.
  3. Expand the sub-properties of the Source property.
  4. In the drop-down for the Schema File property choose the file "example xml schema.xsd".


  • This action has created all the necessary "Data Elements" in our model according to the XML schema.


  • A more complex schema will have different elements that will create different objects in a Data Model in Grooper. Grooper understands the hierarchy of the .xsd file and can make appropriate objects like Data Sections and their child Data Fields, etc.
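To illustrate how a schema's hierarchy can be read as a tree of Data Elements, the Python sketch below walks a small, hypothetical schema with the standard library's xml.etree.ElementTree and prints the kind of Data Element each xs:element might correspond to: the root element as the Data Model, elements with an xs:complexType as Data Sections, and simple-typed elements as Data Fields. This mapping is an assumption based on the description above, not Grooper's actual import logic, and it ignores attributes, xs:choice, named types, and element references.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# A small, hypothetical schema used only for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def classify(element):
    """Return an (assumed) Data Element kind and the child xs:elements to visit."""
    complex_type = element.find(f"{XS}complexType")
    if complex_type is None:
        return "Data Field", []  # simple-typed element: a single value
    max_occurs = element.get("maxOccurs", "1")
    repeating = max_occurs == "unbounded" or int(max_occurs) > 1
    kind = "Data Section (repeating)" if repeating else "Data Section"
    return kind, complex_type.findall(f"{XS}sequence/{XS}element")


def build_outline(element, depth=0, lines=None):
    """Walk the schema depth-first, producing one indented line per xs:element."""
    if lines is None:
        lines = []
    kind, children = classify(element)
    if depth == 0:
        kind = "Data Model"  # the root element stands in for the model itself
    lines.append("  " * depth + f"{element.get('name')}: {kind}")
    for child in children:
        build_outline(child, depth + 1, lines)
    return lines


root_element = ET.fromstring(SCHEMA).find(f"{XS}element")
outline = build_outline(root_element)
print("\n".join(outline))
```

The "(repeating)" marker only flags elements whose maxOccurs allows more than one occurrence; how Grooper models repeating elements is not covered by this sketch.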

Execute Activity to Validate XML Schemas

Configure the Batch Process Step

We can leverage an XML schema file to validate whether a document matches a provided schema.

  1. Select the "Execute" Batch Process Step in the "Schema Processing Example" Batch Process.
  2. Click the ellipsis button for the Commands property which will open the "Commands" dialog box.
  3. In the "Commands" dialog box, click the "Add" button to add a Command and set it to XML File - Validate Schema. You will see a command added to the command list.
  4. Expand the Command sub-properties and click the drop-down for the Schema File property.
  5. From the drop-down menu, select the .xsd file to validate against. In this case, choose "shiporder schema.xsd". Click the "OK" button to confirm changes and close the dialog box.


Test the Batch Process Step

  1. Click the "Activity Tester" tab.
  2. Select the documents from the "XML Docs - Batch".
  3. Click the test activity button.
  4. Note that "Document (2)" gets flagged, and upon inspection you will notice its XML format does not match the provided schema.
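To make the idea of schema validation concrete, here is a deliberately simplified Python sketch; it is not how the XML File - Validate Schema command works internally. It checks only that a document's root and child elements appear in a hypothetical expected order with allowed occurrence counts, loosely modeled on a "shiporder"-style schema. Real XSD validation also covers data types, attributes, namespaces, and nested content; in Python that usually means a full validator such as lxml's etree.XMLSchema.

```python
import xml.etree.ElementTree as ET

# Hypothetical expected children of <shiporder>, in order:
# (element name, minOccurs, maxOccurs), where None means "unbounded".
EXPECTED_CHILDREN = [
    ("orderperson", 1, 1),
    ("shipto", 1, 1),
    ("item", 1, None),
]


def validate(xml_text, root_name="shiporder"):
    """Return a list of problems; an empty list means the document passed."""
    root = ET.fromstring(xml_text)
    if root.tag != root_name:
        return [f"root is <{root.tag}>, expected <{root_name}>"]
    tags = [child.tag for child in root]
    problems = []
    position = 0
    for name, min_occurs, max_occurs in EXPECTED_CHILDREN:
        count = 0
        # Consume consecutive occurrences of this element.
        while position < len(tags) and tags[position] == name:
            count += 1
            position += 1
        if count < min_occurs:
            problems.append(f"<{name}> occurs {count} time(s), expected at least {min_occurs}")
        if max_occurs is not None and count > max_occurs:
            problems.append(f"<{name}> occurs {count} time(s), expected at most {max_occurs}")
    if position < len(tags):
        problems.append(f"unexpected <{tags[position]}> at child position {position + 1}")
    return problems


good = "<shiporder><orderperson>Jane</orderperson><shipto/><item/><item/></shiporder>"
bad = "<shiporder><shipto/><orderperson>Jane</orderperson><item/></shiporder>"
print(validate(good))  # []
print(validate(bad))
```

The second sample fails because its elements are out of order, which is the kind of structural mismatch that can cause a document to be flagged during validation.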

Execute Activity to Collect XML Data

Configure the Batch Process Step

The XML File - Load Data command can be used to collect data from XML files. As with all Grooper commands, an Execute step can be added to apply the command in a Batch Process.

  1. Select the "Execute (1)" step in the "Schema Processing Example" Batch Process.
  2. Click the ellipsis button for the Commands property which will open the "Commands" dialog box.
  3. In the "Commands" dialog box, click the "Add" button to add a Command and set it to XML File - Load Data. You will see a command added to the command list.


Test the Batch Process Step

  1. Click the "Activity Tester" tab.
  2. Select the "shiporder (1)" document from the "XML Docs - Batch".
  3. Click the test activity button.
  4. Because this document is classified as a Document Type whose Data Model matches its schema, its contents will be extracted into the correct "Data Elements".


  • Feel free to use the "Review" activity with the Data Viewer to view the extracted data.
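Conceptually, Load Data copies values from XML elements into matching Data Elements. The Python sketch below approximates that mapping with the standard library: leaf elements become field values, elements with children become nested sections, and repeated sibling elements become lists of records. The sample document and its values are hypothetical, attributes are ignored, and this is not Grooper's implementation.

```python
import xml.etree.ElementTree as ET


def load(element):
    """Convert an element to a string (leaf) or a dict of its children (section)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    record = {}
    for child in children:
        value = load(child)
        if child.tag not in record:
            record[child.tag] = value
        else:
            # A repeated tag becomes a list of values (a repeating section).
            existing = record[child.tag]
            if not isinstance(existing, list):
                record[child.tag] = [existing]
            record[child.tag].append(value)
    return record


# Hypothetical sample document; attributes such as orderid are not loaded.
SAMPLE = """<shiporder orderid="0001">
  <orderperson>Jane Doe</orderperson>
  <shipto><name>John Roe</name><city>Springfield</city></shipto>
  <item><title>Widget</title><quantity>2</quantity></item>
  <item><title>Gadget</title><quantity>1</quantity></item>
</shiporder>"""

data = load(ET.fromstring(SAMPLE))
print(data["shipto"]["city"])                    # Springfield
print([item["title"] for item in data["item"]])  # ['Widget', 'Gadget']
```

Because this sketch infers repetition from the document itself, a single <item> would come back as one dict rather than a one-element list; a loader driven by the schema could avoid that ambiguity by consulting maxOccurs.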

Export XML Format

XML Formats are a powerful way to export data from Grooper. Although Grooper offers many ways to export data directly to backend systems, plenty of businesses still prefer XML files to control how data enters whatever system they use.

You can define the structure of your XML Format output by leveraging .xsd schema files.

FYI

The XML File - Format command can format your output XML file in one of two ways:

  • None: This option strips out control characters (such as line breaks) and indentation from your XML file. This makes the file less readable to a human but smaller in file size.
  • Indented: This option inserts control characters and indentation into your XML file. This makes the file more readable to a human but larger in file size.
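The difference between the two options can be illustrated with Python's standard library. This sketch is an analogy, not the XML File - Format command itself: it removes whitespace-only text between elements (similar to the "None" option) and can then re-indent the tree with ElementTree.indent, which requires Python 3.9 or later (similar to the "Indented" option). The sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET


def format_xml(xml_text, indented):
    """Serialize XML either compactly or with two-space indentation."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop whitespace-only text and tails so no line breaks or indentation remain.
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    if indented:
        ET.indent(root, space="  ")  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


SAMPLE = "<order>\n  <person>Jane</person>\n  <item><title>Widget</title></item>\n</order>"

compact = format_xml(SAMPLE, indented=False)
pretty = format_xml(SAMPLE, indented=True)
print(compact)  # <order><person>Jane</person><item><title>Widget</title></item></order>
print(pretty)
print(len(compact), len(pretty))  # the indented version is longer
```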


Like all commands, XML File - Format can be applied in a Batch Process by adding it to an Execute step.

XML Format as an Attachment Using the Merge Activity

An XML Format can be created as an attachment to a document by using the Merge activity. This attachment can later be leveraged by the Export activity.

  1. Select the "Merge" step in the "XML Merge/Export Format Process".
  2. Set the Merge Format property to XML Format.
  3. Define the naming of the attachment using the appropriate path expression on the Attachment Name property and click the drop-down for the "Content Type" property to select an appropriate type.
  4. Click the ellipsis button for the Merge Format property to open its dialog.
  5. From the "Merge Format" dialog box, select an appropriate file for the Schema File property.
  6. You can choose to use any of the other "General" properties in the "Merge Format" dialog to control how the XML Format will be built.
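When a Schema File is set, the schema determines the structure of the generated XML. The Python sketch below illustrates one consequence of that idea under simplifying assumptions: extracted values (here a hypothetical dict standing in for Data Model results) are written out in the element order a schema's xs:sequence would require, regardless of the order in which they were collected. Nested dicts become child elements and lists become repeated elements. The names and the ordering list are hypothetical; this is not the Merge activity's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical top-level element order, as an xs:sequence in a schema might define it.
SCHEMA_ORDER = ["orderperson", "shipto", "item"]


def append_value(parent, name, value):
    """Append value under parent: dict -> section, list -> repeated elements, else text."""
    if isinstance(value, list):
        for entry in value:
            append_value(parent, name, entry)
    elif isinstance(value, dict):
        section = ET.SubElement(parent, name)
        for child_name, child_value in value.items():
            append_value(section, child_name, child_value)
    else:
        ET.SubElement(parent, name).text = str(value)


def build_xml(root_name, values, order):
    """Emit top-level elements in schema order; names missing from values are skipped."""
    root = ET.Element(root_name)
    for name in order:
        if name in values:
            append_value(root, name, values[name])
    return ET.tostring(root, encoding="unicode")


# Extracted values arrive in a different order than the schema requires.
extracted = {
    "item": [{"title": "Widget", "quantity": 2}],
    "orderperson": "Jane Doe",
}
print(build_xml("shiporder", extracted, SCHEMA_ORDER))
# <shiporder><orderperson>Jane Doe</orderperson><item><title>Widget</title><quantity>2</quantity></item></shiporder>
```

Skipping missing names keeps the sketch short; a real schema may require those elements (minOccurs), which is exactly what validating the output against the same .xsd would catch.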

XML Format through Export Behavior

The functionality of the Merge activity is also built into the Export activity and can be handled by an Export Behavior.

  1. Select a "Content Type".
  2. Click the ellipsis button for the Behaviors property to open the "Behaviors" dialog box.
  3. In the "Behaviors" dialog box, click the "Add" button to add an Export Behavior, and click the ellipsis button for the Export Definitions property to open the "Export Definitions" dialog box.
  4. In the "Export Definitions" dialog box, click the "Add" button to add a File Export export definition and click the ellipsis button for the Export Formats property to open the "Export Formats" dialog box.
  5. In the "Export Formats" dialog box, click the "Add" button to add an XML Format, then select an appropriate file for the Schema File property, and choose any of the other "General" properties in the "Export Format" dialog to control how the XML Format is created.

FYI: EDI Integration - The Precursor to XML Schema Integration

XML Schema Integration was actually born from a necessity for Grooper to efficiently collect data from medical EDI files. Therefore, we will begin our understanding of Grooper's XML integration by learning how Grooper's EDI integration works. Below we will work with a mocked-up EOB form.


Follow the instructions below.

  1. With the "EDI Integration - Batch" selected...
  2. ...on the "Viewer" tab...
  3. ...we can see the contents of this EDI file. EDI is a standardized format businesses use to exchange data that might otherwise appear in a document such as an "Invoice" or "EOB Form". To the untrained eye, the structure of this file is daunting at best, but Grooper will do the hard work for us.
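X12 files like this one are plain text built from segments and elements. The Python sketch below splits a made-up, heavily abbreviated fragment into (segment ID, elements) pairs, assuming the common "~" segment terminator and "*" element separator. In real X12 interchanges these delimiters are declared in the ISA header, and the fragment below is not a valid 837 transaction; it only shows the raw shape of the data that an EDI schema gives meaning to.

```python
def parse_x12(text, element_sep="*", segment_term="~"):
    """Split X12 text into (segment_id, [elements]) tuples."""
    segments = []
    for raw in text.split(segment_term):
        raw = raw.strip()  # tolerate line breaks after each terminator
        if not raw:
            continue
        parts = raw.split(element_sep)
        segments.append((parts[0], parts[1:]))
    return segments


# Made-up fragment for illustration; not a complete or valid 837 transaction.
FRAGMENT = """ST*837*0001~
NM1*85*2*EXAMPLE CLINIC~
CLM*A100*150***11:B:1~
SE*4*0001~"""

for segment_id, elements in parse_x12(FRAGMENT):
    print(segment_id, elements)
```

Empty strings mark element positions that are present but blank, and composite values (such as "11:B:1") would need a further split on the component separator. An EDI schema is what maps these positions to named Data Elements.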

Selecting an EDI Schema

There is a massive library of EDI standards, and Grooper has a small selection of X12 schemas built in to choose from. These act as templates that will populate a Data Model with all the appropriate "Data Elements" necessary to house all data from a given format.

  1. Right-click on the "EDI Integration" Data Model and choose Import Schema.
  2. Check the Remove Existing property if you want to remove existing "Data Elements" from the model.
  3. In the "Import Schema" dialog click the drop-down for the Source property and choose EDI Schema Importer.
  4. Once set, expand the sub-properties of Source.
  5. Choose X12 837 Professional in the drop-down for the X12 Schema property.


  • This action has created a massive assortment of requisite "Data Elements" for the chosen standard, including validation.

Execute Activity to Collect EDI Data

Configure the Batch Process Step

Extraction using EDI Schemas does not leverage the standard Extract activity. Instead, the Execute activity will be used.

  1. Select the "Execute" Batch Process Step in the "EDI Process" Batch Process.
  2. Click the ellipsis button for the Commands property which will open the "Commands" dialog box.
  3. In the "Commands" dialog box, click the "Add" button to add a Command and set it to EDI File - Load Data. You will see a command added to the command list.
  4. Click the ellipsis button for the Mappings property which will open the "Mappings" dialog box.
  5. In the "Mappings" dialog box, click the "Add" button to add a Mapping and set the Schema Name property to "X12 837 Professional". You will see a mapping added to the mapping list.
  6. Click the drop-down for the "Content Type" property and set it to the 837 Prof Document Type. Click "OK" to confirm changes and close all dialog boxes.

Test the Batch Process Step

  1. With the "Execute" Batch Process Step selected, go to the "Activity Tester" tab.
  2. Select the document from the "EDI Integration - Batch" in the "Batch Viewer".
  3. Click the "Test" button to test the activity and get extraction results.

Data View to see Results

  1. To view results, go to the "Data Review" Batch Process Step.
  2. Go to the "Activity Tester" tab.
  3. Select the "EDI Integration - Batch" in the "Batch Viewer".
  4. Click the Test button to test the activity.


  • In the Data Viewer you will see that the appropriate data was collected.


Glossary

Activity: Grooper Activities define specific document processing operations performed on a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. These Activities are either "Code Activities" or "Review" activities. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process defines the step-by-step processing instructions given to a Batch. Each step is either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Behavior: A "Behavior" is one of several features applied to a Content Type (such as a Document Type). Behaviors affect how certain Activities and Commands are executed, based on how a document (Batch Folder) is classified, so documents are handled differently according to their Document Type. This includes how they are exported (how Export behaves), if and how they are added to a document search index (how the various indexing commands behave), and if and how Label Sets are used (how Classify and Extract behave in the presence of Label Sets).

  • Each Behavior is enabled by adding it to a Content Type. They are configured in the Behaviors editor.
  • Behaviors extend to descendant Content Types, if the descendant Content Type has no Behavior configuration of its own.
    • For example, all Document Types will inherit their parent Content Model's Behaviors.
    • However, if a Document Type has its own Behavior configuration, it will be used instead.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Data Element: Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Field: Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a Data Model and/or Data Sections.

  • Data Fields are frequently referred to simply as "fields".

Data Model: Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) define data extraction logic and settings for how data is reviewed in a Data Viewer.

Data Section: A Data Section is a container for Data Elements in a Data Model. Data Sections can contain Data Fields, Data Tables, and even other Data Sections as child nodes, adding hierarchy to a Data Model. They serve two main purposes:

  1. They can simply act as organizational buckets for Data Elements in larger Data Models.
  2. By configuring its "Extract Method", a Data Section can subdivide larger and more complex documents into smaller parts to assist in extraction.
    • "Single Instance" sections define a division (or "record") that appears only once on a document.
    • "Multi-Instance" sections define a collection of repeating divisions (or "records").

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

EDI Integration: EDI Integration refers to Grooper's ability to process EDI files.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export Behavior: An Export Behavior defines the parameters for exporting classified Batch Folder content from Grooper to other systems. This includes where they are exported to (what content management system, file system, database, etc.), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML, etc.), folder pathing, file naming and data mappings (for Data Export and CMIS Export).

Export Definition: Export Behaviors are defined by adding and configuring one or more Export Definitions (See Export Definition Types or the Export Definitions section of the Export article). An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Resource File: Resource Files are nodes you can add to a Project to store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, there are several areas in Grooper that can reference Resource Files to one end or another, including XML schema files used for Grooper's XML Schema Integration.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.

SharePoint: SharePoint is a connection option for cloud CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

XML Schema Integration: XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.