What's New in Grooper 2023.1: Difference between revisions
Dgreenwood (talk | contribs) |
Dgreenwood (talk | contribs) |
||
(55 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
==Welcome to Grooper 2023.1!== | ==Welcome to Grooper 2023.1!== | ||
<br> | <br> | ||
Line 19: | Line 10: | ||
|} | |} | ||
== Expanded | == Expanded web client == | ||
=== Light | === Light mode === | ||
{|class="how-to-table" | {|class="how-to-table" | ||
Line 38: | Line 29: | ||
|} | |} | ||
=== Improved | === Improved scripting for web client === | ||
Scripting for the web client is better than ever. Grooper users can now use the web client as their debug target thanks to the GrooperSDK Visual Studio extension. | Scripting for the web client is better than ever. Grooper users can now use the web client as their debug target thanks to the GrooperSDK Visual Studio extension. | ||
Line 53: | Line 44: | ||
* GrooperSDK | * GrooperSDK | ||
For more information, be sure to check out the | For more information, be sure to check out the [[Remote Scripting Setup]] article. | ||
== New and | == New and improved processing features == | ||
=== Secondary Types === | === Secondary Types === | ||
Line 73: | Line 64: | ||
For more information, be sure to check out the | For more information, be sure to check out the [[Secondary Types]] article. | ||
| | | | ||
[[File:2023-1 WhatsNewInGrooper 03.png]] | [[File:2023-1 WhatsNewInGrooper 03.png]] | ||
|} | |} | ||
=== EDI | === EDI support === | ||
Grooper can create Data Models directly from an X12 EDI schema and load data from an EDI file directly into a Grooper document. | Grooper can create Data Models directly from an X12 EDI schema and load data from an EDI file directly into a Grooper document. | ||
Line 93: | Line 84: | ||
''FYI: X12 refers to the organization developing and maintaining the EDI standards. X12 is the common EDI standard in the US and North America.'' | ''FYI: X12 refers to the organization developing and maintaining the EDI standards. X12 is the common EDI standard in the US and North America.'' | ||
For more information, be sure to check out the [[XML Schema Integration]] article. | |||
=== XML Data Interchange === | === XML Data Interchange === | ||
Line 102: | Line 95: | ||
* Generate an XML document according to an XML schema using a new '''''Merge/Export Format''''' (called '''''XML Format'''''). | * Generate an XML document according to an XML schema using a new '''''Merge/Export Format''''' (called '''''XML Format'''''). | ||
=== New | For more information, be sure to check out the [[XML Schema Integration]] article. | ||
=== New node type: Resource Files === | |||
{| | {| | ||
Line 127: | Line 122: | ||
{| | {| | ||
|valign=top| | |valign=top style="width:50%"| | ||
Simplify expressions with '''''Variables'''''. | Simplify expressions with '''''Variables'''''. | ||
Line 145: | Line 140: | ||
|} | |} | ||
=== Lookup | === Lookup improvements === | ||
==== New Lookup | ==== New Lookup type: XML Lookup ==== | ||
The new '''''XML Lookup''''' allows users to perform a lookup against data in an XML file. It uses XPath selectors with Grooper variables to select data in an XML hierarchy. This gives Grooper users a mechanism to query data that lies somewhere between a '''''Lexicon Lookup''''' and a '''''Database Lookup'''''. It is more capable than a '''''Lexicon Lookup'''''. XML data is more complex than the simple "key-value" list you can query in a '''Lexicon'''. However, XML data is not as complex as a relational database. The '''''XML Lookup''''' gives you an option when you need to query fairly static data that lies between a '''Lexicon''' and a database in terms of its complexity. | The new '''''XML Lookup''''' allows users to perform a lookup against data in an XML file. It uses XPath selectors with Grooper variables to select data in an XML hierarchy. This gives Grooper users a mechanism to query data that lies somewhere between a '''''Lexicon Lookup''''' and a '''''Database Lookup'''''. It is more capable than a '''''Lexicon Lookup'''''. XML data is more complex than the simple "key-value" list you can query in a '''Lexicon'''. However, XML data is not as complex as a relational database. The '''''XML Lookup''''' gives you an option when you need to query fairly static data that lies between a '''Lexicon''' and a database in terms of its complexity. | ||
Line 153: | Line 148: | ||
'''''Please Note!''''' For most lookups, you reference Grooper '''Data Elements''' with the "@" symbol. However, "@" is already part of the XPath syntax. '''Data Elements''' are instead referenced with the "%" symbol for '''''XML Lookups'''''. | '''''Please Note!''''' For most lookups, you reference Grooper '''Data Elements''' with the "@" symbol. However, "@" is already part of the XPath syntax. '''Data Elements''' are instead referenced with the "%" symbol for '''''XML Lookups'''''. | ||
==== New Trigger | ==== New Trigger mode: Custom ==== | ||
A new ''Custom'' option for a lookup's '''''Trigger Mode''''' now allows for conditional lookup execution. Selecting ''Custom'' exposes a '''''Trigger''''' for the lookup. This uses an expression to define conditions when to execute the lookup. | A new ''Custom'' option for a lookup's '''''Trigger Mode''''' now allows for conditional lookup execution. Selecting ''Custom'' exposes a '''''Trigger''''' for the lookup. This uses an expression to define conditions when to execute the lookup. | ||
Line 161: | Line 156: | ||
This can also create opportunities for "waterfall" query approaches where if the first (generally more specific) query failed to produce a result, you can fall back on a second (generally looser) query. | This can also create opportunities for "waterfall" query approaches where if the first (generally more specific) query failed to produce a result, you can fall back on a second (generally looser) query. | ||
=== | === Prescan Threshold for Labelset-Based Classification === | ||
Before 2023.1 the '''''Labelset-Based''''' classification method had a problem. It became less efficient the more '''Document Types''' were added to the '''Content Model''' (and therefore the more Label Sets were added). The more '''Document Types''', the more Label Sets that must be loaded and checked against each document. | |||
The new '''''Prescan Threshold''''' property available to ''''''Labelset-Based''''' classification resolves this by only running Label Sets that are likely to succeed, not ''every'' one on each document. '''''Prescan Threshold''''' determines this based on word statistics. A minimum percent of Label Set words must be present on the document to be considered for classification. This has been demonstrated to speed up classification up to 15x what it was without a '''''Prescan Threshold'''''. This extends the '''''Labelset-Based''''' classification method to '''Content Models''' with ''thousands'' of '''Document Types'''. | |||
=== New "Summary" Tabs === | |||
There are new and improved "Summary" tabs on '''Data Models''', '''Data Elements''', and '''Document Types''' in the web client. These summaries give Grooper Design users useful "at a glance" information about these objects in your '''Content Models'''. Furthermore, these summaries have helpful links to associated objects, allowing for quicker navigation through complicated '''Content Models''' with large numbers of '''Data Elements''' and '''Document Types'''. | |||
'''Data Model''' "Summary" tabs now list all expressions configured for any '''Data Element''' configured with an expression ('''''Default Value''''', '''''Calculated Value''''', '''Is Required''', etc.) | |||
* Clicking on a listed '''Data Element''' will take you directly to that element in the '''Data Model'''. | |||
In version 2021, the '''''Tabular Layout''''' table extraction method was created as an improved version of the '''''Header-Value''''' table extraction method. '''''Tabular Layout's''''' logic is similar to '''''Header-Value''''' in that it uses column header labels and value extractors to determine a table's structure and extract cell data. However, '''''Tabular Layout''''' is superior to '''''Header-Value''''' in the following ways: | '''Data Element''' "Summary" tabs also list any configured expressions for the selected element. This summary also lists any '''Document Types''' configured with overrides for the selected element. | ||
* Clicking on a listed '''Document Type''' will take you directly to the '''Document Type'''. | |||
'''Document Type''' "Summary" tabs also have an "Overrides" section, listing all '''Data Elements''' being overridden for the selected '''Document Type'''. | |||
* Clicking on a listed '''Data Element''' will take you directly to that element. | |||
== Changes to table extraction == | |||
=== Header-Value is out. Tabular Layout is in === | |||
In version 2021, the '''''Tabular Layout''''' table extraction method was created as an improved version of the '''''Header-Value''''' table extraction method. '''''Tabular Layout's''''' logic is similar to '''''Header-Value''''' in that it uses column header labels and value extractors to determine a table's structure, detect columns and rows, and extract cell data. However, '''''Tabular Layout''''' is superior to '''''Header-Value''''' in the following ways: | |||
* '''''Tabular Layout's''''' initial setup is simpler. | * '''''Tabular Layout's''''' initial setup is simpler. | ||
* '''''Tabular Layout''''' has better built in column and row detection. | * '''''Tabular Layout''''' has better built in column and row detection. | ||
Line 174: | Line 189: | ||
For these reasons, '''''Tabular Layout''''' will fully replace '''''Header-Value''''' in version 2023.1. Any '''Data Table''' using '''''Header-Value''''' will be converted to '''''Tabular Layout''''' upon upgrade. | For these reasons, '''''Tabular Layout''''' will fully replace '''''Header-Value''''' in version 2023.1. Any '''Data Table''' using '''''Header-Value''''' will be converted to '''''Tabular Layout''''' upon upgrade. | ||
'''''Tabular Layout''''' will greatly improve your table extraction results, particularly if you familiarize yourself with all [[Tabular Layout]] can do for you. | |||
==== Data Element Extensions | {|class="attn-box" | ||
| | |||
⚠ | |||
| | |||
'''PLEASE NOTE:''' If you are currently using '''''Header-Value''''' in production, you should take '''''extra''''' care when testing your upgrade to ensure your results are consistent. You may need to adjust aspects of your table extraction setup to get the best results possible after upgrading. | |||
'''''ALWAYS BACKUP YOUR GROOPER DATABASE BEFORE UPGRADING TO ENSURE YOU CAN REVERT BACK IF NECESSARY''''' | |||
|} | |||
==== Known conversion issues ==== | |||
{|class="wikitable" | |||
| | |||
'''''Tabular Layout''''' has dramatically improved logic to detect the boundaries between columns (or "gutters"). '''''Header-Value''''' had, comparatively, little gutter detection. This made it possible, in some cases, to detect a column using a partial label. | |||
* For example, using the partial label "Extended" instead of the full label "Extended Price". | |||
* It is always best practice to capture a columns ''full'' label when possible. | |||
** If you find decreased performance after upgrading to 2023.1, ensure each '''Data Column's''' '''''Header Extractor''''' returns the full label for each column on the document. | |||
|- | |||
| | |||
Multiline table rows were returned by default for '''''Header-Value'''''. For '''''Tabular Layout''''' users can choose to return single-line or multiline rows by enabling the '''''Multiline Rows''''' property. | |||
* In the upgrade code, '''''Multiline Rows''''' will automatically be ''enabled''. This will mimic this feature of '''''Header-Value'''''. | |||
However, there are rare multiline row structures that may require you to manually configure a '''''Multiline Rows''''' sub-property called '''''Maximum Leading Lines'''''. | |||
[[File:2023-1 WhatsNewInGrooper 13.png|right|800px]] | |||
* In the example to the right, the single row is outlined in red. | |||
** Note it is a multiline row. The row takes up two lines of text. | |||
* '''''Tabular Layout''''' detected the row using data on the ''second'' line, highlighted in yellow. | |||
* However, there are two pieces of information we want to collect on the ''first'' line, highlighted in purple. | |||
** This is atypical. Most commonly, additional lines in a multiline row come after the line detected for a row, not before. | |||
In order to capture these values, we need to inform '''''Tabular Layout''''' that the row is composed of one (or more) lines before the line detected for the row (i.e. "leading lines"). To do this: | |||
# Expand the '''''Multiline Rows''''' sub-properties. | |||
# Configure the '''''Maximum Leading Lines''''' property. | |||
#* For our example we would enter ''1'', because there is one line above the line detected for the row. | |||
|} | |||
=== New Tabular Layout property: Find Column Positions === | |||
There is a new '''''Tabular Layout''''' property called '''''Find Column Positions'''''. | |||
* This property is found in the '''''Row Detection''''' sub-properties. | |||
* This property was added to '''''Tabular Layout''''' to make the upgrade logic from '''''Header-Value''''' to '''''Tabular Layout''''' possible. | |||
The property enables row detection in cases where required column values do not vertically align with the column header. | |||
* If set to ''True'', Grooper will expand the left or right boundaries of the required column's header if values are detected nearby. | |||
** Additionally, there is logic built in to ensure Grooper will not adjust column bounds if it encroaches on other column headers. | |||
** '''''PLEASE NOTE''''' Enabling this property causes Grooper to expand column headers '''''only if''''' there are '''''no''''' aligned value extractor hits. Column positions will not be expanded if the column's value extractor produces at least one hit. | |||
* "Required" columns are those '''Data Columns''' with the '''''Tabular Layout Options > Row Detection''''' property set to ''Required''. | |||
* This property has ''no'' impact on column headers whose boundaries have already been expanded due to line detection. | |||
* '''''PLEASE NOTE''''' Enabling this property can produce unpredictable results as you have no control how Grooper looks left or right for nearby values. If using Label Sets, it is best practice to keep this property turned ''False'' and adjust the width of a header label manually using the label's '''''Padding''''' properties. | |||
=== New Tabular Layout property: Merge Multiple Instances === | |||
There is a new '''''Tabular Layout''''' property called '''''Merge Multiple Instances''''' | |||
* This property is found in the '''''Row Detection''''' sub-properties. | |||
* This property was added to '''''Tabular Layout''''' to make the upgrade logic from '''''Header-Value''''' to '''''Tabular Layout''''' possible. | |||
This property changes the standard way '''''Footer''''' extractors/labels work for '''''Tabular Layout'''''. | |||
When set to ''False'', a footer result will behave as it always has for '''''Tabular Layout'''''. | |||
* Grooper will detect rows starting at the table's row of header labels up to the ''first instance'' of a footer. | |||
* No rows will be detected ''after'' the footer's location. | |||
When set to ''True'', Grooper will combine spans of rows between ''multiple instances'' of footer results/labels. | |||
* Grooper will detect rows starting at a the table's row of header labels as normal and stop detecting rows at the first instance of a footer. However, the difference is, if header labels are found on the next page, Grooper will continue detecting rows up to ''another'' instance of a footer (or the end of the document if no second footer is present). Row results from the two (or more) pages are then merged into a single table. | |||
* Results from multiple spans of rows between multiple headers and footers will then be merged together in a single table. | |||
* This property is useful for documents like invoices that commonly have footers at the bottom of a line items table on each page of a multipage invoice. | |||
=== Data Element Extensions === | |||
"Data Element Extensions" are a new way of handling properties for '''Data Elements''' given a parent '''Data Element's''' configuration. Given a parent '''Data Element''' configuration, properties can now ''extend'' to child '''Data Elements'''. | "Data Element Extensions" are a new way of handling properties for '''Data Elements''' given a parent '''Data Element's''' configuration. Given a parent '''Data Element''' configuration, properties can now ''extend'' to child '''Data Elements'''. | ||
Line 189: | Line 275: | ||
** Here, you can set the same '''''Row Detection''''', '''''Secondary Extract''''' and '''''Secondary Extract Mode'''''' settings found previously in the '''''Column Settings''''' collection editor. | ** Here, you can set the same '''''Row Detection''''', '''''Secondary Extract''''' and '''''Secondary Extract Mode'''''' settings found previously in the '''''Column Settings''''' collection editor. | ||
| | | | ||
[[File:2023-1 WhatsNewInGrooper 09.png| | [[File:2023-1 WhatsNewInGrooper 09.png|440px]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
Line 198: | Line 284: | ||
** Here, you can configure the '''''Read Method''''' property to define f a '''Data Column''' is an "OCR Column" or an "OMR Column". | ** Here, you can configure the '''''Read Method''''' property to define f a '''Data Column''' is an "OCR Column" or an "OMR Column". | ||
| | | | ||
[[File:2023-1 WhatsNewInGrooper 10.png]] | [[File:2023-1 WhatsNewInGrooper 10.png|440px]] | ||
|} | |} | ||
== Batch management & Review enhancements == | |||
== Batch | |||
=== "FIFO" Batch Priority === | === "FIFO" Batch Priority === | ||
Line 235: | Line 299: | ||
=== Dynamic Value Lists === | === Dynamic Value Lists === | ||
=== Review | |||
'''Data Fields''' have a new '''''List Values''''' option: "''Include Lookup Results''" | |||
* When enabled, the field will appear as a drop-down list in '''Review'''. | |||
* The list will include all distinct values returned from '''''Lookups''''' targeting the field. | |||
** '''''Be Aware!''''' '''''Lookups''''' used to populate value lists in this way should have their '''''Conflict Disposition''''' set to ''Ignore'' as the lookup is expected to return multiple results. | |||
{| | |||
|valign=top style="width:50%"| | |||
In the example here, a '''''Database Lookup''''' is configured to return various cities for a particular value extracted for a "State" field. The "City" field returns a different list, given the lookup's results. | |||
* When the "State" field is "OK", the user is able to select cities in Oklahoma, as returned by the lookup. | |||
* When the "State" field is "TX", the user is able to select cities in Texas, as returned by the lookup. | |||
| | |||
[[File:2023-1 WhatsNewInGrooper 11.png]] | |||
| | |||
[[File:2023-1 WhatsNewInGrooper 12.png]] | |||
|} | |||
=== Review improvements === | |||
''This section is a work in progress and will be expanded at a later date'' | |||
* Swap panes | * Swap panes | ||
Line 244: | Line 327: | ||
* Display labels (Use “\auto” to use labels in Label Sets!) | * Display labels (Use “\auto” to use labels in Label Sets!) | ||
== Misc | == Misc other changes == | ||
The '''''Labeled Value''''' extractor can now function ''without'' a '''''Value Extractor''''' configured | {|class="wikitable" | ||
* Previously, this functionality was only accessible | | | ||
The '''''Labeled Value''''' extractor can now always function ''without'' a '''''Value Extractor''''' configured. | |||
* Previously, this functionality was only accessible using Label Sets ('''''Labeling Behavior''''') when a '''Data Field''' used '''''Labeled Value'''''. | |||
* Be aware, how the extractor collects results without a configured '''''Value Extractor''''' is very different from the extractor's typical logic. It is intended as a "dumbed down" way of collecting the most simplest of data in the most simplest of layouts. | * Be aware, how the extractor collects results without a configured '''''Value Extractor''''' is very different from the extractor's typical logic. It is intended as a "dumbed down" way of collecting the most simplest of data in the most simplest of layouts. | ||
|- | |||
| | |||
The '''''Labeled Value''''' extractor now has a configurable '''''Footer Extractor''''' property. | |||
* Previously, a footer value could only be added using Label Sets ('''''Labeling Behavior''''') when captured as a '''Data Field's''' '''''Footer''''' label. | |||
|- | |||
| | |||
In the Label Set editor, you can use “Alt + Double Click” to capture custom labels | |||
|- | |||
| | |||
In the Label Set editor, '''Data Elements''' with no labels defined display a checkbox. Clicking this checkbox will hide these '''Data Elements''' in '''Review'''. | |||
* More specifically, this is a shortcut to turn that '''Data Element's''' '''''Visible''''' property to ''False'' in the '''Document Type's''' Overrides. | |||
|- | |||
| | |||
This version fixes a long standing issue with Label Sets and Overrides. | |||
When enabling a '''''Labeling Behavior''''' and collecting labels for a '''Data Field''' or '''Data Column''', those labels will '''''always''''' be used in place of a '''''Label Extractor''''' or '''''Header Extractor'''''. | |||
To avoid confusion, you should NEVER collect a label ''and'' override the label. Simply choose one or the other depending on your needs. | |||
* Imagine you have a '''Document Type''' and a '''Data Field''' using the '''''Labeled Value''''' extractor. | |||
** If you collect a label for that '''Data Field''', it will ''always'' be used in place of the extractor's '''''Label Extractor'''''. | |||
** If you override the '''''Label Extractor''''' with an extractor configuration, it will be ignored and use the collected label instead. | |||
*** What if you want to use the '''''Label Extractor''''' override instead of the collected label? Simply do not collect a label for that '''Data Field''' for that '''Document Type'''. | |||
|- | |||
| | |||
The '''Extract''' activity now has a '''''Rules''''' property. | |||
* This allows users to specify one or more '''Data Ruels''' to run after extraction. | |||
* Assigning '''Data Rules''' in this manner eliminates the need to run the '''Apply Rules''' activity as a separate activity. | |||
** Pro: Your '''Batch Process''' will run faster because the data does not have to be loaded a second time. | |||
** Con: Running '''Data Rules''' as a separate activity can be beneficial for troubleshooting purposes. | |||
|- | |||
| | |||
New "Change Priority" command applicable to '''Batches''' from the Batches page. | |||
|- | |||
| | |||
Ability to import documents as the root of a '''Batch'''. | |||
* New '''''Import as Root''''' option on Import Providers | |||
* Available when the maximum documents per '''Batch''' is 1. | |||
** Eliminates unnecessary folder level when creating one '''Batch''' per single document imported. | |||
|- | |||
| | |||
CSS Improvements | |||
* Preview now works when editing container-level CSS (e.g. a '''Data Section''' or a '''Data Table''') | |||
* New “self” CSS construct | |||
** Useful when editing container-level CSS and you want to edit the container’s own CSS, rather than its children’s CSS. | |||
|- | |||
| | |||
The "Publish To Repository" command now works with any descendant of the '''Projects''' folder. | |||
* This means you can publish individual nodes in a '''Project''' to another Grooper Repository (not just the whole '''Project'''). | |||
* The ancestor nodes ''must'' exist in the destination repository for the Publish operation to work. | |||
** Example 1: If you attempt to publish an '''OCR Profile''' to a repository that does not contain the parent '''Project''', the Publish will fail. | |||
** Example 2: If you attempt to publish a '''Document Type''' to a repository and its parent '''Content Model''' does not exist, the Publish will fail (even if the parent '''Project''' does). | |||
''Note: This functionality was added late in the development cycle. Currently, nodes may only be published one at a time. Future development will include the capability to publish multiple nodes by multi-selecting them.'' | |||
|- | |||
| | |||
New '''''Data Export''''' functionality: '''''Custom Statements''''' | |||
'''''Custom Statements''''' allow users to configure a custom SQL statement using an expression. This will allow for more nuanced SQL database exports. For example, a user could modify existing records in a database with an UPDATE command using '''''Custom Statements'''''. | |||
Users define two properties to configure each '''''Custom Statement''''': | |||
# '''''Source Scope''''': Specifies the '''Data Elements''' to be used for the statement. | |||
#* Can be set to a '''Data Model''', '''Data Section''' or '''Data Table''' | |||
# '''''Statement'''': A code expression used to generate the SQL statement. | |||
#* Example: <code>$"UPDATE invoices SET status=2 WHERE id={Invoice Id}"</code> | |||
|} | |||
=== Expressions and Scripting Related Changes === | |||
{|class="wikitable" | |||
|- | |||
| | |||
The <code>CleanFilename()</code> function was removed from the global namespace. | |||
* When writing expressions, please use the following instead: | |||
* <code>StringManager.CleanFilename()</code> | |||
|- | |||
| | |||
'''''Next Step''''' expressions no longer require <code>DirectCast()</code>. | |||
|} |
Latest revision as of 12:32, 12 February 2024
Welcome to Grooper 2023.1!
Grooper version 2023.1 is here! Below you will find brief descriptions on new and/or changed features. When available, follow any links to extended articles on a topic. |
Expanded web client
Light mode
Let there be light (mode)! Users can now toggle between dark mode and light mode in the Grooper web client. |
Improved scripting for web client
Scripting for the web client is better than ever. Grooper users can now use the web client as their debug target thanks to the GrooperSDK Visual Studio extension.
- The GrooperSDK extension is available for download in the Visual Studio Marketplace.
- This extension allows users to set a web browser as their debug target.
- Users can now download scripts and work on them independently of Grooper, saving changes directly from Visual Studio using the GrooperSDK extension.
Furthermore, users can edit scripts from any machine, not just the Grooper web server, with the following components must be installed:
- Grooper (with a Repository connection made to the Grooper Repository hosted by the web server)
- Grooper Web Client
- IIS
- Visual Studio 2019
- GrooperSDK
For more information, be sure to check out the Remote Scripting Setup article.
New and improved processing features
Secondary Types
Secondary Types is a new property assignable to Batch Folders. This feature allows a single document to be assigned multiple Document Types. Secondary Types can be used with the following activities:
|
EDI support
Grooper can create Data Models directly from an X12 EDI schema and load data from an EDI file directly into a Grooper document.
- Data Elements adhering to an EDI schema can be created by right-clicking a Data Model, selecting Import Schema... and configuring the EDI Schema Importer.
- Data from an EDI file can be loaded into Grooper using the Execute activity, configured with an EDI File - Load Data command.
Currently, Grooper can process the following EDI schemas natively:
- X12 835
- X12 837 Professional
- X12 837 Dental
- X12 837 Institutional
FYI: X12 refers to the organization developing and maintaining the EDI standards. X12 is the common EDI standard in the US and North America.
For more information, be sure to check out the XML Schema Integration article.
XML Data Interchange
Grooper users can now ingest and build XML files more effectively with our suite of "XML Data Interchange" capabilities. With this new funcitonality users can:
- Create Data Elements from an XML schema file right-clicking a Data Model, selecting Import Schema... and configuring the XML Schema Importer.
- Load data from an XML file that adheres to an XML schema using the Execute activity, configured with an XML File - Load Data command.
- Validate an XML document against an XML schema using the Execute activity, configured with an XML File - Validate Schema command.
- Generate an XML document according to an XML schema using a new Merge/Export Format (called XML Format).
For more information, be sure to check out the XML Schema Integration article.
New node type: Resource Files
Resource Files allow users to store any kind of file in a Project. Simply drag a file from your file system and drop it into a Project (or folder in a Project).
|
Expression Variables
Simplify expressions with Variables. Data Models, Data Sections and Data Tables have a new Variables property. Users can add a list of expression variables, each having a name, a value type and an expression.
|
Lookup improvements
New Lookup type: XML Lookup
The new XML Lookup allows users to perform a lookup against data in an XML file. It uses XPath selectors with Grooper variables to select data in an XML hierarchy. This gives Grooper users a mechanism to query data that lies somewhere between a Lexicon Lookup and a Database Lookup. It is more capable than a Lexicon Lookup. XML data is more complex than the simple "key-value" list you can query in a Lexicon. However, XML data is not as complex as a relational database. The XML Lookup gives you an option when you need to query fairly static data that lies between a Lexicon and a database in terms of its complexity.
Please Note! For most lookups, you reference Grooper Data Elements with the "@" symbol. However, "@" is already part of the XPath syntax. Data Elements are instead referenced with the "%" symbol for XML Lookups.
New Trigger mode: Custom
A new Custom option for a lookup's Trigger Mode now allows for conditional lookup execution. Selecting Custom exposes a Trigger for the lookup. This uses an expression to define conditions when to execute the lookup.
This can make lookups more efficient, preventing a query from executing unless certain conditions are met. This can be particularly helpful when performing Database Lookups' that query larger databases during Review. If you don't have to execute a lookup unless certain conditions are met, save you and your reviewers time by only conditionally executing the query with our new Custom Trigger Mode.
This can also create opportunities for "waterfall" query approaches where if the first (generally more specific) query failed to produce a result, you can fall back on a second (generally looser) query.
Prescan Threshold for Labelset-Based Classification
Before 2023.1 the Labelset-Based classification method had a problem. It became less efficient the more Document Types were added to the Content Model (and therefore the more Label Sets were added). The more Document Types, the more Label Sets that must be loaded and checked against each document.
The new Prescan Threshold property available to 'Labelset-Based classification resolves this by only running Label Sets that are likely to succeed, not every one on each document. Prescan Threshold determines this based on word statistics. A minimum percent of Label Set words must be present on the document to be considered for classification. This has been demonstrated to speed up classification up to 15x what it was without a Prescan Threshold. This extends the Labelset-Based classification method to Content Models with thousands of Document Types.
New "Summary" Tabs
There are new and improved "Summary" tabs on Data Models, Data Elements, and Document Types in the web client. These summaries give Grooper Design users useful "at a glance" information about these objects in your Content Models. Furthermore, these summaries have helpful links to associated objects, allowing for quicker navigation through complicated Content Models with large numbers of Data Elements and Document Types.
Data Model "Summary" tabs now list all expressions configured for any Data Element configured with an expression (Default Value, Calculated Value, Is Required, etc.)
- Clicking on a listed Data Element will take you directly to that element in the Data Model.
Data Element "Summary" tabs also list any configured expressions for the selected element. This summary also lists any Document Types configured with overrides for the selected element.
- Clicking on a listed Document Type will take you directly to the Document Type.
Document Type "Summary" tabs also have an "Overrides" section, listing all Data Elements being overridden for the selected Document Type.
- Clicking on a listed Data Element will take you directly to that element.
Changes to table extraction
Header-Value is out. Tabular Layout is in
In version 2021, the Tabular Layout table extraction method was created as an improved version of the Header-Value table extraction method. Tabular Layout's logic is similar to Header-Value in that it uses column header labels and value extractors to determine a table's structure, detect columns and rows, and extract cell data. However, Tabular Layout is superior to Header-Value in the following ways:
- Tabular Layout's initial setup is simpler.
- Tabular Layout has better built in column and row detection.
- Tabular Layout is more fully featured, capable of doing things Header-Value cannot.
- Tabular Layout is Label Set supported.
For these reasons, Tabular Layout will fully replace Header-Value in version 2023.1. Any Data Table using Header-Value will be converted to Tabular Layout upon upgrade.
Tabular Layout will greatly improve your table extraction results, particularly if you familiarize yourself with all Tabular Layout can do for you.
⚠ |
PLEASE NOTE: If you are currently using Header-Value in production, you should take extra care when testing your upgrade to ensure your results are consistent. You may need to adjust aspects of your table extraction setup to get the best results possible after upgrading. ALWAYS BACKUP YOUR GROOPER DATABASE BEFORE UPGRADING TO ENSURE YOU CAN REVERT BACK IF NECESSARY |
Known conversion issues
Tabular Layout has dramatically improved logic to detect the boundaries between columns (or "gutters"). Header-Value had, comparatively, little gutter detection. This made it possible, in some cases, to detect a column using a partial label.
|
Multiline table rows were returned by default for Header-Value. For Tabular Layout users can choose to return single-line or multiline rows by enabling the Multiline Rows property.
In order to capture these values, we need to inform Tabular Layout that the row is composed of one (or more) lines before the line detected for the row (i.e. "leading lines"). To do this:
|
New Tabular Layout property: Find Column Positions
There is a new Tabular Layout property called Find Column Positions.
- This property is found in the Row Detection sub-properties.
- This property was added to Tabular Layout to make the upgrade logic from Header-Value to Tabular Layout possible.
The property enables row detection in cases where required column values do not vertically align with the column header.
- If set to True, Grooper will expand the left or right boundaries of the required column's header if values are detected nearby.
- Additionally, there is logic built in to ensure Grooper will not adjust column bounds if it encroaches on other column headers.
- PLEASE NOTE Enabling this property causes Grooper to expand column headers only if there are no aligned value extractor hits. Column positions will not be expanded if the column's value extractor produces at least one hit.
- "Required" columns are those Data Columns with the Tabular Layout Options > Row Detection property set to Required.
- This property has no impact on column headers whose boundaries have already been expanded due to line detection.
- PLEASE NOTE Enabling this property can produce unpredictable results as you have no control how Grooper looks left or right for nearby values. If using Label Sets, it is best practice to keep this property turned False and adjust the width of a header label manually using the label's Padding properties.
New Tabular Layout property: Merge Multiple Instances
There is a new Tabular Layout property called Merge Multiple Instances
- This property is found in the Row Detection sub-properties.
- This property was added to Tabular Layout to make the upgrade logic from Header-Value to Tabular Layout possible.
This property changes the standard way Footer extractors/labels work for Tabular Layout.
When set to False, a footer result will behave as it always has for Tabular Layout.
- Grooper will detect rows starting at the table's row of header labels up to the first instance of a footer.
- No rows will be detected after the footer's location.
When set to True, Grooper will combine spans of rows between multiple instances of footer results/labels.
- Grooper will detect rows starting at a the table's row of header labels as normal and stop detecting rows at the first instance of a footer. However, the difference is, if header labels are found on the next page, Grooper will continue detecting rows up to another instance of a footer (or the end of the document if no second footer is present). Row results from the two (or more) pages are then merged into a single table.
- Results from multiple spans of rows between multiple headers and footers will then be merged together in a single table.
- This property is useful for documents like invoices that commonly have footers at the bottom of a line items table on each page of a multipage invoice.
Data Element Extensions
"Data Element Extensions" are a new way of handling properties for Data Elements given a parent Data Element's configuration. Given a parent Data Element configuration, properties can now extend to child Data Elements.
Currently, we are implementing this functionality to move certain table extraction properties from the Data Table to relevant child Data Columns.
For Tabular Layout:
|
|
For Grid Layout:
|
Batch management & Review enhancements
"FIFO" Batch Priority
Traditionally in Grooper, Batches are created with a set priority (between "1" and "5"). When all Batches have the same priority, processing can bottleneck around processor intensive activities. This can cause situations where the first Batch that was imported isn't necessarily the first Batch to finish processing or get to a Review step first.
Batches can now be processed with a "First In, First Out" priority in Grooper. Import Providers have a new Batch Creation option called Increment Priority. Turning this property on increments each Batch's priority by "1" each time a new Batch is created. This causes each Batch to fully run through a Batch Process before the next is started.
Please note! This functionality is intended for large user-directed imports, not Batch imports directed by an Import Watcher service.
Dynamic Value Lists
Data Fields have a new List Values option: "Include Lookup Results"
- When enabled, the field will appear as a drop-down list in Review.
- The list will include all distinct values returned from Lookups targeting the field.
- Be Aware! Lookups used to populate value lists in this way should have their Conflict Disposition set to Ignore as the lookup is expected to return multiple results.
In the example here, a Database Lookup is configured to return various cities for a particular value extracted for a "State" field. The "City" field returns a different list, given the lookup's results.
|
Review improvements
This section is a work in progress and will be expanded at a later date
- Swap panes
- Horizontal/Vertical view toggle
- Double-click capture (alternative to rubberband OCR)
- Data Grid zoom in/zoom out
- Sticky field support in web client
- Display labels (Use “\auto” to use labels in Label Sets!)
Misc other changes
The Labeled Value extractor can now always function without a Value Extractor configured.
|
The Labeled Value extractor now has a configurable Footer Extractor property.
|
In the Label Set editor, you can use “Alt + Double Click” to capture custom labels |
In the Label Set editor, Data Elements with no labels defined display a checkbox. Clicking this checkbox will hide these Data Elements in Review.
|
This version fixes a long standing issue with Label Sets and Overrides. When enabling a Labeling Behavior and collecting labels for a Data Field or Data Column, those labels will always be used in place of a Label Extractor or Header Extractor. To avoid confusion, you should NEVER collect a label and override the label. Simply choose one or the other depending on your needs.
|
The Extract activity now has a Rules property.
|
New "Change Priority" command applicable to Batches from the Batches page. |
Ability to import documents as the root of a Batch.
|
CSS Improvements
|
The "Publish To Repository" command now works with any descendant of the Projects folder.
Note: This functionality was added late in the development cycle. Currently, nodes may only be published one at a time. Future development will include the capability to publish multiple nodes by multi-selecting them. |
New Data Export functionality: Custom Statements Custom Statements allow users to configure a custom SQL statement using an expression. This will allow for more nuanced SQL database exports. For example, a user could modify existing records in a database with an UPDATE command using Custom Statements. Users define two properties to configure each Custom Statement:
|
Expressions and Scripting Related Changes
The
|
Next Step expressions no longer require |