Vertical Wrap (Property)

From Grooper Wiki

Revision as of 09:55, 31 March 2021

Vertical Wrap is a property of the Content Model Labeling Behavior and certain Extractor Types used to provide simplified extraction of stacked labels.

About

Stacked labels are simply multi-word labels whose words are aligned vertically on multiple lines. In other words, they are "stacked" on top of each other. You can contrast this with simple labels which appear on a single line of the document.

In the before times (before version 2021), stacked labels presented somewhat of a challenge. For simple labels, the approach is, well, simple. We use a regular expression to match the label. Do you want to match the label "ZIP CODE"? Your regex pattern is simply ZIP CODE.

However, for stacked labels, it's a little trickier. A regular expression matches its pattern against the entire document as one big text string. By itself, it can't match labels stacked on top of each other, because it matches against the text flow character by character, and the words of a stacked label sit on different lines of that flow.
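To make the limitation concrete, here is a minimal Python sketch (not Grooper itself). The two sample strings and the `find_label` helper are made up for illustration; they stand in for the text flow of a page where the label sits on one line versus a page where "ZIP" is stacked above "CODE" next to another column.

```python
import re

# The simple-label pattern from the text above.
simple_label = re.compile(r"ZIP CODE")

# Hypothetical text flow: label printed on a single line.
single_line_text = "CITY NAME   ZIP CODE\n"

# Hypothetical text flow: "ZIP" stacked above "CODE" in the right column,
# with "CITY" / "NAME" stacked in the left column.
stacked_text = "CITY   ZIP\nNAME   CODE\n"

def find_label(text):
    """Return True if the simple pattern matches somewhere in the text flow."""
    return simple_label.search(text) is not None

if __name__ == "__main__":
    print(find_label(single_line_text))
    print(find_label(stacked_text))
```

In the stacked sample, "ZIP" and "CODE" are never adjacent in the character stream, so the plain pattern has nothing to match.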

Instead, we had to use a Data Type collated as an Ordered Array in Vertical Layout mode. Each line of the stacked label was matched as one element of the array, and we usually specified some minimum distance between the words in the label to throw out false positive results.

Here is an example of how this was done.

  1. This is the parent Data Type (also the object we have selected in the Node Tree).
  2. The two child extractors return the results of each line.
  3. The Data Type is configured to use the Ordered Array option for its Collation, enabling Vertical Layout mode.
  4. The Data Type returns the label, looking for the word "ZIP" stacked on top of "CODE".


Seems like a lot of work to find the label "ZIP CODE", right?

Starting in version 2021, there is a much easier way of doing this through the Vertical Wrap property.