2.90:Tab Marking (Property)

From Grooper Wiki
Revision as of 09:00, 13 August 2020 by Dgreenwood

Tab Marking allows you to insert tab characters into a document's text data.

The Tab Marking property makes tab characters available for regular expression pattern matching. These characters are inserted into a document's text data wherever there is a large gap between characters on a line.
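The idea can be sketched in a few lines of Python. This is a minimal illustration, not Grooper's implementation: it assumes each word on a line carries hypothetical left/right horizontal positions (for example, in pixels), and it uses a hypothetical fixed `min_gap` threshold to decide when a gap is "large". Grooper's actual gap-detection rules and settings may differ.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical OCR word with horizontal extents (e.g., in pixels)."""
    text: str
    left: float
    right: float


def mark_tabs(line: List[Word], min_gap: float) -> str:
    """Join the words of one text line, inserting a tab instead of a space
    wherever the gap between neighboring words is at least min_gap.
    min_gap is an assumed, illustrative threshold."""
    if not line:
        return ""
    words = sorted(line, key=lambda w: w.left)  # read left to right
    parts = [words[0].text]
    for prev, cur in zip(words, words[1:]):
        gap = cur.left - prev.right  # horizontal whitespace between words
        parts.append("\t" if gap >= min_gap else " ")
        parts.append(cur.text)
    return "".join(parts)


if __name__ == "__main__":
    # One row of three columns of number pairs (invented coordinates).
    row = [
        Word("12", 0, 20), Word("34", 28, 48),
        Word("56", 200, 220), Word("78", 228, 248),
        Word("90", 400, 420), Word("11", 428, 448),
    ]
    print(repr(mark_tabs(row, min_gap=50)))    # '12 34\t56 78\t90 11'
    print(repr(mark_tabs(row, min_gap=1000)))  # '12 34 56 78 90 11'
```

With a threshold larger than any gap, the output collapses to single spaces, which is the "a space is a space" behavior described under About.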

About

Normally, a space is a space is a space. Whether it is the space between two words, the space between two columns, or any other gap between characters, it is represented by a single space character in a document's text data.

However, knowing that there is a large amount of space on one or both sides of a label or value is often useful when extracting that data. The image here has three columns, each containing pairs of numbers.

You can visually distinguish the numbers in the second column from the others by the spatial context around them: the numbers in this column have a large amount of space on either side, separating them from the numbers in the other columns.

However, with default extractor settings, there is no distinction between the single spaces separating words and the large spaces separating columns. We call words, phrases, numbers, or other data separated by large amounts of space like this "segments".

As is, it would be cumbersome to write a regex pattern to differentiate between the pairs of numbers (or other "segments" on the page).
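A short sketch shows why. It uses Python's `re` module purely for illustration (lookbehind, lookahead, and `\t` work similarly in most regex flavors), and the two strings are invented: `plain` stands for one row of the example as default text data, and `tabbed` is an assumed result of Tab Marking turning the two wide column gaps into tabs.

```python
import re

# Hypothetical text for one row of the example: three columns of number pairs.
# Without Tab Marking, every gap is a single space.
plain = "12 34 56 78 90 11"
# Assumed text with Tab Marking on: the two wide column gaps become tabs.
tabbed = "12 34\t56 78\t90 11"

# A number pair with a space on both sides: cannot tell word gaps from column gaps.
space_bounded = re.compile(r"(?<= )\d+ \d+(?= )")
# A number pair with a tab (a large gap) on both sides.
tab_bounded = re.compile(r"(?<=\t)\d+ \d+(?=\t)")

print(space_bounded.findall(plain))  # ['34 56', '78 90']  (pairs straddle columns)
print(tab_bounded.findall(tabbed))   # ['56 78']  (only the second-column pair)
```

Against plain text, the space-bounded pattern returns pairs that cross column boundaries. Once the large gaps are marked as tabs, a simple tab-bounded pattern isolates the second-column segment.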

With tab characters