Text Preprocessor: Difference between revisions

From Grooper Wiki

Revision as of 16:12, 26 August 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

Grooper's "Text Preprocessor" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) so that regular expression patterns can match (or ignore) structural elements such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following:

  • Paragraph Marking
  • Tab Marking
  • Vertical Tab Marking
  • Ignore Control Characters

The Text Preprocessor is enabled through a property on various configuration items in Grooper (generally a "Preprocessing" or "Preprocessing Options" property). These include:

  • The Ask AI Value Extractor and its "Preprocessing" property.
  • The Pattern Match Value Extractor and its "Preprocessing" property.
  • The List Match Value Extractor and its "Preprocessing" property.
  • The Label Match Value Extractor and its "Preprocessing" property.
  • The Word Match Value Extractor and its "Preprocessing" property.
  • The Field Match Value Extractor and its "Preprocessing" property.
  • The Pattern-Based Collation Provider and its "Preprocessing Options" property.
  • The Flow Layout Provider and its "Preprocessing Options" property.
  • The following Quoting Methods using their "Preprocessing" property:
    • Extracted
    • Labeled Region
    • Semantic
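
To illustrate why adjusting control characters before extraction matters, here is a minimal Python sketch of the general idea. It is not Grooper's implementation: the function name, the option names and the marker characters (¶ and →) are all assumptions made for this example, and Vertical Tab Marking is left out for brevity.

```python
import re

# Hypothetical marker characters chosen for this sketch only;
# they are not the markers Grooper itself inserts.
PARAGRAPH_MARK = "\u00b6"  # pilcrow
TAB_MARK = "\u2192"        # rightwards arrow


def preprocess(text, mark_paragraphs=False, mark_tabs=False, ignore_control=False):
    """Return a copy of text with control characters adjusted for pattern matching."""
    # Normalize CR/LF pairs and bare CRs to a single LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if mark_paragraphs:
        # Assumption: a run of two or more line breaks is a paragraph boundary.
        text = re.sub(r"\n{2,}", PARAGRAPH_MARK, text)
    if mark_tabs:
        text = text.replace("\t", TAB_MARK)
    if ignore_control:
        # Collapse each run of remaining control characters into one space.
        text = re.sub(r"[\x00-\x1f\x7f]+", " ", text)
    return text


raw = "Invoice No:\r\n12345\r\n\r\nTotal\t99.00\r\n"

# With control characters ignored, a pattern can match across the line break.
flat = preprocess(raw, ignore_control=True)
invoice = re.search(r"Invoice No: (\d+)", flat)

# With markers inserted, a pattern can anchor on the structural element itself.
marked = preprocess(raw, mark_paragraphs=True, mark_tabs=True)
total = re.search(TAB_MARK + r"([\d.]+)", marked)
```

In this sketch, `invoice.group(1)` is `"12345"` even though the label and the number sit on separate lines in the raw text, and `total.group(1)` is `"99.00"` because the pattern keys on the tab marker. The same pattern run against `raw` directly would need to account for the CR/LF pair itself.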