Text Preprocessor

From Grooper Wiki
Revision as of 16:12, 26 August 2025 by Dgreenwood

This article is about the current version of Grooper (2025). Note that some content may still need to be updated.

Grooper's "Text Preprocessor" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) to allow regular expression patterns to match (or ignore) structural elements, such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following:
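
As an illustration only (this is not Grooper's implementation or API), the following Python sketch shows the general idea of manipulating control characters so that a regular expression can match text that was split across a line break. The function name `preprocess`, its `keep_paragraphs` parameter, and the sample text are all hypothetical and chosen just for this example.

```python
import re


def preprocess(text: str, keep_paragraphs: bool = True) -> str:
    """Hypothetical helper (not part of Grooper): simplify line-break characters."""
    # Normalize CR/LF pairs and bare CRs to a single LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if keep_paragraphs:
        # Treat two or more consecutive LFs as a paragraph boundary,
        # flatten the single line breaks inside each paragraph to spaces,
        # then rejoin the paragraphs with one blank line between them.
        paragraphs = re.split(r"\n{2,}", text)
        return "\n\n".join(" ".join(p.split("\n")) for p in paragraphs)
    # Otherwise flatten every line break, ignoring paragraph structure.
    return text.replace("\n", " ")


# Sample raw text: the label "Invoice Number" is split by a CR/LF pair.
raw = "Invoice\r\nNumber: 12345\r\n\r\nTotal\tDue: $10.00"
pattern = re.compile(r"Invoice Number: (\d+)")

before = pattern.search(raw)             # no match: the line break is in the way
after = pattern.search(preprocess(raw))  # matches once the break becomes a space
```

In this sketch the pattern fails against the raw text and succeeds against the preprocessed text, while the paragraph boundary and the tab character are left in place so that other patterns could still rely on them.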


The Text Preprocessor is enabled through various configuration items in Grooper, generally via a "Preprocessing" or "Preprocessing Options" property. These include:

  • The Ask AI Value Extractor and its "Preprocessing" property.
  • The Pattern Match Value Extractor and its "Preprocessing" property.
  • The List Match Value Extractor and its "Preprocessing" property.
  • The Label Match Value Extractor and its "Preprocessing" property.
  • The Word Match Value Extractor and its "Preprocessing" property.
  • The Field Match Value Extractor and its "Preprocessing" property.
  • The Pattern-Based Collation Provider and its "Preprocessing Options" property.
  • The Flow Layout Provider and its "Preprocessing Options" property.
  • The following Quoting Methods using their "Preprocessing" property:
    • Extracted
    • Labeled Region
    • Semantic