2.80:Paragraph Marking (Property)
Revision as of 14:43, 17 April 2020
Paragraph Marking alters a document's normal text data by placing carriage return and line feed pairs at the end of each paragraph instead of at the end of each line. This lets users break a document's text flow into paragraph segments rather than line segments.
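To make the idea concrete, here is a minimal Python sketch of what "moving the carriage return/line feed pairs to paragraph ends" means for the text. It is not Grooper's implementation. In particular, the rule it uses to decide where a paragraph ends (a blank line) is a hypothetical stand-in for Grooper's own paragraph detection settings, and the function name `mark_paragraphs` is illustrative only.

```python
def mark_paragraphs(text: str) -> str:
    """Move CRLF pairs from the end of each line to the end of each paragraph.

    Hypothetical rule (not Grooper's detection logic): a paragraph is a run of
    non-blank lines, and blank lines separate paragraphs.
    """
    paragraphs = []
    current = []
    for line in text.split("\r\n"):
        stripped = line.strip()
        if stripped:
            # Still inside a paragraph: collect the line's text.
            current.append(stripped)
        elif current:
            # A blank line closes the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # Emit one CRLF pair per paragraph instead of one per line.
    return "".join(p + "\r\n" for p in paragraphs)
```

For example, the line-broken text `"This Agreement is made\r\nbetween the parties.\r\n\r\nPayment is due\r\nwithin 30 days.\r\n"` becomes `"This Agreement is made between the parties.\r\nPayment is due within 30 days.\r\n"`, so splitting on `\r\n` now yields whole paragraphs rather than fragments of sentences.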
The Paragraph Marking property is enabled in the Preprocessing Options of the Pattern Editor. There are several paragraph detection settings that determine what qualifies as a paragraph.
About
Paragraph Marking is part of Grooper's Natural Language Processing (NLP) solution. Normally, after text data is obtained from the Recognize activity, the carriage return and line feed characters \r\n are inserted at the end of each line. This is extremely useful for structured documents, such as forms and reports, because they typically convey information line by line, and these characters serve as helpful anchors for locating and parsing data.
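As an illustration of \r\n acting as an anchor, the Python sketch below pulls a labeled value from line-based form text by matching everything between the label and the line's trailing \r\n pair. It uses Python's `re` module purely for demonstration; the function name `find_field` and the sample labels are hypothetical and do not reflect Grooper's pattern syntax.

```python
import re
from typing import Optional


def find_field(text: str, label: str) -> Optional[str]:
    """Return the value after 'label:' on its line, or None if not found.

    Illustrative only: the trailing \\r\\n pair anchors the end of the value,
    so the match cannot run over into the next line.
    """
    pattern = re.escape(label) + r":[ \t]*([^\r\n]*)\r\n"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None
```

Given `"Invoice Number: 12345\r\nInvoice Date: 04/17/2020\r\n"`, `find_field(text, "Invoice Number")` returns `"12345"`. The same line-end anchor is what makes line-based text awkward for contracts, where a single clause may wrap across several lines.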
However, unstructured documents, such as contracts and correspondence, convey information differently. Their information is broken up paragraph by paragraph rather than line by line.