Rules-Based Approach: Difference between revisions

From Grooper Wiki

Revision as of 15:11, 6 October 2020

This approach uses Data Extractors to find key words, phrases, or other text-based information in order to identify and classify a document (that is, to assign it a Document Type). For example, a document with a centered header of "Purchase Report" might be classified as a "Purchase Report" Document Type. One could build a Data Type extractor that uses a regular expression to match the phrase "Purchase Report" centered at the top of a document. Once that extractor is set on the Document Type, any document on which it returns a result is classified as a "Purchase Report" Document Type.

These "rules" are set using the Positive Extractor and Negative Extractor properties of a Document Type object in a Content Model.
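
The general idea can be illustrated outside of Grooper with a minimal Python sketch. This is not Grooper's API: the `DocumentTypeRule` class, the `classify` function, and the "DRAFT" negative pattern are all hypothetical names invented for illustration. The sketch also assumes that a negative match vetoes a classification, and it only checks that the phrase stands alone on a line, since plain text carries no layout information about centering.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Pattern


@dataclass
class DocumentTypeRule:
    """Hypothetical stand-in for a document type with two regex rules."""
    name: str
    positive: Pattern[str]                    # a match suggests this type
    negative: Optional[Pattern[str]] = None   # a match rules this type out (assumed)


def classify(text: str, rules: List[DocumentTypeRule]) -> Optional[str]:
    """Return the name of the first rule whose positive pattern matches
    and whose negative pattern (if any) does not; otherwise None."""
    for rule in rules:
        if rule.negative is not None and rule.negative.search(text):
            continue  # assumed behavior: a negative hit vetoes this type
        if rule.positive.search(text):
            return rule.name
    return None


RULES = [
    DocumentTypeRule(
        name="Purchase Report",
        # The phrase must be the only text on its line (leading and
        # trailing whitespace allowed), approximating a header line.
        positive=re.compile(r"^\s*Purchase Report\s*$",
                            re.IGNORECASE | re.MULTILINE),
        # Illustrative only: skip documents marked as drafts.
        negative=re.compile(r"\bDRAFT\b"),
    ),
]

if __name__ == "__main__":
    page = "          Purchase Report\nDate: 2020-10-06\nTotal: 10"
    print(classify(page, RULES))
```

In this sketch a document that mentions "Purchase Report" only in running text is left unclassified, because the positive pattern requires the phrase to occupy a line by itself.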