Rules-Based Approach

From Grooper Wiki
Revision as of 15:16, 6 October 2020 by Dgreenwood

This approach uses Data Extractors to find key words, phrases, or other text-based information in order to identify and classify a document (that is, to assign a Document Type to the Document Folder). For example, a document with a centered header of "Purchase Report" might be classified as a "Purchase Report" Document Type with this approach. To identify it, one could build a Data Type extractor that uses a regular expression to match the phrase "Purchase Report" centered at the top of the document.
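The header match described above can be sketched outside of Grooper as follows. This is a minimal Python illustration, not Grooper's extractor configuration: the function name `has_purchase_report_header`, the `max_lines` parameter, and the idea that "centered at the top" means "alone on one of the first few lines, with optional surrounding whitespace" are all assumptions made for the example.

```python
import re

# Assumption: a "centered header" shows up in the page text as the phrase
# alone on its line, padded by spaces or tabs on either side.
HEADER_PATTERN = re.compile(r"^[ \t]*Purchase Report[ \t]*$", re.MULTILINE)


def has_purchase_report_header(page_text: str, max_lines: int = 5) -> bool:
    """Return True if the phrase stands alone on one of the first lines.

    `max_lines` is a hypothetical cutoff for what counts as the top of
    the document.
    """
    top_of_page = "\n".join(page_text.splitlines()[:max_lines])
    return HEADER_PATTERN.search(top_of_page) is not None
```

A page that begins with an indented "Purchase Report" line would match, while a page that only mentions the phrase inside a sentence would not.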

The "rules" are set using the Positive Extractor and Negative Extractor properties of a Document Type object in a Content Model. If the extractor set as the Positive Extractor returns a result on a document, the document is classified as that Document Type. The Negative Extractor works the opposite way: if its extractor returns a result on a document, the document is prevented from being classified as that Document Type.
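The interaction between the two properties can be sketched as a small decision procedure. This is an illustrative Python model only and does not reflect Grooper's internals: the `DocumentType` class, the `classify` function, the use of plain regular expressions as stand-ins for extractors, and the "first matching type wins" ordering are assumptions. The sketch also assumes that a negative result overrides a positive one, which is one reading of "prevented from being classified".

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentType:
    """Hypothetical stand-in for a Document Type and its two rule properties."""
    name: str
    positive: Optional[re.Pattern] = None  # stand-in for the Positive Extractor
    negative: Optional[re.Pattern] = None  # stand-in for the Negative Extractor


def classify(text: str, doc_types: List[DocumentType]) -> Optional[str]:
    """Return the name of the first Document Type whose rules accept the text."""
    for doc_type in doc_types:
        # A negative hit rules this type out, even if the positive rule matches.
        if doc_type.negative is not None and doc_type.negative.search(text):
            continue
        # A positive hit classifies the document as this type.
        if doc_type.positive is not None and doc_type.positive.search(text):
            return doc_type.name
    return None  # no rule matched, so the document stays unclassified


# Example rule set (hypothetical): drafts must not be classified as reports.
PURCHASE_REPORT = DocumentType(
    name="Purchase Report",
    positive=re.compile(r"Purchase Report"),
    negative=re.compile(r"DRAFT"),
)
```

With this rule set, a page containing "Purchase Report" is classified as a Purchase Report, while a page containing both "Purchase Report" and "DRAFT" is left unclassified.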