Rules-Based Approach: Difference between revisions

Revision as of 15:11, 6 October 2020

This approach uses Data Extractors to find key words, phrases, or other text-based information in order to identify and classify a document (that is, to assign a Document Type to it). For example, a document with a centered header of "Purchase Report" might be classified as a "Purchase Report" Document Type with this approach. One could build a Data Type extractor that uses a regular expression to match the phrase "Purchase Report" centered at the top of a document. Once that extractor is set on the Document Type, any document for which the extractor returns a result is classified as a "Purchase Report" Document Type.

These "rules" are set using the Positive Extractor and Negative Extractor properties of a Document Type object in a Content Model.
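
The idea can be illustrated outside the product with a minimal Python sketch. It is not the product's API: the `DocumentTypeRule` class, the `classify` function, and the "DRAFT" pattern are hypothetical names invented for illustration. The sketch also assumes that a result from the negative rule blocks classification, which the text above does not spell out. Because plain text carries no layout information, "centered" is approximated here as the phrase standing alone on a line, optionally padded with spaces or tabs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentTypeRule:
    """Hypothetical stand-in for a Document Type with two rule patterns."""
    name: str
    positive: re.Pattern                    # a hit suggests this Document Type
    negative: Optional[re.Pattern] = None   # assumption: a hit blocks this type

    def matches(self, text: str) -> bool:
        # The positive pattern must return a result...
        if self.positive.search(text) is None:
            return False
        # ...and the negative pattern, if any, must not (assumed semantics).
        if self.negative is not None and self.negative.search(text):
            return False
        return True


def classify(text: str, rules: List[DocumentTypeRule]) -> Optional[str]:
    """Return the name of the first rule that matches, or None."""
    for rule in rules:
        if rule.matches(text):
            return rule.name
    return None


purchase_report = DocumentTypeRule(
    name="Purchase Report",
    # The phrase alone on a line, with optional surrounding spaces or tabs.
    positive=re.compile(r"^[ \t]*Purchase Report[ \t]*$", re.MULTILINE),
    # Illustrative negative rule only; "DRAFT" is not taken from the source.
    negative=re.compile(r"\bDRAFT\b"),
)

page = "        Purchase Report        \nItem    Qty\nBolts   40\n"
document_type = classify(page, [purchase_report])
```

With these rules, a page whose header line reads "Purchase Report" is assigned that type, while a page that only mentions the phrase mid-sentence, or one that also contains "DRAFT", is left unclassified.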

Revision as of 16:06, 27 December 2019, by Configadmin: Created page with "This approach uses regular expression pattern matching to find key words, phrases, or other lines of text in order to identify and classify a document...."
Revision as of 15:11, 6 October 2020, by Dgreenwood: No edit summary