Rules-Based Approach

This approach uses regular expression pattern matching to find keywords, phrases, or other lines of text that identify and classify a document. For example, a document with the centered header "Purchase Report" could be classified as a "Purchase Report" document type. To do so, one would build a Data Extractor whose regular expression matches the phrase "Purchase Report" centered at the top of the document.
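
The idea can be sketched outside the product in a few lines of plain Python. This is only an illustration of the rule, not how a Data Extractor is actually configured: the names RULES, HEADER_LINES, and classify are hypothetical, the "top of the document" is assumed to be the first five lines, and "centered" is approximated by requiring the phrase to stand alone on its line, since plain text carries no layout information.

```python
import re

# Hypothetical rule table: document type -> pattern for its header line.
# The phrase must be the only text on the line; surrounding whitespace
# stands in for "centered".
RULES = {
    "Purchase Report": re.compile(r"^\s*Purchase\s+Report\s*$", re.IGNORECASE),
}

HEADER_LINES = 5  # assumed size of the "top of the document" region


def classify(text, rules=RULES, header_lines=HEADER_LINES):
    """Return the first document type whose pattern matches a header line, else None."""
    top = text.splitlines()[:header_lines]
    for doc_type, pattern in rules.items():
        if any(pattern.match(line) for line in top):
            return doc_type
    return None


sample = "\n            Purchase Report\n\nVendor: Acme\nTotal: 120.00\n"
print(classify(sample))
print(classify("Invoice\nPurchase Report attached separately"))
```

The second call shows why the pattern is anchored to the whole line: a document that merely mentions the phrase in running text is not classified as a "Purchase Report".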