2023.1:Visual (Classify Method)
This article is a stub. It contains minimal information on the topic and should be expanded.
The Visual classification method uses image data instead of text data to determine the Document Type. Instead of using text-based extractors, an IP Profile is used with an Extract Features command to obtain data pertaining to a document's image. Document samples are trained as examples of a Document Type.
About
Similar to Lexical Classification, Visual Classification relies on a training-based approach, where Grooper is trained to classify specific documents based on the examples given. The difference is that no text is involved. Instead, Visual Classification relies on how the document looks. Specifically, it looks at pixel intensity across the document and classifies accordingly. Pixel intensity refers to how dark a pixel is.
How To: Set Up Visual Classification
The Classification Process
What Is Pixel Intensity?
One commonly used feature is "intensity". The document is divided into cells, and the proportion of black to white pixels in each cell is measured. During classification, Grooper compares the values the IP Profile obtained from the trained samples with those obtained from the document to be classified. The document is then given a percentage similarity score for each Document Type, and whichever Document Type has the highest similarity is assigned to the document. In the case of "intensity", each cell's proportion of black to white pixels is compared with the corresponding cell of the training example to determine similarity.
Think of a structured form, where the lines and text change very little from one copy to the next. If the document is divided into cells, the percentage of black pixels in each cell will therefore be very similar from document to document.
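The idea of cell-based intensity comparison can be illustrated with a minimal Python sketch. This is not Grooper's implementation. The function names (`cell_intensities`, `similarity`, `classify`), the 2 x 2 grid, the sample images, and the similarity formula (100 minus the mean absolute difference between cell values) are all assumptions chosen for illustration. Images are assumed to be already reduced to black (1) and white (0) pixels, and each Document Type is represented by a single trained sample for simplicity.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = black, 0 = white (assumed already binarized)


def cell_intensities(image: Image, rows: int, cols: int) -> List[float]:
    """Divide the image into rows x cols cells; return the fraction of black pixels per cell."""
    height, width = len(image), len(image[0])  # assumes height >= rows and width >= cols
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            black = sum(image[y][x] for y in range(top, bottom) for x in range(left, right))
            features.append(black / ((bottom - top) * (right - left)))
    return features


def similarity(a: List[float], b: List[float]) -> float:
    """Hypothetical score: 100 minus the mean absolute cell difference, as a percentage."""
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 100.0 * (1.0 - mean_diff)


def classify(image: Image, trained: Dict[str, List[float]],
             rows: int, cols: int) -> Tuple[str, Dict[str, float]]:
    """Score the image against every trained Document Type and pick the highest score."""
    features = cell_intensities(image, rows, cols)
    scores = {name: similarity(features, ref) for name, ref in trained.items()}
    return max(scores, key=scores.get), scores


# Two made-up 4 x 4 "forms": one with a line across the top, one with a line down the left.
invoice_sample = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
receipt_sample = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

# "Training": store the cell features of one sample per Document Type.
trained = {
    "Invoice": cell_intensities(invoice_sample, 2, 2),
    "Receipt": cell_intensities(receipt_sample, 2, 2),
}

# An unseen document with the invoice layout plus one extra mark (e.g. filled-in data).
unknown = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

best, scores = classify(unknown, trained, 2, 2)
print(best, scores)  # Invoice {'Invoice': 93.75, 'Receipt': 81.25}
```

In this sketch the extra mark changes only one cell, so the unseen document still scores much closer to the "Invoice" sample than to the "Receipt" sample, which mirrors the structured-form case described above.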
Visual classification is unique in that it does not require OCR. It can be performed in real time during scanning.