2023.1:Visual (Classify Method): Difference between revisions

Revision as of 16:39, 8 January 2024

STUB

This article is a stub. It contains minimal information on the topic and should be expanded.


The Visual classification method uses image data instead of text data to determine the Document Type. Instead of using text-based extractors, an IP Profile is used with an Extract Features command to obtain data pertaining to a document's image. Document samples are trained as examples of a Document Type.

For example, a commonly used feature is "intensity". The document is divided into cells, and the proportion of black pixels to white pixels in each cell is measured. During classification, Grooper compares the feature values the IP Profile obtained from the trained samples with the values it obtains from the document being classified. The document then receives a percentage similarity score for each Document Type, and the Document Type with the highest score is assigned to the document. For "intensity", this means each cell's black-to-white pixel ratio is compared with the same cell in the training example.

This works well for a structured form, where the lines and text change very little from one document to the next. If such a form is divided into cells, the percentage of black pixels in any given cell will be very similar from document to document.
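
The "intensity" comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: the function names (`cell_intensities`, `similarity`, `classify`), the 2 x 2 grid, the binarized input format, and the similarity formula (100 minus the mean per-cell difference) are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed input format: a binarized image as rows of pixels, 1 = black, 0 = white.
Image = Sequence[Sequence[int]]


def cell_intensities(image: Image, rows: int, cols: int) -> List[float]:
    """Split the image into a rows x cols grid; return the fraction of black pixels per cell."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            black = sum(image[y][x] for y in range(top, bottom) for x in range(left, right))
            area = (bottom - top) * (right - left)
            features.append(black / area if area else 0.0)
    return features


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical score: 100 minus the mean absolute per-cell difference, as a percentage."""
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 100.0 * (1.0 - mean_diff)


def classify(image: Image, trained: Dict[str, List[float]],
             rows: int, cols: int) -> Tuple[str, Dict[str, float]]:
    """Score the image against every trained Document Type and pick the highest score."""
    features = cell_intensities(image, rows, cols)
    scores = {name: similarity(features, ref) for name, ref in trained.items()}
    return max(scores, key=scores.get), scores


# Two made-up training samples: ink in the top-left vs. the bottom-right corner.
invoice = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
receipt = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
trained = {
    "Invoice": cell_intensities(invoice, 2, 2),
    "Receipt": cell_intensities(receipt, 2, 2),
}

# A new document that resembles the invoice sample, with one pixel missing.
unknown = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
best, scores = classify(unknown, trained, 2, 2)
print(best, scores)
```

Because the unknown page differs from the invoice sample by a single pixel, its per-cell intensities stay close to the invoice's and far from the receipt's, which is the same reasoning that makes structured forms good candidates for this method.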

Visual classification is unique in that it does not require OCR. It can be performed in real time during scanning.

@@ Line 1: / Line 1: @@
 {{stubs}}
-<section begin="visual_classification_glossary" />
+<section begin="glossary" />
 <blockquote>
 The '''''Visual''''' classification method uses image data instead of text data to determine the '''[[Document Type]]'''.  Instead of using text-based extractors, an '''[[IP Profile]]''' is used with an '''Extract Features''' command to obtain data pertaining to a document's image.  Document samples are trained as examples of a '''Document Type'''.
 </blockquote>
-<section end="visual_classification_glossary" />
+<section end="glossary" />
 For example, a common feature used is "intensity".  The document is divided into cells and the percentage of black to white pixels is measured.  During [[Classification|classification]], Grooper looks at the values obtained by the '''IP Profile''' and compares them to those on the document to be classified.  The document is then given a percentage similarity score to each '''Document Type'''.  Whichever '''Document Type''' has the highest percentage similarity is assigned to the document.  In the case of the "intensity" example, each cell's intensity is compared with the training example to determine similarity via the black to white pixels ratio.