2023.1:Visual (Classify Method): Difference between revisions

Revision as of 11:04, 25 November 2020

The Visual classification method uses image data instead of text data to determine the Document Type. Instead of using text extractors, an IP Profile is used with an Extract Features command to obtain data pertaining to a document's image. Document samples are trained as examples of a Document Type.

For example, a common feature used is "intensity". The document is divided into cells and the ratio of black to white pixels in each cell is measured. During classification, Grooper compares the feature values the IP Profile obtained from the trained samples with those of the document to be classified. The document is then given a percentage similarity score for each Document Type, and the Document Type with the highest score is assigned to the document. In the "intensity" example, each cell's intensity is compared with the corresponding cell of the training example, so similarity is determined by the black-to-white pixel ratio.

Consider a structured form, where the lines and text change very little from one copy to the next. If each document is divided into cells, the percentage of black pixels in any given cell will be very similar from document to document.
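
The cell-based "intensity" comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: the function names (cell_intensities, similarity, classify), the 2 x 2 grid, the tiny 4 x 4 sample images, and the similarity formula (100 minus the mean absolute per-cell difference, in percent) are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# A binarized image: rows of pixels, where 1 = black and 0 = white.
Image = Sequence[Sequence[int]]


def cell_intensities(image: Image, rows: int, cols: int) -> List[float]:
    """Divide the image into a rows x cols grid and return the
    fraction of black pixels in each cell, row by row."""
    height, width = len(image), len(image[0])
    features: List[float] = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            black = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            features.append(black / area if area else 0.0)
    return features


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity score: 100% minus the mean absolute
    difference between corresponding cell intensities."""
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 100.0 * (1.0 - mean_diff)


def classify(image: Image, trained: Dict[str, List[float]],
             rows: int, cols: int) -> Tuple[str, Dict[str, float]]:
    """Score the image against every trained Document Type and
    return the best-scoring type along with all scores."""
    features = cell_intensities(image, rows, cols)
    scores = {name: similarity(features, ref) for name, ref in trained.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up training samples: one form with a black band across the top,
# one with a black band down the left side.
INVOICE = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
RECEIPT = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
TRAINED = {
    "Invoice": cell_intensities(INVOICE, 2, 2),
    "Receipt": cell_intensities(RECEIPT, 2, 2),
}

# An unclassified page that resembles the invoice but has one pixel missing.
UNKNOWN = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
BEST, SCORES = classify(UNKNOWN, TRAINED, 2, 2)
```

Here the unknown page scores 93.75% against "Invoice" and 56.25% against "Receipt", so "Invoice" is assigned. No text is read at any point, which is why this kind of comparison needs no OCR.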

Visual classification is unique in that it does not require OCR. It can be performed in real time during scanning.

@@ Line 1: / Line 1: @@
-The ''Visual'' classification method uses image data instead of text data to determine the '''[[Document Type]]'''.  Instead of using text [[Data Extractor|extractors]], an '''[[IP Profile]]''' will be set with an '''Extract Features''' command to get data pertaining to a document's image.  Document samples are trained as examples of a '''Document Type'''.
+<onlyinclude>
+<blockquote style="font-size:14pt">
+The ''Visual'' classification method uses image data instead of text data to determine the '''[[Document Type]]'''.  Instead of using text [[Data Extractor|extractors]], an '''[[IP Profile]]''' is used with an '''Extract Features''' command to obtain data pertaining to a document's image.  Document samples are trained as examples of a '''Document Type'''.
+</blockquote>
 For example, a common feature used is "intensity".  The document is divided into cells and the percentage of black to white pixels is measured.  During [[Classification|classification]], Grooper looks at the values obtained by the '''IP Profile''' and compares them to those on the document to be classified.  The document is then given a percentage similarity score to each '''Document Type'''.  Whichever '''Document Type''' has the highest percentage similarity is assigned to the document.  In the case of the "intensity" example, each cell's intensity is compared with the training example to determine similarity via the black to white pixels ratio.
@@ Line 6: / Line 9: @@
 ''Visual'' classification is unique in that it does not require [[OCR]].  It can be performed real time during scanning.
+</onlyinclude>