2023.1:Waterfall Classification (Concept): Difference between revisions

Revision as of 16:16, 9 April 2024

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Waterfall Classification is a classification concept in Grooper that manipulates the Positive Extractor property to prioritize training similarity, striking a middle ground between high specificity with high accuracy and full generality with minimal accuracy. It is helpful whenever Documents get misclassified and simply retraining won't help.

ABOUT

Normally with classification, one can train a Document, set up a Positive Extractor for maximum accuracy, classify, get good results, and call it done. But what happens when high accuracy and specificity do more harm than good? For example, what if, due to the type of Positive Extractor being used, a Document gets classified as the wrong Document Type? You could always make changes to the extractor, but who knows how long that would take, or what other problems it could create for classification. Instead, you can have your Extractor act as a safety net that classifies the extracted data in a more general manner. This is the concept known as Waterfall Classification: manipulating the Positive Extractor along the downward curve of the waterfall, away from high specificity and accuracy and towards something somewhat more generic. Not completely generic, just generic enough that training similarity can get the results we want for classification.

STARTING THE WATERFALL