# Confidence Multiplier and Output Confidence

Some results carry more weight than others.

## About

The *Confidence Multiplier* and *Output Confidence* properties of **Data Type** and **Data Format** extractors allow you to manually alter the confidence score of returned values.

Use of these properties is sometimes referred to as *weighted rules*. In practice, they allow a user to increase or decrease the confidence score of an extractor's result (or set its confidence to an assigned value). This changes the confidence of the extractor's results, making them appear more (or less) favorable. When used in combination with the *Order By* property set to *Confidence* on a parent **Data Type**, you can manipulate which child extractor's result the parent prioritizes.

### General Usage - Confidence Multiplier

Modifying the *Confidence Multiplier* property of a **Data Type** or **Data Format** is done by clicking on the ellipses in the *Result Options* property, which opens the **Result Options** submenu.

The *Confidence Multiplier* property defaults to *1* and can be changed in this submenu. The field is a double and takes floating point values.

For example, a value of *0.5* will multiply the confidence of output results by 0.5. If the output confidence was 100%, now it will be 50%. Similarly, you can increase the confidence, even above 100%. If the *Confidence Multiplier* property is set to *3*, and an output result had a 50% confidence, it will now display as 150% confidence.
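The multiplier arithmetic can be sketched in a few lines of Python. This is an illustration only, not Grooper's implementation; the function name `apply_confidence_multiplier` is hypothetical, and confidences are expressed as percentages.

```python
def apply_confidence_multiplier(confidence_pct: float, multiplier: float = 1.0) -> float:
    """Scale a result's confidence (in percent) by a multiplier.

    The default of 1.0 mirrors the property's default and leaves
    the confidence unchanged. The result is not capped at 100.
    """
    return confidence_pct * multiplier


print(apply_confidence_multiplier(100.0, 0.5))  # 50.0
print(apply_confidence_multiplier(50.0, 3))     # 150.0
print(apply_confidence_multiplier(80.0))        # 80.0
```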

### General Usage - Output Confidence

Modifying the *Output Confidence* property of a **Data Type** or **Data Format** is also done by clicking on the ellipses in the *Result Options* property, which opens the **Result Options** submenu.

The *Output Confidence* property defaults to *0%* and can be changed in this submenu. The default of *0%* will not alter the results' confidence scores. Changing this number will override whatever the result's original confidence is and replace it with this value.

For example, a value of *75%* will change the confidence of output results to 75%. If the output confidence was 100%, now it will be 75%. If the output confidence was 50%, now it will be 75%. If it was 75%, it will now be (you guessed it) 75%. It doesn't matter what the original confidence was; it will be replaced by the *Output Confidence* value.
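As a rough sketch of the override behavior (again illustrative Python, not Grooper's code, with a hypothetical function name), a value of 0 is treated as "leave the confidence alone" and any other value replaces the original:

```python
def apply_output_confidence(confidence_pct: float, output_confidence_pct: float = 0.0) -> float:
    """Return the confidence after applying an Output Confidence setting.

    0 (the default) means "do not alter"; any other value overrides
    the original confidence entirely.
    """
    if output_confidence_pct == 0.0:
        return confidence_pct
    return output_confidence_pct


for original in (100.0, 50.0, 75.0):
    print(apply_output_confidence(original, 75.0))  # 75.0 each time

print(apply_output_confidence(50.0))  # 50.0 (default leaves it unchanged)
```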

## Use Cases

### Waterfall Extraction

*Weighted Rules* can be used in cases where one is trying to find a data element appearing on many similar types of forms but multiple extraction approaches are required to identify the element.

For example, on different forms, the best method to pick up a piece of data may be a *Key-Value Pair*, a **Field Class**, a simple pattern match, a pattern match leveraging *FuzzyRegEx*, or some other method.

One technique for incorporating multiple extractors to return a single field value is referred to as "Waterfall Extraction". The Waterfall Extraction technique is a method to select a single result from multiple extractor results, according to some specific criteria. First, multiple extractors (and their numerous configurations) are organized under a parent **Data Type**. The extractors' results are prioritized according to the *Order By* property on the parent **Data Type**. The *Order By* property of the parent **Data Type** can be set to the following: *Position*, *Frequency*, *Confidence*, *Extractor*, *Length*, *Value*.

Setting *Order By* to *Confidence* can prioritize the most confident extractor result. This can prioritize "non-Fuzzy" results over "Fuzzy" results, or the most confident result of the child extractors leveraging *FuzzyRegEx*. However, extractors using traditional, non-Fuzzy regex always return their results at 100%. The confidence of a returned result has, historically, *only* been affected in one of two ways:

- A **Data Type's** (or a child **Data Format's**) regular expression pattern leverages *FuzzyRegEx*. Characters are mutated to match the pattern, either inserted, deleted, or swapped. Each mutation comes at the cost of the result's overall confidence, generating a result less than 100% confident.
- **Field Classes**, by design, leverage trained/weighted features and should not return results at 100% confidence.

Considering this, a properly configured extractor can, and does, return results below 100%, which can undermine the logic of ranking results by confidence alone. A result returned at 90% confidence *could* be more desirable than one returned at 100%.

We will explore how and why in the How To section of this article.

### Waterfall Classification

With the *Classification Method* property on a **Content Model** set to *Lexical* or *Rules-Based*, one can set up a "Positive Extractor" on **Document Types**. If this extractor returns a result above the *Minimum Similarity* set on the **Content Model**, the document will be assigned that **Document Type** during classification. This extractor could be a "Waterfall Extractor", taking advantage of the Waterfall Extraction technique. However, for classification, the system is just looking for some result to be returned above the *Minimum Similarity* confidence threshold.

In the *Waterfall Classification* method, the *Minimum Confidence* property can be set in the **Result Filter** property window of a **Data Type**, which will eliminate any results less than that confidence. This may eliminate the results of some referenced extractors which technically matched, but at a low percent.

If we happen to know that those lower confidence hits are valid and *should* count for classifying the document, then the **Confidence Multipliers** on those referenced **Data Types** can be set to a higher value in order to make them hit the **Minimum Confidence** required.

Similarly, if higher confidence hits are inappropriately classifying documents and *shouldn't* be returned, the *Confidence Multiplier* property can be reduced so that those **Data Types** only exceed the *Minimum Confidence* when they are very high confidence.
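The interaction between a multiplier and a minimum-confidence filter can be sketched as follows. This is a simplified Python illustration, not Grooper's implementation: it assumes the multiplier is applied before the filter is evaluated, and the function name and all numbers (a threshold of 80, multipliers of 1.2 and 0.9) are hypothetical.

```python
def passes_minimum_confidence(confidence_pct: float,
                              multiplier: float,
                              minimum_confidence_pct: float) -> bool:
    """Keep a result only if its weighted confidence is not below the minimum.

    Assumption: the multiplier is applied first, then the filter.
    """
    return confidence_pct * multiplier >= minimum_confidence_pct


MINIMUM = 80.0  # hypothetical Minimum Confidence setting

# A valid but low-confidence hit: filtered out as-is, kept once boosted.
print(passes_minimum_confidence(70.0, 1.0, MINIMUM))  # False
print(passes_minimum_confidence(70.0, 1.2, MINIMUM))  # True

# An over-eager extractor: damped so only very confident hits survive.
print(passes_minimum_confidence(85.0, 0.9, MINIMUM))  # False
print(passes_minimum_confidence(95.0, 0.9, MINIMUM))  # True
```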

## How To

Here we'll explore a use case using a mortgage document.

### OCR Misread

In this example, an OCR error misread the words “final loan” by not recognizing the space between them.

### Child Data Type Setup

Three **Data Types** were established to find variations of a result.

### Waterfall Extractor

The *Waterfall Extractor* is a **Data Type** that is the parent of, or references, all of the unique extractors for a piece of data and then determines which one should be given as a final output to a **Data Field**.

Using **Order By** set to *Confidence* and **Direction** set to *Descending* as the sort criteria, two extractors match, with the highest confidence result given first. The *FinalLoan* extractor matched because it found “finalloan” with no spaces. It is not leveraging *FuzzyRegEx*, so it matched at 100%. The *Final Loan* extractor did not match because it is not using *FuzzyRegEx* and it did not find a space between the two words. The *Fuzzy: Final Loan* extractor, leveraging *FuzzyRegEx*, matched because it was able to turn “finalloan” into “final loan” by inserting a space, so it was a 90% match.
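The selection step in this example can be sketched in Python. This is a simplified model for illustration, not Grooper's implementation; the `Result` class and `waterfall` function are hypothetical names, and an extractor that did not match simply contributes no result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    extractor: str
    value: str
    confidence_pct: float


def waterfall(results: list[Result]) -> Optional[Result]:
    """Order results by confidence, descending, and return the first one."""
    ranked = sorted(results, key=lambda r: r.confidence_pct, reverse=True)
    return ranked[0] if ranked else None


# Results for the misread text "finalloan"; "Final Loan" returned nothing.
results = [
    Result("Fuzzy: Final Loan", "final loan", 90.0),
    Result("FinalLoan", "finalloan", 100.0),
]

winner = waterfall(results)
print(winner.extractor)  # FinalLoan
```

With unweighted confidences, the exact match on the misread text outranks the fuzzy match that recovered the intended words.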

We would like the actual correct result of *final loan* to win. There are two ways to do this. One way would be to bump up the confidence of the fuzzy regular expression **Data Type**, **Fuzzy: Final Loan**. This is done by modifying the **Confidence Multiplier** property in the **Result Options** of the **Fuzzy: Final Loan** **Data Type**.

That works for this case, but what if there was another document where the OCR read the space between the two words correctly? In that case, the result from the **Final Loan** **Data Type** would match at 100%, and the **Fuzzy: Final Loan** **Data Type**, with the *Confidence Multiplier* property set to *1.2*, would match at 120%. While this would technically yield the correct result, it is generally best practice to have the exact match return the highest percentage. There are a couple of ways to tackle this situation. One way would be to bump up the *Confidence Multiplier* property on the **Final Loan** **Data Type** to something like *1.3*. Another way would be to reduce the *Confidence Multiplier* property on the **FinalLoan** **Data Type** so that it returns less than 90%.
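To see how the first option plays out on both documents, here is a small Python sketch. It is an illustration under stated assumptions, not Grooper's implementation: the `pick_winner` helper is hypothetical, the multipliers are the 1.2 and 1.3 mentioned above, and the fuzzy extractor is assumed to match a correctly read "final loan" at a raw 100%.

```python
def pick_winner(raw_results: dict, multipliers: dict) -> str:
    """Return the extractor name with the highest weighted confidence.

    raw_results maps extractor name -> raw confidence in percent.
    Extractors without an entry in multipliers use the default of 1.0.
    """
    weighted = {
        name: confidence * multipliers.get(name, 1.0)
        for name, confidence in raw_results.items()
    }
    return max(weighted, key=weighted.get)


multipliers = {"Fuzzy: Final Loan": 1.2, "Final Loan": 1.3}

# Document 1: OCR dropped the space ("finalloan").
misread = {"FinalLoan": 100.0, "Fuzzy: Final Loan": 90.0}

# Document 2: OCR read the space correctly ("final loan").
clean_read = {"Final Loan": 100.0, "Fuzzy: Final Loan": 100.0}

print(pick_winner(misread, multipliers))     # Fuzzy: Final Loan
print(pick_winner(clean_read, multipliers))  # Final Loan
```

On the misread document the fuzzy result now outranks the unweighted *FinalLoan* match, while on the clean document the exact *Final Loan* match still ranks highest.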

### Getting the Desired Result

Let's change some settings so that this extractor returns the results in the desired way, with the most correct result *weighted* the highest.

## Version Differences

Prior to **Grooper** 2.9, the *Confidence Multiplier* property did not exist.