AND (Collation Provider)

This article is about the current version of Grooper.

Note that some content may still need to be updated.

AND is a Collation Provider option for Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

The AND Collation Provider returns a result if and only if each of its child extractors returns a result. When every child extractor produces a hit, the individual results are passed up to the parent AND Collation Provider and returned as one combined value.
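The all-or-nothing behavior described above can be sketched in a few lines of Python. This is only a conceptual model, not Grooper's implementation or API: Grooper is configured through its UI, and the names regex_extractor and and_collate, along with the sample patterns, are hypothetical stand-ins for child extractors and the parent Data Type.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a child extractor: takes text, returns its hits.
Extractor = Callable[[str], List[str]]


def regex_extractor(pattern: str) -> Extractor:
    """Build a toy child extractor from a case-insensitive regex."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: compiled.findall(text)


def and_collate(text: str, children: List[Extractor]) -> Optional[List[str]]:
    """Return the combined hits only if every child produced at least one."""
    combined: List[str] = []
    for child in children:
        hits = child(text)
        if not hits:
            # One child with no hits is enough to suppress the whole result.
            return None
        combined.extend(hits)
    return combined


# Made-up sample patterns, for illustration only.
children = [regex_extractor(r"invoice"), regex_extractor(r"total")]
```

Under these assumptions, and_collate("Invoice 42, total due", children) returns the hits from both children as one list, while and_collate("Invoice 42", children) returns None because the second child finds nothing.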



The AND Collation Provider can also be configured to return a result when only a certain number of child extractors return a result. This is detailed in the Minimum Hits example below. For instance, if only two out of three child extractors return a result, a properly configured AND Collation Provider will still return results.

Setting Up the Collation Provider

Setting up for Data Extraction

To begin, you must set the Collation Provider on the Data Type.

  1. To do so, select a Data Type in the Node Tree.
  2. Under General properties, select Collation and select the hamburger icon at the far-right of the property to expand the drop-down menu.
  3. Select AND.



Once that's done, be sure to add child extractors. These can be Data Types or Value Readers, so long as they return a result that can be passed up to the parent Data Type.

Minimum Hits

Be aware of a property called "Minimum Hits" that can be found when expanding the AND Collation Provider. By default, this property is set to zero, which means every extractor must produce a hit in order to get a positive result. If Minimum Hits is changed to, let's say, 2, then only two of the child extractors, however many there are, need to produce a hit for a result to be returned. Be cautious, as changing the Minimum Hits property could skew results. This is illustrated below.

  1. Here, we have a Data Type parent Object that has three Value Reader children. Using these three children, we are trying to determine which of the documents in our example contains identification information pertaining to ducks.
  2. To be specific, we're looking for identification information from three categories: appearance, locomotion, and whether or not what is described in the document quacks.
  3. With the Minimum Hits property set to 0, Document 4, the platypus document, does not match all the criteria necessary for identifying a duck according to what we've told Grooper. Therefore, the data is not extracted from that document.



  1. However, when the Minimum Hits property is set to 2, this tells Grooper that, as long as two of the three criteria are present, the document has duck data, and that data is extracted. As a result, the platypus document now returns a result as well.
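The effect of Minimum Hits on the duck example can be modeled with a small Python sketch. It is a conceptual illustration only, not Grooper's implementation: the regex patterns, the sample document text, and the and_collate function are all made up here, and the rule that a value of 0 means "every child must hit" is taken from the description above.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns standing in for the three Value Reader children.
DUCK_CRITERIA: Dict[str, str] = {
    "appearance": r"\b(?:bill|feathers|webbed feet)\b",
    "locomotion": r"\b(?:swims|waddles|flies)\b",
    "quacks": r"\bquacks?\b",
}


def and_collate(
    text: str, criteria: Dict[str, str], minimum_hits: int = 0
) -> Optional[Dict[str, List[str]]]:
    """Return hits per criterion if enough criteria matched, else None."""
    hits: Dict[str, List[str]] = {}
    for name, pattern in criteria.items():
        found = re.findall(pattern, text, flags=re.IGNORECASE)
        if found:
            hits[name] = found
    # Assumption from the article: 0 means every child must produce a hit.
    required = len(criteria) if minimum_hits == 0 else minimum_hits
    return hits if len(hits) >= required else None


# Made-up sample text, not the contents of the article's sample Batch.
duck = "It has feathers and a bill, waddles on land, and quacks."
platypus = "It has a bill and webbed feet, and it swims well."
```

With the default of 0, and_collate(platypus, DUCK_CRITERIA) returns None because nothing in the text quacks. With minimum_hits=2, the same call returns the appearance and locomotion hits, which shows how loosening the threshold can let a non-duck document through.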

Setting up for Classification

How does Classification play into the AND Collation Provider? Since the AND Collation Provider only returns a result when its child extractors produce hits, you can make use of it through the Positive Extractor property on a Document Type. Reference the Data Type configured with the AND Collation Provider as the Positive Extractor.

  1. To do this, select the Document Type you wish to configure.
  2. Then, reference the Data Type that uses the AND Collation Provider on the Positive Extractor property.



  1. With the Data Type referenced on the Document Type, go to the Classify Batch Process Step to observe how the AND Collation Provider, when properly configured, can be used for classification.
  2. Remain on the Batch Process step.
  3. Select the proper Content Model Scope; for this example, we would use the Content Model where we configured the Data Type with its AND Collation Provider.



  1. Go to the Activity Tester tab. Once all the documents are selected, press the play button.
  2. And voila! By referencing the Data Type on the Document Type, these documents have now been properly classified.
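The idea behind the steps above can be sketched in Python as follows. This is a deliberately simplified model and not how Grooper's Classify activity is implemented: it assumes a document is assigned to the first Document Type whose positive extractor fires, and the names all_match, classify, and doc_types, as well as the patterns, are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional


def all_match(patterns: List[str]) -> Callable[[str], bool]:
    """Build a toy positive extractor that fires only if every pattern hits."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda text: all(c.search(text) for c in compiled)


def classify(
    text: str, positive_extractors: Dict[str, Callable[[str], bool]]
) -> Optional[str]:
    """Return the first Document Type whose positive extractor fires."""
    for doc_type, fires in positive_extractors.items():
        if fires(text):
            return doc_type
    return None


# Hypothetical Document Type with an AND-style positive extractor.
doc_types = {
    "Duck": all_match([r"\bfeathers\b", r"\bwaddles\b", r"\bquacks?\b"]),
}
```

In this model, classify("It has feathers, waddles, and quacks.", doc_types) returns "Duck", while a document missing any one of the three criteria is left unclassified (None).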

What Does This Mean for Classification?

So, how exactly can an AND Collation Provider assist with Classification? Simply put, it is a tool that can be referenced on a Positive Extractor to help Grooper identify certain Document Types.