2.80:Database Lookup (Lookup Specification)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

202520232.80

Database Lookup is a Lookup Specification that performs a lookup against a database Data Connection via a SQL query.

For validation, values in a Data Field are compared to corresponding values in a database table. If the results differ from what’s in the table, that field can be flagged. 

For data population, as long as the "lookup values" exist in the external database, Grooper extracted values can be used to populate additional Data Fields. Given a certain field or fields in a Data Model matches fields in a database table, additional values in the table’s row can be assigned to empty Grooper Data Fields. For example, an extracted social security number from a document could be used to lookup a corresponding name in database that has both the social security number and name.

The Grooper data fields used for comparison against a database table are called "lookup fields".  The fields populated from the database table are called "target fields".  A SQL query is used to find the lookup fields and target fields in the database table.

About

The following is a SQL query to lookup a student ID number, find that student's last name in a database table, and return it to a field in Grooper.

The WHERE clause contains the lookup field "Student ID" (@Student_ID).  The AS portion of the SELECT clause contains the target field.  The field 'LN' will be filled in using the information from the database column [Last Name].  Student_Records is the database table used.

lookup and target fields in a Grooper data model

A SQL table containing our lookup value and the target value we want to populate our data model with

border color:white
border color:white
The query then returns the targeted last name value to our LN field in the data model.

Version Differences

Prior to Version 2.80, database lookups were performed on individual Data Fields in a Data Model, using simple field mappings. Now, lookups are configured on a Data Model, Data Section or Data Table’s properties, using SQL queries. Multiple database lookups using multiple SQL queries can be written on the Data Model. They can reference any number of database columns and Grooper fields. 

During review, Grooper now provides a UI dialogue box when multiple matches are returned to choose the correct value.  The example below is using the extracted student ID numbers to fill in the rest of the table using values in a SQL database.  The number in red was found more than once in the database.



Entering the value in the lookup field will bring up a new window, showing the multiple hits from the SQL table, and allow you to pick the result.


Use Cases

Database Lookup can be used to validate data already populated in a Grooper data model. That way, you can check to see if data extracted from a document set matches what's already in an existing database.

It can also be used to populate data in a Grooper data model from a database. In these cases, some kind of extracted value from the document is used as a key to return information from a database that is not found on the document.

SQL databases and ODBC compliant databases are compatible with Grooper.

How To: Create a Data Connection

Before you begin

Before you can perform a database lookup, first Grooper must be able to connect to it in order to see everything in the data table. This is done by creating a Data Connection. Grooper can connect to a SQL database or any ODBC compliant database provider. You will also need the user name and password used to access the database.

Add a Data Connection in the Node Tree

1. Expand the "Infrastructure" folder and select the "Data Connections" folder.



2. Right click the "Data Connections" folder and select "Add" then "Data Connection...".



3. Name the Data Connection, and you will see this screen to configure the connection.



Configure the database connection

1.  Select the database "Provider".  You can connect to a SQL database selecting "SqlServer" or a Open Database Connectivity (ODBC) compliant database by selecting ODBC.



2. This example shows configuring the most common method of connecting to a database, using the "SqlServer" provider to connect to a SQL database.  Using the "Server Name" property, type the name of the SQL Server you are connecting to.



3. Select the Database Name using the dropdown menu.  The database we are connecting to for this example is named "Students".



4. You must also enter the user name and password credentials used to connect to the database.  If you leave this property blank, Windows Authentication will be used.



Verify your connection

After you press the "Save" button, you can see the database tables available in the database in the "Database Tables" pane (This example only has one, "Student_Records").  Selecting a table from the list, you can see the table's columns in the "Table Columns" pane as well as a preview of table in the "Data Preview" panel.




How To: Perform a Database Lookup

Database Lookups are used in two ways: Validation and Population.

First, they can validate data Grooper has extracted.  Using a SQL query, data fields and table columns in Grooper can check a database to verify certain fields match what is in the database table.

Second, they can populate blank data fields and table columns in a data model.  Let's say you have a set of documents containing information you know exists in a database.  The example we will look at pertains to student records.  If Grooper can extract a student's identification number from a document, and other information like their name, address, date of birth and more exists in a database, Grooper can use the known information (the student's ID number) to pull in other corresponding information from the database.  This is also done by performing a SQL query.

How To: Perform a Database Lookup for Validation

Before you begin

First, connect Grooper to the database by creating and configuring a Data Connection.  Database Lookups may be configured within a data model on one of three data element objects:  A data model, a data section, or a data table.  This example will use a simple data model with five data fields.

This example will show you some basic ways to write a SQL query to validate data from a database.  However, for more complicated cases, you may need to familiarize yourself with SQL query syntax.

For this example, we have created a data model with five fields, one for a student's ID number, their first name, their last name, middle initial and favorite animal.


Our test content model, with our data model selected. The Data Model Preview of our five fields.


Each of these fields also exist as columns in a SQL table, seen below.



Below is the document we will be validating.  First, we will verify that the Student ID exists in the table at all.  Second, we will verify Marshal's favorite animal is, indeed, the noble "bardvark".



Add a Database Lookup to the Data Model

1. Select the data model and find the "Lookups" property in its property panel under the "Behavior" category.  Press the ellipsis button at the end of the property.



2. This will bring up the "Lookup Specification Collection Editor" window.  Press the "Add" button and select "Database Lookup"



3.  This will add a new Database Lookup listed in the left panel and bring up a property panel in the right panel.  Using the property panel, select the Data Connection from the list of connections in your Infrastructure folder.  For this example, there is only one Data Connection established, "DB Lookup Example".



Validate a Single Field

1. First, we will verify the Student ID number on the document exists at all in the table.  To do this we will write a SQL query to verify the number in our data model matches a number in the SQL data table we've connected to.  Select the "SQL Query" property and press the ellipsis button at the end.



2. This will bring up the "SQL Query" editor window.  From here, you can write any SQL query to verify information Grooper extracted matches the SQL table.  To get started, start typing SELECT.  This will trigger intellisense, which will guide you to what portion of the query you can type next.  A SELECT statement will always come first (You can also manually trigger intellisense by pressing Ctrl + Shift + Space).



3.  Next, we need to define which SQL table we are referencing, using a FROM statement.  This is "FROM" where we're pulling data.  Type a space after SELECT and intellisense will show all tables in the database.  For this example, there is only one, "Student_Records".



4. This brings us right back to the SELECT statement.  Now that we've said which table we're using for the lookup ("FROM" where we're pulling data).  Typing a space, intellisense shows us all the columns we can select from the SQL table (highlighted in yellow) as well as some valid SQL query commands.  We will select or type the asterisk (*) to select from all columns in the SQL table.  We will use this to verify the field (in this case the extracted student ID) exists anywhere in the SQL table.  For validation only purposes, there are no specific requirements for the SELECT clause.  There is not a specific column we are using to populate the data model.  So, we can look at all the columns, if we choose.



5. Last, we need to configure the WHERE clause.  This is where we set the field we want to compare against a value in the SQL table.  As a general rule, the query should reference the value of at least one field in the WHERE clause.  The WHERE clause follows the FROM clause (FROM Student_Records here).  We will type WHERE [Student ID] = @Student_ID to check the extracted value in the Student ID field matches a value in the Student ID column in the SQL table.  Press "OK" to finish editing the query.

Note:  The SQL table column will always be on the left of the equals sign and the data field in Grooper will be on the right.  Intellisense will help guide you by showing a list of available columns and fields.  The data field's name must be preceded by the @ symbol.  If there is a space in the data field's name, it must be replaced with an underscore character (_).

6.  The way this query is written, no data should ever be populated.  However, as a general rule of thumb, validation queries should only flag fields and not write anything to the data model.  It is best practice to change the "Field Population" property from "Overwrite" to "None" when performing data validation only lookups to avoid accidentally altering extracted values from your documents.



7. Upon extraction, no error occurred.  The student ID extracted from the page does indeed exist in the SQL table


Extracted values Corresponding value in the SQL table, highlighted in yellow.
Had the value not been in the SQL table, the field would be flagged.


Validating a field using a pair of values

Now that we know the student ID does indeed exist in the table, we can verify if other information on that table row matches the values Grooper extracted.  This can help us answer the question, "Is Marshal's favorite animal really a bardvark?"



1. Using the previous steps, add a new database lookup and open the SQL query editor.  For this case, we will use the AND operator to validate multiple fields at once.  We can use the same query we wrote before adding "AND [Favorite Animal] = @Favorite_Animal"



In the SQL table, we see for that row, the two values do not match.  Marshal's favorite animal, as noted by the database, is the Aardvark.



Upon extraction, both fields are flagged.  Since we are using an "AND" operator, both the Student ID and Favorite Animal have to be found on the same row.



As a side note, Alexandrine's favorite animal in the SQL table is indeed the mighty "bardvark".  So, that value does exist in the database.  Note a single field query statement such as "SELECT * FROM  Student_Records WHERE [Favorite Animal] = @Favorite_Animal" would say the field is valid.  As it is written, this query is just checking to see if the value exists at all in the database.  Student ID numbers are unique.  So, it makes sense to perform validation in a way that just checks to see if they are in the SQL table or not.  However, be careful when querying non-unique data in this way to not erroneously validate Grooper's results as valid.

For this SQL query...
The extracted values check out as valid... ...even though Marshal's favorite animal is the aardvark because the value "bardvark" exists elsewhere in the table.


How To: Perform a Database Lookup for field population

Before you begin

First, connect Grooper to the database by creating and configuring a Data Connection. Database Lookups may be configured within a data model on one of three data element objects: A data model, a data section, or a data table.  This example will use a extracted values in a column from a Grooper data table to look up the remaining columns from a SQL table.

This example will show you a simple way to write a SQL query to populate columns in a Grooper data table. However, for more complicated cases, you may need to familiarize yourself with [SQL query syntax].

For our example, we have created a data table named "Lookup Table" with six data columns.

Our test model, with the Data Table selected. A preview of our empty data table before extraction.

The columns in our data table also exist in the SQL table we will connect to.



The document we will use is only a list of student ID numbers.  Using extracted values from this list and information in the SQL table, we will populate the rest of the data table in Grooper.



Add a Database Lookup to the Data Table

1. Select the Data Table in the Node Tree and find the "Lookups" property under the "Behavior" category.  Press the ellipsis button at the end of the property.



2. This will bring up the "Lookup Specification Collection Editor" window.  Press the "Add" button and select "Database Lookup"



3.  This will add a new Database Lookup listed in the left panel and bring up a property panel in the right panel.  Using the property panel, select the Data Connection from the list of connections in your Infrastructure folder.  For this example, there is only one Data Connection established, "DB Lookup Example".



Populate the remaining data columns

1. Once Grooper finds the student ID numbers on the document, a SQL query can be written to fill in the remaining columns with the corresponding data from the SQL table.  Select the "SQL Query" property and press the ellipsis button at the end.



2. This will bring up the "SQL Query" editor window.  From here, you can write a SQL query to populate our data model from data in the SQL table.  To get started, start typing SELECT.  This will trigger intellisense, which will guide you to what portion of the query you can type next.  A SELECT statement will always come first (You can also manually trigger intellisense by pressing Ctrl + Shift + Space).



3.  Next, we need to define which SQL table we are referencing, using a FROM statement.  This is "FROM" where we're pulling data.  Type a space after SELECT and intellisense will show all tables in the database.  For this example, there is only one, Student_Records.



4. This brings us right back to the SELECT statement.  Now that we've said which table we're using for the lookup ("FROM" where we're pulling data).  Typing a space, intellisense shows us all the columns we can select from the SQL table (highlighted in yellow) as well as some valid SQL query commands. 



5. We will use a series of AS statements to fill the blank columns in our table.  For each column you wish to fill, you will type the SQL column on the left and the Grooper data column on the left in single quotes.  So, to populate the data column "FN" which corresponds to the student's first name, we write [First Name] AS 'FN'.  Each new SQL column and data column AS statement is separated by a comma.  See below for the full SELECT clause.

Note: we've separated the query into three lines by pressing return or enter on the keyboard.  However, the query will work the same if it is a single line.  Separating out lines in this way is just for ease of reading the query.



6. Last, we need to configure the WHERE clause.  Here, we will reference the column containing information extracted from the document.  We can use these values as a key to find the remaining corresponding values in a row in the SQL table.  As a general rule, the query should reference the value of at least one field in the WHERE clause.  The WHERE clause follows the FROM clause (FROM Student_Records here).  Since all we have from the document are student ID numbers, we will type WHERE [Student ID] = @Student_ID pull the rest of the student's information from the SQL table.  Press "OK" to finish editing the query.

Note:  The SQL table column will always be on the left of the equals sign and the data column in Grooper will be on the right.  Intellisense will help guide you by showing a list of available columns and fields.  The Grooper data column's name must be preceded by the @ symbol.  If there is a space in the data column's name, it must be replaced with an underscore character (_).



7. Upon extraction, Grooper first pulls the student ID from the document, and fills the "Student ID" column.  Then the database is queried according to the query we wrote, and the extracted student ID value is found in the SQL table.  The empty columns in the Grooper data table are filled in with the corresponding values from the SQL table.


Property Details

Property Default Value Information
General Properties
Database Connection The connection to an external database is set here, using a Data Connection set up in the Infrastructure folder.
SQL Query The SQL query used for the database lookup is written here.
Lookup Options
Trigger Mode Auto This controls when the lookup is performed.  This can be set to "Auto", "Conditional" or "Manual"
  • Auto - The lookup is performed automatically when the value of a lookup field changes.  If a lookup field is populated automatically during Grooper's Extract activity.  This mode will also perform the lookup automatically.  Also, if you were to change the lookup field's value during Data Review, the lookup would be performed using the new value.
  • Conditional - The lookup is performed automatically only if the target fields are empty.  This can allow you to edit the value of a lookup field during Data Review without altering the values of already populated fields.
  • Manual - The lookup is only performed upon manual entry of the lookup field during Data Review.
Miss Disposition Normal This controls what happens when the lookup returns no results at all.  This can be set to "Normal", "Flag", "Clear" or "Ignore"
  • Normal - The lookup fields (the fields defined in the WHERE clause of the query) are flagged and values (if present) in the targeted fields (the fields defined by the SELECT clause of the query) are cleared.  This combines the "Flag" and "Clear" modes.
  • Flag - The lookup fields are flagged, but the targeted fields are not cleared.  The targeted fields are left alone.  If there are any extracted values present, they are left alone.
  • Clear - The target fields are cleared, but the lookup field is not flagged.
  • Ignore - Do nothing.  Nothing will indicate the lookup returned no results.  It will be as if no lookup was defined.
Conflict Disposition Normal This controls what happens when the lookup returns multiple results.  This can be set to "Normal", Flag", "Accept" or "Ignore"
  • Normal - The lookup fields (the fields defined in the WHERE clause of the query) are flagged and values (if present) in the targeted fields (the fields defined by the SELECT clause of the query) are cleared.
  • Flag - The lookup fields are flagged.  The targeted fields are left alone.  If there are any extracted values present, they are not cleared.
  • Accept - The first row in the database table containing the lookup value is used.  Even if the multiple hits have different values for other values in the row, the first row in the table is always used.
  • Ignore - Do nothing.  Nothing will indicate there were multiple results found in the database.  It will be as if no lookup was defined.
Field Population Overwrite This specifies how target fields are populated with the lookup's results.  This can be set to "Overwrite", "Supplement" or "None"
  • Overwrite - Target fields are populated with information from the database.  If these fields were populated during Extract with other information on the document, it will be replaced with what is in the database.
  • Supplement - Target fields are populated only if they are empty.  If these fields were populated during Extract with other information on the document, they will be left alone.
  • None - No fields will be populated.  This mode is used for validating extracted data against data present in the database.