Database Lookup
Database Lookups are used to populate or validate fields in a Data Model using an external database, such as a SQL database.
For validation, values in a Data Field are compared to corresponding values in a database table. If the results differ from what’s in the table, that field can be flagged.
For data population, as long as the "lookup values" exist in the external database, Grooper extracted values can be used to populate additional Data Fields. Given a certain field or fields in a Data Model matches fields in a database table, additional values in the table’s row can be assigned to empty Grooper Data Fields. For example, an extracted social security number from a document could be used to lookup a corresponding name in database that has both the social security number and name.
The Grooper data fields used for comparison against a database table are called "lookup fields". The fields populated from the database table are called "target fields". A SQL query is used to find the lookup fields and target fields in the database table.
About
The following is a SQL query to lookup a student ID number, find that student's last name in a database table, and return it to a field in Grooper.
The WHERE clause contains the lookup field "Student ID" (@Student_ID ). The AS portion of the SELECT clause contains the target field. The field 'LN' will be filled in using the information from the database column [Last Name]. Student_Records is the database table used. | |
lookup and target fields in a Grooper data model |
A SQL table containing our lookup value and the target value we want to populate our data model with |
The query then returns the targeted last name value to our LN field in the data model. |
Version Differences
Prior to Version 2.80, database lookups were performed on individual Data Fields in a Data Model, using simple field mappings. Now, lookups are configured on a Data Model, Data Section or Data Table’s properties, using SQL queries. Multiple database lookups using multiple SQL queries can be written on the Data Model. They can reference any number of database columns and Grooper fields.
During review, Grooper now provides a UI dialogue box when multiple matches are returned to choose the correct value. The example below is using the extracted student ID numbers to fill in the rest of the table using values in a SQL database. The number in red was found more than once in the database.
Entering the value in the lookup field will bring up a new window, showing the multiple hits from the SQL table, and allow you to pick the result.
Use Cases
Database Lookup can be used to validate data already populated in a Grooper data model. That way, you can check to see if data extracted from a document set matches what's already in an existing database.
It can also be used to populate data in a Grooper data model from a database. In these cases, some kind of extracted value from the document is used as a key to return information from a database that is not found on the document.
SQL databases and ODBC compliant databases are compatible with Grooper.
How To: Create a Data Connection
Before you begin
Before you can perform a database lookup, first Grooper must be able to connect to it in order to see everything in the data table. This is done by creating a Data Connection. Grooper can connect to a SQL database or any ODBC compliant database provider. You will also need the user name and password used to access the database.
Add a Data Connection in the Node Tree
1. Expand the "Infrastructure" folder and select the "Data Connections" folder.
2. Right click the "Data Connections" folder and select "Add" then "Data Connection...".
3. Name the Data Connection, and you will see this screen to configure the connection.
Configure the database connection
1. Select the database "Provider". You can connect to a SQL database selecting "SqlServer" or a Open Database Connectivity (ODBC) compliant database by selecting ODBC.
2. This example shows configuring the most common method of connecting to a database, using the "SqlServer" provider to connect to a SQL database. Using the "Server Name" property, type the name of the SQL Server you are connecting to.
3. Select the Database Name using the dropdown menu. The database we are connecting to for this example is named "Students".
4. You must also enter the user name and password credentials used to connect to the database. If you leave this property blank, Windows Authentication will be used.
Verify your connection
After you press the "Save" button, you can see the database tables available in the database in the "Database Tables" pane (This example only has one, "Student_Records"). Selecting a table from the list, you can see the table's columns in the "Table Columns" pane as well as a preview of table in the "Data Preview" panel.
How To: Perform a Database Lookup
Database Lookups are used in two ways: Validation and Population.
First, they can validate data Grooper has extracted. Using a SQL query, data fields and table columns in Grooper can check a database to verify certain fields match what is in the database table.
Second, they can populate blank data fields and table columns in a data model. Let's say you have a set of documents containing information you know exists in a database. The example we will look at pertains to student records. If Grooper can extract a student's identification number from a document, and other information like their name, address, date of birth and more exists in a database, Grooper can use the known information (the student's ID number) to pull in other corresponding information from the database. This is also done by performing a SQL query.
How To: Perform a Database Lookup for Validation
Before you begin
First, connect Grooper to the database by creating and configuring a Data Connection. Database Lookups may be configured within a data model on one of three data element objects: A data model, a data section, or a data table. This example will use a simple data model with five data fields.
This example will show you some basic ways to write a SQL query to validate data from a database. However, for more complicated cases, you may need to familiarize yourself with SQL query syntax.
For this example, we have created a data model with five fields, one for a student's ID number, their first name, their last name, middle initial and favorite animal.
Our test content model, with our data model selected. | The Data Model Preview of our five fields. |
Each of these fields also exist as columns in a SQL table, seen below.
Below is the document we will be validating. First, we will verify that the Student ID exists in the table at all. Second, we will verify Marshal's favorite animal is, indeed, the noble "bardvark".
Add a Database Lookup to the Data Model
1. Select the data model and find the "Lookups" property in its property panel under the "Behavior" category. Press the ellipsis button at the end of the property.
2. This will bring up the "Lookup Specification Collection Editor" window. Press the "Add" button and select "Database Lookup"
3. This will add a new Database Lookup listed in the left panel and bring up a property panel in the right panel. Using the property panel, select the Data Connection from the list of connections in your Infrastructure folder. For this example, there is only one Data Connection established, "DB Lookup Example".
Validate a Single Field
1. First, we will verify the Student ID number on the document exists at all in the table. To do this we will write a SQL query to verify the number in our data model matches a number in the SQL data table we've connected to. Select the "SQL Query" property and press the ellipsis button at the end.
2. This will bring up the "SQL Query" editor window. From here, you can write any SQL query to verify information Grooper extracted matches the SQL table. To get started, start typing SELECT
. This will trigger intellisense, which will guide you to what portion of the query you can type next. A SELECT
statement will always come first (You can also manually trigger intellisense by pressing Ctrl + Shift + Space
).
3. Next, we need to define which SQL table we are referencing, using a FROM
statement. This is "FROM
" where we're pulling data. Type a space after SELECT
and intellisense will show all tables in the database. For this example, there is only one, "Student_Records
".
4. This brings us right back to the SELECT
statement. Now that we've said which table we're using for the lookup ("FROM
" where we're pulling data). Typing a space, intellisense shows us all the columns we can select from the SQL table (highlighted in yellow) as well as some valid SQL query commands. We will select or type the asterisk (*
) to select from all columns in the SQL table. We will use this to verify the field (in this case the extracted student ID) exists anywhere in the SQL table. For validation only purposes, there are no specific requirements for the SELECT
clause. There is not a specific column we are using to populate the data model. So, we can look at all the columns, if we choose.
5. Last, we need to configure the WHERE
clause. This is where we set the field we want to compare against a value in the SQL table. As a general rule, the query should reference the value of at least one field in the WHERE
clause. The WHERE
clause follows the FROM
clause (FROM Student_Records
here). We will type WHERE [Student ID] = @Student_ID
to check the extracted value in the Student ID field matches a value in the Student ID column in the SQL table. Press "OK" to finish editing the query.
Note: The SQL table column will always be on the left of the equals sign and the data field in Grooper will be on the right. Intellisense will help guide you by showing a list of available columns and fields. The data field's name must be preceded by the @
symbol. If there is a space in the data field's name, it must be replaced with an underscore character (_
).
6. The way this query is written, no data should ever be populated. However, as a general rule of thumb, validation queries should only flag fields and not write anything to the data model. It is best practice to change the "Field Population" property from "Overwrite" to "None" when performing data validation only lookups to avoid accidentally altering extracted values from your documents.
7. Upon extraction, no error occurred. The student ID extracted from the page does indeed exist in the SQL table
Extracted values | Corresponding value in the SQL table, highlighted in yellow. |
Had the value not been in the SQL table, the field would be flagged. |
Validating a field using a pair of values
Now that we know the student ID does indeed exist in the table, we can verify if other information on that table row matches the values Grooper extracted. This can help us answer the question, "Is Marshal's favorite animal really a bardvark?"
1. Using the previous steps, add a new database lookup and open the SQL query editor. For this case, we will use the AND operator to validate multiple fields at once. We can use the same query we wrote before adding "AND [Favorite Animal] = @Favorite_Animal"
In the SQL table, we see for that row, the two values do not match. Marshal's favorite animal, as noted by the database, is the Aardvark.
Upon extraction, both fields are flagged. Since we are using an "AND
" operator, both the Student ID and Favorite Animal have to be found on the same row.
As a side note, Alexandrine's favorite animal in the SQL table is indeed the mighty "bardvark". So, that value does exist in the database. Note a single field query statement such as "SELECT * FROM Student_Records WHERE [Favorite Animal] = @Favorite_Animal
" would say the field is valid. As it is written, this query is just checking to see if the value exists at all in the database. Student ID numbers are unique. So, it makes sense to perform validation in a way that just checks to see if they are in the SQL table or not. However, be careful when querying non-unique data in this way to not erroneously validate Grooper's results as valid.
For this SQL query... | |
The extracted values check out as valid... | ...even though Marshal's favorite animal is the aardvark because the value "bardvark" exists elsewhere in the table. |
How To: Perform a Database Lookup for field population
Before you begin
First, connect Grooper to the database by creating and configuring a Data Connection. Database Lookups may be configured within a data model on one of three data element objects: A data model, a data section, or a data table. This example will use a extracted values in a column from a Grooper data table to look up the remaining columns from a SQL table.
This example will show you a simple way to write a SQL query to populate columns in a Grooper data table. However, for more complicated cases, you may need to familiarize yourself with [SQL query syntax].
For our example, we have created a data table named "Lookup Table" with six data columns.
Our test model, with the Data Table selected. | A preview of our empty data table before extraction. |
The columns in our data table also exist in the SQL table we will connect to.
The document we will use is only a list of student ID numbers. Using extracted values from this list and information in the SQL table, we will populate the rest of the data table in Grooper.
Add a Database Lookup to the Data Table
1. Select the Data Table in the Node Tree and find the "Lookups" property under the "Behavior" category. Press the ellipsis button at the end of the property.
2. This will bring up the "Lookup Specification Collection Editor" window. Press the "Add" button and select "Database Lookup"
3. This will add a new Database Lookup listed in the left panel and bring up a property panel in the right panel. Using the property panel, select the Data Connection from the list of connections in your Infrastructure folder. For this example, there is only one Data Connection established, "DB Lookup Example".
Populate the remaining data columns
1. Once Grooper finds the student ID numbers on the document, a SQL query can be written to fill in the remaining columns with the corresponding data from the SQL table. Select the "SQL Query" property and press the ellipsis button at the end.
2. This will bring up the "SQL Query" editor window. From here, you can write a SQL query to populate our data model from data in the SQL table. To get started, start typing SELECT
. This will trigger intellisense, which will guide you to what portion of the query you can type next. A SELECT
statement will always come first (You can also manually trigger intellisense by pressing Ctrl + Shift + Space
).
3. Next, we need to define which SQL table we are referencing, using a FROM
statement. This is "FROM
" where we're pulling data. Type a space after SELECT
and intellisense will show all tables in the database. For this example, there is only one, Student_Records
.
4. This brings us right back to the SELECT
statement. Now that we've said which table we're using for the lookup ("FROM
" where we're pulling data). Typing a space, intellisense shows us all the columns we can select from the SQL table (highlighted in yellow) as well as some valid SQL query commands.
5. We will use a series of AS
statements to fill the blank columns in our table. For each column you wish to fill, you will type the SQL column on the left and the Grooper data column on the left in single quotes. So, to populate the data column "FN" which corresponds to the student's first name, we write [First Name] AS 'FN'
. Each new SQL column and data column AS
statement is separated by a comma. See below for the full SELECT
clause.
Note: we've separated the query into three lines by pressing return or enter on the keyboard. However, the query will work the same if it is a single line. Separating out lines in this way is just for ease of reading the query.
6. Last, we need to configure the WHERE
clause. Here, we will reference the column containing information extracted from the document. We can use these values as a key to find the remaining corresponding values in a row in the SQL table. As a general rule, the query should reference the value of at least one field in the WHERE
clause. The WHERE
clause follows the FROM
clause (FROM Student_Records
here). Since all we have from the document are student ID numbers, we will type WHERE [Student ID] = @Student_ID
pull the rest of the student's information from the SQL table. Press "OK" to finish editing the query.
Note: The SQL table column will always be on the left of the equals sign and the data column in Grooper will be on the right. Intellisense will help guide you by showing a list of available columns and fields. The Grooper data column's name must be preceded by the @
symbol. If there is a space in the data column's name, it must be replaced with an underscore character (_
).
7. Upon extraction, Grooper first pulls the student ID from the document, and fills the "Student ID" column. Then the database is queried according to the query we wrote, and the extracted student ID value is found in the SQL table. The empty columns in the Grooper data table are filled in with the corresponding values from the SQL table.
Property Details
Property | Default Value | Information |
General Properties | ||
Database Connection | The connection to an external database is set here, using a Data Connection set up in the Infrastructure folder. | |
SQL Query | The SQL query used for the database lookup is written here. | |
Lookup Options | ||
Trigger Mode | Auto | This controls when the lookup is performed. This can be set to "Auto", "Conditional" or "Manual"
|
Miss Disposition | Normal | This controls what happens when the lookup returns no results at all. This can be set to "Normal", "Flag", "Clear" or "Ignore"
|
Conflict Disposition | Normal | This controls what happens when the lookup returns multiple results. This can be set to "Normal", Flag", "Accept" or "Ignore"
|
Field Population | Overwrite | This specifies how target fields are populated with the lookup's results. This can be set to "Overwrite", "Supplement" or "None"
|