Extract Data with AI Extract (Simple Functionality)

From Grooper Wiki
 
Latest revision as of 11:00, 27 February 2026

This article is about the current version of Grooper.


You may download the Project ZIP file for use in your own Grooper environment (version 2025).

While not strictly necessary for this article, as it's intended to work with most documents, you may choose to download this PDF to follow along.

Introduction

This article demonstrates how to use Grooper’s AI Extract capability to automatically analyze documents and populate a Data Model using a Large Language Model (LLM). This article focuses on the core mechanics of AI-based extraction—showing how Grooper combines OCR, Content Models, Data Models, and an LLM Connector to collect structured data from unstructured documents.

The intention of this article is to provide a minimal, easy-to-understand example of AI Extract in action. The included “Generic Document” configuration uses broadly applicable fields such as document ID, document date, and party information so it can work with almost any document type. The goal is not to model a specific form, but to illustrate how AI Extract interprets document content and fills Data Fields dynamically. Readers are encouraged to modify or replace these generic fields to match their own document requirements.
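The generic fields described above can be pictured as a simple record type. A minimal sketch in Python — the field names mirror the article's examples and are assumptions for illustration, not Grooper's internal Data Element names:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GenericDocument:
    """Illustrative mirror of the 'Generic Document' Data Model's fields."""
    document_id: Optional[str] = None    # e.g. an invoice or case number
    document_date: Optional[str] = None  # kept as text; normalization is a later concern
    parties: list[str] = field(default_factory=list)  # people/organizations named in the document

doc = GenericDocument(document_id="INV-1001",
                      document_date="2025-03-14",
                      parties=["Acme Corp", "Jane Doe"])
print(doc.document_id)  # → INV-1001
```

Because the fields are deliberately broad, the same shape fits invoices, letters, or contracts — which is exactly why the example works with almost any document.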

This guide also demonstrates two execution methods within Grooper:

  • Using Batch Process Step Testers from the Design page to observe and validate each stage of processing, including OCR, Document Type assignment, and AI-driven extraction.
  • Using the "Upload Documents" button on the Batches Page to experience how the same process runs in a streamlined, production-style workflow.

By the end of this article, readers will understand how to configure an LLM provider, associate an LLM with a Data Model’s Fill Method, execute AI Extract within a Batch Process, and review the structured data produced by the model. This example serves as a foundational pattern that can be expanded into more advanced AI-powered document processing solutions.

Please feel free to adjust the contents of the Data Model to best suit your chosen document. You can add or remove any Data Element within the provided Project to see different results.

Test using Batch Process Step Testers

This portion of the article demonstrates how to execute the "Generic Document Extraction" Batch Process manually from the Design Page using Grooper’s Activity Tester tabs of Batch Process Steps. This approach is intended for development, validation, and learning. It allows you to observe how each processing stage contributes to AI-driven data extraction.

Users create a test Batch, import one or more documents, and then run each activity individually—Split Pages, Recognize, Extract, and Review. This makes it possible to confirm that:

  • Pages are properly separated into Batch Page nodes.
  • Text is successfully collected through native text extraction or OCR (using the configured Azure OCR Profile).
  • The correct Document Type is assigned.
  • AI Extract executes the Data Model’s Fill Method using the selected LLM model.
  • Structured field values are written to the Data Model and can be inspected in the Data Viewer of the Review activity.

This method emphasizes visibility and diagnostic control. It is especially useful for verifying LLM connectivity, testing prompt behavior, confirming field population logic, and troubleshooting extraction results before publishing the process for broader use.

  1. First, let's establish the LLM Connector repository option. Select the Root node, then click the ellipsis button to the right of the Options property to open the Options editor.
  2. In the "Options" window, click the "Add" button, then select "LLM Connector" from the drop-down menu.
  3. An "LLM Connector" will be added to the collection. Click the ellipsis button to the right of the Service Providers property to open the Service Providers editor.
  4. In the "Service Providers" window, click the "Add" button, then select an LLM Provider from the drop-down menu. The "GCS Provider" option will be chosen in this case.
  5. The chosen LLM Provider will be added to the collection. Depending on which LLM Provider you choose, you may need to do more configuration. Once configured, click the "OK" button in the "Service Providers" window.
  6. Click the "OK" button from the "Options" window.
  7. Click the "Save" button to save the changes made to the Root node.
  8. Next, let's import the Project ZIP file provided for this exercise. Select the Projects folder from the Node Tree, then click the "Upload ZIP" button.
  9. In the "ZIP File Import" dialogue that opens, click the "Choose File" button.
  10. An Explorer window will open allowing you to select the provided Project ZIP file. Click the "Open" button.
  11. Click the "UPLOAD" button in the "ZIP File Import" dialogue.
  12. The provided Project is now added in the Projects folder.
  13. Next, let's make sure we have a running Activity Processing service. Click the Machines folder from the Node Tree. Notice there is a running Activity Processing service for this repository. This is required if you choose to use the "Submit Job" command moving forward. It will also be required to automate processing in a Batch Process. Please refer to the "How to: Grooper services" section of the "Grooper Command Console" article of the Grooper Wiki for more information on installing Grooper Services.
  14. Moving on, we'll now configure the "Azure OCR" OCR Profile that will be leveraged by the Recognize activity to add electronic text to our document. Expand the Node Tree and select the "Azure OCR" OCR Profile from the provided Project, then insert your Azure OCR API key into the API Key property.
  15. Next, click the drop-down button to the right of the API Region property and select the appropriate API Region for your API key from the drop-down menu.
  16. Click the "Save" button to save changes made to the OCR Profile.
  17. Next we need to configure the AI Extract Fill Method. Expand the Node Tree, then select the Data Model from the provided Project. Click the ellipsis button to the right of the Fill Methods property to open the Fill Methods editor.
  18. In the "Fill Methods" window, click the "Add" button, then select "AI Extract" from the drop-down menu.
  19. "AI Extract" will be added to the collection. Expand the Generator sub-properties then click the ellipsis button to the right of the Model property to open the Model selector.
  20. In the "Model" window, select a model, then click the "OK" button.
  21. Click the "OK" button from the "Fill Methods" window.
  22. Click the "Save" button to save the changes made to the Data Model.
  23. Now, let's add a test Batch. Expand the Node Tree and right-click the Batches Test folder, then select "Add Batch" from the pop-out menu.
  24. In the "Add" dialogue, provide a name for the Batch in the Name property, then click the "Execute" button. Here we've named it "A.I. Extract Test".
  25. Select the newly created Batch from the Node Tree, then click the Viewer tab.
  26. From an Explorer window, drag the provided PDF onto the root Batch Folder of the Batch. Though a document was provided with this example, you can try to use any document you'd like to test.
  27. Now we will begin to test the Batch Process Steps of our Batch Process to affect the contents of our Batch, starting with "Split Pages". Expand the Node Tree and select the "Split Pages" Batch Process Step from the provided Project, then click the Activity Tester tab.
  28. Click the "Select Batch" button in the Batch Viewer, then be sure to select the newly created Batch, and click the "OK" button.
  29. Select the Folder Level 1 Batch Folder from the Batch Viewer, then click the "Test Activity" button. To use this command, you must select the appropriate scope, as configured in the Batch Process Step's properties. This will use local system resources to process the activity one task at a time. In this case, there's one Folder Level 1 Batch Folder, so it's one task.
  30. Using the "Submit Job" button is also an option. You do not need to make a selection in the Batch Viewer to use this command. This will create a Job with a number of tasks from the configured scope. Once again, in this case, one Folder Level 1 Batch Folder will create one Task in the Job. The task, or tasks, of the Job will be picked up by an active Activity Processing service to be processed.
  31. Once completed, a number of child Batch Page objects will be created by the Split Pages activity. You can view these in the Batch Viewer.
  32. We can also see the Batch Pages as nodes in the Node Tree. Expand the Node Tree and select the "A.I. Extract Test" Batch, then click the "Refresh" button.
  33. Expand the contents of the Batch to see the Batch Page object nodes in the Node Tree.
  34. Next we'll test the Recognize activity. Select the "Recognize" Batch Process Step, then click the Activity Tester tab.
  35. With no selection in the Batch Viewer, you can click the "Submit Job" button.
  36. Conversely, you can select all the Batch Pages, then click the "Test Activity" button. Once completed, each Batch Page will have a "CharacterData.txt" file associated with it that contains the now recognized electronic text. Recognize selectively performs OCR for image-based content (using the "Azure OCR" OCR Profile in the Project) and native text extraction for encoded text content.
  37. Moving on, we'll now test extraction. Select the "Extract" Batch Process Step. Notice the Default Content Type property is set to the "Generic Document" Document Type.
  38. Click the Activity Tester tab.
  39. With no selection in the Batch Viewer, you can click the "Submit Job" button.
  40. Conversely, you can select all the Folder Level 1 Batch Folders, then click the "Test Activity" button.
  41. This will assign the "Generic Document" Document Type (the configured Default Content Type) to each Batch Folder and execute its Data Model's extraction instructions. In this case, that means AI Extract will run on each document to collect the data.
  42. Finally, we'll review the data. Select the "Review" Batch Process Step. Notice the Views property has a "Data View" added.
  43. Click the Activity Tester tab.
  44. Select the Batch root, then click the "Test Activity" button.
  45. This will open the Grooper Review screen. Use the Data Viewer to inspect the data collected by AI Extract.
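Grooper performs the AI Extract Fill Method internally, but the underlying pattern — send the recognized text plus the Data Model's field list to an LLM and parse a structured reply — can be sketched as follows. The `call_llm` function is a stand-in for whatever provider the LLM Connector is configured to use (it returns a canned reply here so the sketch is runnable); nothing below is Grooper's actual API:

```python
import json

FIELDS = ["Document ID", "Document Date", "Parties"]

def build_prompt(document_text: str, fields: list[str]) -> str:
    """Ask the model to return one JSON object keyed by the requested fields."""
    field_list = ", ".join(f'"{f}"' for f in fields)
    return (
        "Extract the following fields from the document below and reply with "
        f"a single JSON object whose keys are exactly: {field_list}. "
        "Use null for any field you cannot find.\n\n"
        f"--- DOCUMENT ---\n{document_text}"
    )

def call_llm(prompt: str) -> str:
    # Stand-in for a real provider call; a production version would send
    # the prompt to the configured LLM and return its text reply.
    return '{"Document ID": "INV-1001", "Document Date": "2025-03-14", "Parties": ["Acme Corp"]}'

def ai_extract(document_text: str, fields: list[str]) -> dict:
    reply = call_llm(build_prompt(document_text, fields))
    data = json.loads(reply)
    # Keep only the requested fields, defaulting missing ones to None —
    # a model reply never gets to add fields the Data Model doesn't define.
    return {f: data.get(f) for f in fields}

result = ai_extract("Invoice INV-1001 dated 2025-03-14 from Acme Corp", FIELDS)
print(result["Document ID"])  # → INV-1001
```

This is also why the Recognize step matters: the LLM only sees the text layer, so missing or poor OCR output means empty or wrong field values downstream.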

Test with the "Upload Documents" button

This portion of the article demonstrates running the same AI Extract workflow from the Batches page using the "Upload Documents" button. After the Batch Process is published, users can upload documents and select the "Generic Document Extraction" process from a drop-down list.

Once started, the Batch executes automatically through Split Pages, Recognize, and Extract, pausing at Review. The Activity Processing service handles each step without requiring manual interaction. Users then complete the Review step and inspect the extracted values using the Data Viewer.
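The hands-off flow described above — unattended activities run in order, and the Batch pauses when it reaches the first human-attended step — can be sketched as a tiny pipeline. The step names follow the article; the pause-at-Review behavior is the point being illustrated, not how Grooper schedules work internally:

```python
# Each step is (name, attended?). Unattended steps run automatically;
# the Batch pauses when it reaches the first attended step (Review).
STEPS = [("Split Pages", False), ("Recognize", False),
         ("Extract", False), ("Review", True)]

def run_batch(steps):
    completed = []
    for name, attended in steps:
        if attended:
            return completed, f"paused at {name}"  # waits for a user to complete the task
        completed.append(name)  # an Activity Processing service would do this work
    return completed, "complete"

done, status = run_batch(STEPS)
print(done)    # → ['Split Pages', 'Recognize', 'Extract']
print(status)  # → paused at Review
```

Completing the Review task in the Data Viewer is what moves the Batch past the attended step and finishes the process.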

This approach highlights the production experience. It reflects how a deployed AI Extract solution behaves in real-world use: users upload documents, processing runs in the background, and structured data is presented for validation. The complexity of OCR, LLM integration, and Data Model execution is encapsulated within the Batch Process, providing a simple and accessible workflow for end users.

  1. First we need to publish our Batch Process. Expand the Node Tree and select the "Generic Document Extraction" Batch Process from the provided Project, then click the "Publish" button to publish this Batch Process to the Processes folder.
  2. Click the "Execute" button in the "Publish" dialogue.
  3. A published version of the Batch Process now exists in the Processes folder. This Batch Process will now be available to select for production Batches.
  4. Next, we'll use the "Upload Documents" button from the Batches Page to create a new Batch and leverage our published Batch Process to automate the processing. Click the "Batches Page" button to go to the Batches Page.
  5. Click the "Upload Documents" button in the upper right of the Batches Page. This allows us to quickly create and begin processing a small, ad hoc Batch.
  6. Click the "Choose Files" button in the dialogue that appears.
  7. Select the provided PDF document from the Explorer window that opens, then click the "Open" button. You may use your own document for this as well.
  8. Set the Process property to the published "Generic Document Extraction" Batch Process, then click the "OK" button.
  9. Select the newly imported Batch. If the Batch started paused, you may need to click the "Resume" button to begin processing.
  10. On the Jobs tab of the Batch Info Viewer, progress bars indicate the status of each Batch Process Step as the running Activity Processing service completes the Tasks of each Job.
  11. Once processing is complete, double click the Batch.
  12. The human-attended Review activity will open with the Data Viewer to allow review of the extracted data. If you're happy with what you see, click the "Complete Task" button to confirm the collected data and finish the Batch Process.
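If the Review screen shows empty fields, one quick sanity check is to exercise the Azure OCR key outside Grooper. A minimal sketch that only assembles the request without sending it — the endpoint shape and API version shown are assumptions (they match the classic regional Cognitive Services form; newer resources use a per-resource *.cognitiveservices.azure.com host), so verify them against your own Azure resource:

```python
def build_read_request(region: str, api_key: str) -> dict:
    """Assemble (but do not send) an Azure Read API analyze request.

    Endpoint and version are illustrative assumptions; check your
    Azure resource's keys-and-endpoint page for the real values.
    """
    return {
        "method": "POST",
        "url": f"https://{region}.api.cognitive.microsoft.com/vision/v3.2/read/analyze",
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/octet-stream",  # raw image/PDF bytes go in the body
        },
    }

req = build_read_request("eastus", "YOUR-KEY-HERE")
print(req["url"])
```

Sending this with a valid key typically returns 202 Accepted with an Operation-Location header to poll for results; a 401 response points back at the OCR Profile's API Key and API Region properties.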

For more information

Please review the following articles for more information on these specific topics:

  • Batches Page
  • Content Model
  • Data Element
  • Data Model
  • Design Page
  • Machine
  • OCR Profile
  • Project
  • Recognize
  • Review
  • Split Pages