Category:Document Modeling: Difference between revisions

From Grooper Wiki

Latest revision as of 15:27, 30 July 2025

"Document modeling" is the process of designing structured representations of documents to better understand them, manage them, and/or extract information from them. Many different Grooper components are used to help represent a document in different ways. This includes:

  • Batch Objects - These are the components used to represent a document's structure/format and store raw data.
    • For example, Batch Pages represent a document's individual pages and store the page's image, text data obtained from the Recognize activity, and more.
  • Content Types - These are the components used to represent how documents fit into a classification schema.
    • Content Types are used to form a classification "taxonomy". Content Models are at the top of the taxonomy. They are composed of Document Types, which represent the different kinds of documents that fit within the Content Model.
    • Content Types are also key to dictating the Data Model used to extract data from a document and the Behaviors that control processing logic for several different Activities in Grooper.
  • Data Elements - These are the components used to represent document data (fields, sections and tables written on the document).
    • The "Data Model" is the core Data Element. A Data Model will have one or more child Data Elements representing fields, sections and tables. Data Models collect document data during the Extract activity. Which Data Model a document uses is determined by its Content Type.
    • For example, Data Fields represent field-level data, such as a Social Security Number on a personnel form.
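
The relationships described above can be sketched as a small set of plain data structures. This is only an illustration of the concepts and is not Grooper's API: every class, attribute and method name below (`BatchPage`, `ClassifiedDocument`, `data_model_for`, the "HR Documents" Content Model, and so on) is a hypothetical name chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchPage:
    """Batch Object: one page of a document, holding its raw data."""
    number: int
    image_path: str = ""  # hypothetical location of the page image
    text: str = ""        # text data, e.g. as obtained by a recognition step


@dataclass
class DataElement:
    """A field, section or table that appears on the document."""
    name: str
    kind: str  # "field", "section" or "table"


@dataclass
class DataModel:
    """The core Data Element: a container of child Data Elements."""
    name: str
    elements: list[DataElement] = field(default_factory=list)


@dataclass
class DocumentType:
    """Content Type: one kind of document within a Content Model."""
    name: str
    data_model: Optional[DataModel] = None


@dataclass
class ContentModel:
    """Content Type at the top of the classification taxonomy."""
    name: str
    document_types: list[DocumentType] = field(default_factory=list)

    def data_model_for(self, document_type_name: str) -> Optional[DataModel]:
        """Return the Data Model tied to a Document Type, or None if unknown."""
        for doc_type in self.document_types:
            if doc_type.name == document_type_name:
                return doc_type.data_model
        return None


@dataclass
class ClassifiedDocument:
    """Hypothetical holder: a document's pages plus its assigned Document Type."""
    pages: list[BatchPage]
    document_type: str


# Example: a personnel form with a Social Security Number field.
personnel_model = DataModel(
    "Personnel Form Data",
    [DataElement("Social Security Number", "field")],
)
hr = ContentModel("HR Documents", [DocumentType("Personnel Form", personnel_model)])
doc = ClassifiedDocument(pages=[BatchPage(1)], document_type="Personnel Form")

# The document's Content Type determines which Data Model applies to it.
model = hr.data_model_for(doc.document_type)
print([element.name for element in model.elements])
```

In this sketch the lookup goes from the document's assigned Document Type to a Data Model, mirroring the statement that a document's Content Type determines which Data Model it uses.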