Category:Document Modeling: Difference between revisions

Latest revision as of 15:27, 30 July 2025

"Document modeling" is the process of designing structured representations of documents to better understand them, manage them, and/or extract information from them. Grooper uses several kinds of components to represent a document in different ways. These include:

Batch Objects - These are the components used to represent a document's structure/format and store raw data.
- For example, Batch Pages represent a document's individual pages and store the page's image, text data obtained from the Recognize activity, and more.
Content Types - These are the components used to represent how documents fit into a classification schema.
- Content Types are used to form a classification "taxonomy". Content Models are at the top of the taxonomy. They are composed of Document Types, which represent the different kinds of documents that fit within the Content Model.
- Content Types also determine the Data Model used to extract data from a document and the Behaviors that control processing logic for several different Activities in Grooper.
Data Elements - These are the components used to represent document data (fields, sections and tables written on the document).
- The "Data Model" is the core Data Element. A Data Model will have one or more child Data Elements representing fields, sections and tables. Data Models collect document data during the Extract activity. Which Data Model a document uses is determined by its Content Type.
- For example, Data Fields represent field-level data, such as a Social Security Number on a personnel form.
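
The relationships described above can be sketched as plain data structures. This is only an illustration of the concepts, not Grooper's API: every class, field, and function name below (`BatchPage`, `DocumentType`, `data_model_for`, and so on) is a hypothetical stand-in, and the sample names such as "HR Documents" are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """Hypothetical stand-in for a field, section, or table."""
    name: str
    kind: str  # "field", "section", or "table"
    children: List["DataElement"] = field(default_factory=list)


@dataclass
class DataModel:
    """Hypothetical core Data Element holding the child Data Elements."""
    children: List[DataElement] = field(default_factory=list)


@dataclass
class DocumentType:
    """Hypothetical Content Type for one kind of document."""
    name: str
    data_model: DataModel


@dataclass
class ContentModel:
    """Hypothetical top of the taxonomy, composed of Document Types."""
    name: str
    document_types: List[DocumentType] = field(default_factory=list)


@dataclass
class BatchPage:
    """Hypothetical Batch Object: one page with its image and recognized text."""
    image_path: str
    text: str = ""


@dataclass
class Document:
    """Hypothetical document: its pages plus the Content Type assigned to it."""
    pages: List[BatchPage]
    content_type: Optional[DocumentType] = None


def data_model_for(document: Document) -> Optional[DataModel]:
    # The Content Type determines which Data Model applies;
    # an unclassified document has none.
    if document.content_type is None:
        return None
    return document.content_type.data_model


# Example: a personnel form with one field-level Data Element.
personnel_form = DocumentType(
    name="Personnel Form",
    data_model=DataModel([DataElement("Social Security Number", "field")]),
)
hr_model = ContentModel("HR Documents", [personnel_form])
doc = Document(pages=[BatchPage("page_001.png")], content_type=personnel_form)

for element in data_model_for(doc).children:
    print(element.kind, element.name)
```

The sketch keeps the three component families separate (Batch Objects for structure and raw data, Content Types for classification, Data Elements for extracted data) and links them only through the document's assigned Content Type, mirroring the description above.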

@@ Line 1: / Line 1: @@
 "Document modeling" is the process of designing structured representations of documents to better understand them, manage them, and/or extract information from them. Many different Grooper components are used to help represent a document in different ways. This includes:
-* [[:Category:Batch Object|Batch Objects]] - These are the components used to represent a document's structure and store raw data about them. For example, Batch Pages represent a document's individual pages and store the page's image, text data obtained from [[Recognize]], and more.
+* [[:Category:Batch Object|Batch Objects]] - These are the components used to represent a document's structure/format and store raw data.
-* [[:Content Type:Content Type|Contnet Types]] - These are the components used to represent how documents fit into a classification schema. They are used to form a classification "taxonomy". [[Content Model]]s are at the top of the taxonomy. They composed of [[Document Type]]s which represent different kinds of documents that all fit within the Content Model. For example, Document Types represent different kinds of documents in a larger Content Model. Content Types are key to dictating the Data Model used to extract data from a document and [[Behavior]]s that control processing logic for several different Activities in Grooper.
+**For example, Batch Pages represent a document's individual pages and store the page's image, text data obtained from the Recognize activity, and more.
-* [[:Category:Data Element|Data Elements]]-
+* [[:Category:Content Type|Content Types]] - These are the components used to represent how documents fit into a classification schema.
+**Content Types are used to form a classification "taxonomy". Content Models are at the top of the taxonomy. They composed of Document Types which represent different kinds of documents that all fit within the Content Model.
+**For example, Document Types represent different kinds of documents in a larger Content Model. Content Types are key to dictating the Data Model used to extract data from a document and Behaviors that control processing logic for several different Activities in Grooper.
+* [[:Category:Data Element|Data Elements]] - These are the components used to represent document data (fields, sections and tables written on the document).
+** The "Data Model" is the core Data Element. A Data Model will have one or more child Data Elements representing fields, sections and tables. Data Models collect document data during the Extract activity. What Data Model a document uses is determined by their Content Type.
+** For example, Data Fields represent field-level data, such as a Social Security Number on a personnel form.

