Category:Document Modeling: Difference between revisions
Dgreenwood (talk | contribs) Created page with ""Document modeling" is the process of designing structured representations of documents to better understand them, manage them, and/or extract information from them. Many different Grooper components are used to help represent a document in different ways. This includes: * Batch Objects - These are the components used to represent a document's structure and store raw data about them. For example, Batch Pages represent a document's individual p..." |
Dgreenwood (talk | contribs) No edit summary |
Latest revision as of 15:27, 30 July 2025
"Document modeling" is the process of designing structured representations of documents to better understand them, manage them, and/or extract information from them. Many different Grooper components are used to help represent a document in different ways. These include:
- Batch Objects - These are the components used to represent a document's structure/format and store raw data.
  - For example, Batch Pages represent a document's individual pages and store each page's image, the text data obtained from the Recognize activity, and more.
- Content Types - These are the components used to represent how documents fit into a classification schema.
  - Content Types are used to form a classification "taxonomy". Content Models are at the top of the taxonomy. They are composed of Document Types, which represent the different kinds of documents that fit within the Content Model.
  - For example, Document Types represent different kinds of documents in a larger Content Model. Content Types are key to determining the Data Model used to extract data from a document and the Behaviors that control processing logic for several different Activities in Grooper.
- Data Elements - These are the components used to represent document data (the fields, sections, and tables written on the document).
  - The "Data Model" is the core Data Element. A Data Model has one or more child Data Elements representing fields, sections, and tables. Data Models collect document data during the Extract activity. Which Data Model a document uses is determined by its Content Type.
  - For example, Data Fields represent field-level data, such as a Social Security Number on a personnel form.
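The relationships described above can be sketched as a small data structure. This is only an illustrative model, not Grooper's API: every class, attribute, and sample name below (`ContentModel`, `BatchDocument`, `"Human Resources"`, `page_001.tif`, and so on) is a hypothetical stand-in chosen to mirror the concepts on this page. It shows a Content Model holding Document Types, each Document Type carrying a Data Model, and a document picking up its Data Model from its Content Type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical classes for illustration only; these are NOT Grooper's API.


@dataclass
class DataField:
    """A Data Element holding field-level data."""
    name: str
    value: Optional[str] = None


@dataclass
class DataModel:
    """The core Data Element; parent of the child Data Elements."""
    fields: List[DataField] = field(default_factory=list)


@dataclass
class DocumentType:
    """A Content Type for one kind of document."""
    name: str
    data_model: DataModel


@dataclass
class ContentModel:
    """Top of the classification taxonomy; composed of Document Types."""
    name: str
    document_types: Dict[str, DocumentType] = field(default_factory=dict)

    def add(self, doc_type: DocumentType) -> None:
        self.document_types[doc_type.name] = doc_type


@dataclass
class BatchPage:
    """A Batch Object for one page: its image plus recognized text."""
    image_path: str
    text: str = ""


@dataclass
class BatchDocument:
    """A Batch Object for a whole document."""
    pages: List[BatchPage]
    content_type: Optional[DocumentType] = None

    def data_model(self) -> Optional[DataModel]:
        # The Data Model a document uses follows from its Content Type.
        return self.content_type.data_model if self.content_type else None


# Build a one-type taxonomy (names are made up for the example).
hr = ContentModel("Human Resources")
hr.add(DocumentType("Personnel Form",
                    DataModel([DataField("Social Security Number")])))

# An unclassified document has no Data Model until it gets a Content Type.
doc = BatchDocument(pages=[BatchPage("page_001.tif", "recognized text here")])
unclassified_model = doc.data_model()

doc.content_type = hr.document_types["Personnel Form"]
field_names = [f.name for f in doc.data_model().fields]
```

In this sketch, classification is just the assignment to `content_type`; the point is only that the Data Model is reached through the Content Type rather than being attached to the document directly.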
Pages in category "Document Modeling"
The following 15 pages are in this category, out of 15 total.