2023.1:Content Model (Node Type): Difference between revisions
# Take note of the Allow Training property.
# Extractors are properties that Grooper uses to help identify and classify documents as different Document Types.
#* [[Rules-Based (Classification Method)#Positive Extractor Rules|Positive Extractor]]s tell Grooper what to look for.
#** In short, whenever the Positive Extractor finds the data Grooper is told to look for, the document is classified as the Document Type the extractor is configured on. This is a useful tool when documents are similar to one another and classification could go awry.
#* Similarly, [[Rules-Based (Classification Method)#Negative Extractor Rules|Negative Extractor]]s tell Grooper what to exclude from being classified as a potential Document Type.

[[File:20231_Content_Model_A_Brief_Note_on_Document_Types_01.png]]
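
The Positive and Negative Extractor behavior described above can be pictured with a small Python sketch. This is not Grooper's API or its actual implementation: the `DocumentType` class, its `positive_pattern` and `negative_pattern` attributes, and the `classify` function are hypothetical names, and plain regular expressions stand in for real extractors. It only illustrates the idea that a negative match rules a Document Type out, while a positive match assigns it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentType:
    """Hypothetical stand-in for a Document Type with extractor rules."""
    name: str
    positive_pattern: Optional[str] = None  # stand-in for a Positive Extractor
    negative_pattern: Optional[str] = None  # stand-in for a Negative Extractor


def classify(text: str, document_types: List[DocumentType]) -> Optional[str]:
    """Return the name of the first Document Type whose rules accept the text."""
    for doc_type in document_types:
        # A negative match excludes this Document Type from consideration.
        if doc_type.negative_pattern and re.search(
            doc_type.negative_pattern, text, re.IGNORECASE
        ):
            continue
        # A positive match assigns this Document Type.
        if doc_type.positive_pattern and re.search(
            doc_type.positive_pattern, text, re.IGNORECASE
        ):
            return doc_type.name
    return None  # no rule matched; the document stays unclassified


# Illustrative Document Types (assumed for this example only).
doc_types = [
    DocumentType(
        "Invoice",
        positive_pattern=r"\binvoice\s+number\b",
        negative_pattern=r"\bpurchase\s+order\b",
    ),
    DocumentType("Purchase Order", positive_pattern=r"\bpurchase\s+order\b"),
]

print(classify("Invoice Number: 1001", doc_types))
print(classify("Purchase Order 55, see invoice number 1001", doc_types))
```

In the second call, the text mentions an invoice number, but the negative rule on the hypothetical "Invoice" type excludes it, so the document falls through to "Purchase Order". This is the kind of look-alike situation where the two extractor rules are useful together.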
Revision as of 08:47, 2 February 2024
STUB: This article is a stub. It contains minimal information on the topic and should be expanded.
About
A Content Model is Grooper's digital representation of a document set's content. Everything you want to glean from your documents is set up within a Content Model, including how documents are classified and what data is extracted from them.
Content Models are the fundamental Content Type. Other Content Types, such as Document Types, are established within a Content Model. Content Models have two main purposes in Grooper:
- Document Classification
- Data Extraction
Let's look at how Document Classification and Data Extraction can be used on a Content Model:
Document Classification is an important task that the Content Model helps facilitate.
FYI: GPT Embeddings is a fairly new Classification Method and is still in beta.
Data Extraction is another important job for a Content Model. This tells Grooper what you want done with the data from your documents, where you want it to go, and how you want it handled.
Brief Note on Document Types
Document Types are child objects of a Content Model. Documents cannot be classified without a Document Type. The Classification Method on a Content Model tells Grooper how to classify, but the Document Type tells Grooper what label to put on the document.
Wrap-Up
Content Models define the classification taxonomy for a set of documents. This means a list of distinct types of documents (via Document Types) and their hierarchical structure within the Content Model (via optional Content Categories). How a document is classified is defined here as well (via the Classification Method and the Document Types).
Hand-in-hand with the classification taxonomy, Content Models also define the hierarchical data structure for the documents and document set (via Data Models of the various Content Types in the Content Model). The Data Models and their Data Elements define what data is extracted from documents and how that is accomplished.
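
As a rough illustration of the hierarchy described in this wrap-up, the following Python sketch models a Content Model, an optional Content Category, and Document Types, each carrying its own Data Model. It is a minimal sketch, not Grooper's object model: every class and function name is hypothetical, and it assumes that a Document Type's data structure is the combination of its own Data Model with those of its parent Content Types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataModel:
    """Hypothetical Data Model: just a list of field names."""
    fields: List[str] = field(default_factory=list)


@dataclass
class DocumentTypeNode:
    name: str
    data_model: DataModel = field(default_factory=DataModel)


@dataclass
class ContentCategory:
    name: str
    data_model: DataModel = field(default_factory=DataModel)
    document_types: List[DocumentTypeNode] = field(default_factory=list)


@dataclass
class ContentModel:
    name: str
    data_model: DataModel = field(default_factory=DataModel)
    categories: List[ContentCategory] = field(default_factory=list)
    document_types: List[DocumentTypeNode] = field(default_factory=list)


def fields_for(model: ContentModel, doc_type_name: str) -> List[str]:
    """Collect field names from the model down to the named Document Type."""
    # Document Types sitting directly under the Content Model.
    for doc_type in model.document_types:
        if doc_type.name == doc_type_name:
            return model.data_model.fields + doc_type.data_model.fields
    # Document Types grouped under a Content Category.
    for category in model.categories:
        for doc_type in category.document_types:
            if doc_type.name == doc_type_name:
                return (
                    model.data_model.fields
                    + category.data_model.fields
                    + doc_type.data_model.fields
                )
    raise KeyError(doc_type_name)


# Illustrative taxonomy (all names assumed for this example only).
model = ContentModel(
    name="Accounts Payable",
    data_model=DataModel(["Vendor Name"]),
    categories=[
        ContentCategory(
            name="Invoices",
            data_model=DataModel(["Invoice Number"]),
            document_types=[
                DocumentTypeNode("Standard Invoice", DataModel(["Due Date"]))
            ],
        )
    ],
    document_types=[DocumentTypeNode("Remittance", DataModel(["Check Number"]))],
)

print(fields_for(model, "Standard Invoice"))
print(fields_for(model, "Remittance"))
```

The sketch keeps the taxonomy (which Document Types exist and where they sit) and the data structure (which fields apply to each) in the same tree, mirroring how the Content Model defines both together.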




