2023.1:Classification (Concept): Difference between revisions
m Added note that this activity is generally run before Extraction
Dgreenwood: No edit summary
Revision as of 10:39, 14 October 2020
Classification, in Grooper, is the process of assigning a Content Type (specifically, a Document Type of a Content Model) to a Batch Folder. Before classification, a Batch Folder is effectively a "blank" document: it contains pages, but Grooper does not yet know what kind of document it is. Documents are classified in one of three ways:
- Most often, by the Classify activity, using training data or rules set on a Content Model.
- In some cases, by the Separate activity, which assigns a Document Type to each new folder it creates.
- Manually, by using the "Apply Document Type" command on a Batch Folder.
During the Classify activity, Grooper uses information from the document and its pages (generally text) together with the configuration of a Content Model (such as the Classification Method used) to assign the document a Document Type from that Content Model. Classification is generally performed before Extraction. Until a document is classified, it has no Content Type assigned to it, so Grooper does not know which Content Model, and which corresponding Document Types and Data Models, to use when extracting data. Without this information, Grooper cannot tell which Data Elements to look for or which instructions to use to identify those elements within the document.
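The dependency between classification and extraction can be illustrated with a small Python sketch. This is a conceptual toy only, not Grooper's API or its actual classification logic: the `DocumentType` and `BatchFolder` classes, the keyword-counting rule, and the function names are all hypothetical, chosen to show why an unclassified document gives extraction nothing to work with.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentType:
    """Hypothetical stand-in for a Document Type in a Content Model."""
    name: str
    keywords: List[str]       # toy classification rule: phrases expected in the text
    data_elements: List[str]  # fields that extraction would look for


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder: page text plus an optional type."""
    pages: List[str]
    document_type: Optional[DocumentType] = None


def classify(folder: BatchFolder, content_model: List[DocumentType]) -> BatchFolder:
    """Assign the Document Type whose keywords match the page text most often.

    Assumption: a simple keyword count stands in for whatever
    Classification Method the Content Model is configured with.
    """
    text = " ".join(folder.pages).lower()
    best, best_score = None, 0
    for doc_type in content_model:
        score = sum(1 for kw in doc_type.keywords if kw.lower() in text)
        if score > best_score:
            best, best_score = doc_type, score
    folder.document_type = best  # stays None if nothing matched
    return folder


def elements_to_extract(folder: BatchFolder) -> List[str]:
    """Return the Data Elements to look for; empty until the folder is classified."""
    if folder.document_type is None:
        return []
    return list(folder.document_type.data_elements)
```

In this sketch, calling `elements_to_extract` on a freshly created `BatchFolder` returns an empty list, because no Document Type has been assigned yet. After `classify` runs against a list of `DocumentType` objects, the same call returns the Data Elements of the matched type, which mirrors why classification is generally performed before extraction.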