2023.1:Classification (Concept): Difference between revisions

Revision as of 10:39, 14 October 2020

Classification, in Grooper, is the process of assigning a Content Type (specifically a Document Type of a Content Model) to a Batch Folder. Before classification, a Batch Folder can be thought of as a "blank" document: it contains pages, but Grooper does not yet know what kind of document it is. Documents are classified by:

Most often, the Classify activity, using training data or rules configured on a Content Model
In some cases, the Separate activity, which assigns a Document Type to each new folder it creates
Manual assignment of a Document Type, using the "Apply Document Type" command on a Batch Folder
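To make the paths above concrete, here is a minimal Python sketch of a keyword-rule classifier plus a manual assignment. All class and function names (`DocumentType`, `ContentModel`, `BatchFolder`, `classify`, `apply_document_type`) are hypothetical stand-ins, not Grooper's object model or API, and the keyword-count scoring is an assumed toy rule rather than any of Grooper's actual Classification Methods. It only illustrates the core idea: classification ends with a Document Type attached to the folder, whether a rule or a person chose it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for Grooper objects; NOT Grooper's API.
@dataclass
class DocumentType:
    name: str
    keywords: list[str]  # toy rule: phrases expected in this type's text

@dataclass
class ContentModel:
    name: str
    document_types: list[DocumentType]

@dataclass
class BatchFolder:
    pages: list[str]  # text of each page (e.g. from OCR)
    document_type: Optional[DocumentType] = None  # None = unclassified

def classify(folder: BatchFolder, model: ContentModel) -> Optional[DocumentType]:
    """Rule-based path: pick the Document Type whose keywords occur most often."""
    text = " ".join(folder.pages).lower()
    best, best_score = None, 0
    for doc_type in model.document_types:
        score = sum(text.count(k.lower()) for k in doc_type.keywords)
        if score > best_score:  # ties keep the earlier Document Type
            best, best_score = doc_type, score
    folder.document_type = best  # None if no keyword matched (unclassified)
    return best

def apply_document_type(folder: BatchFolder, doc_type: DocumentType) -> None:
    """Manual path: a user assigns the Document Type directly."""
    folder.document_type = doc_type

# Usage
invoice = DocumentType("Invoice", ["invoice number", "amount due"])
w2 = DocumentType("W-2", ["wage and tax statement", "employer identification"])
model = ContentModel("Accounts Payable", [invoice, w2])

folder = BatchFolder(pages=["ACME Corp Invoice Number 1042", "Amount Due: $500"])
print(classify(folder, model).name)  # Invoice
```

Real classification methods are far richer than counting keywords; the sketch only shows where the result of classification ends up.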

During the Classify activity, Grooper uses information from the document and its pages (generally their text), together with settings on a Content Model (such as the Classification Method it uses), to assign the document one of that Content Model's Document Types. Classification is generally performed before Extraction. Until a document is classified, it has no Content Type assigned, so Grooper does not know which Content Model, and therefore which Document Types and Data Models, to use when extracting data. Without this information, Grooper cannot determine which Data Elements to look for or which instructions to use to locate them in the document.
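The dependency of extraction on classification can be sketched the same way. In this hypothetical Python model (not Grooper's API), each Document Type carries its own simplified Data Model, a list of fields with toy regex patterns; that per-type attachment is an assumption made for brevity. Extraction reaches the Data Model only through the folder's Document Type, so an unclassified folder has nothing to extract against.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified model; NOT Grooper's API.
@dataclass
class DataField:
    name: str
    pattern: str  # toy extractor: regex with one capture group

@dataclass
class DocumentType:
    name: str
    data_model: list[DataField]  # simplification: Data Model attached per type

@dataclass
class BatchFolder:
    text: str
    document_type: Optional[DocumentType] = None  # None = unclassified

def extract(folder: BatchFolder) -> dict[str, Optional[str]]:
    """Extract fields using the Data Model reached through the folder's Document Type."""
    if folder.document_type is None:
        # No Document Type means no Data Model: nothing says what to look for.
        raise ValueError("unclassified folder: classify it before extraction")
    results: dict[str, Optional[str]] = {}
    for data_field in folder.document_type.data_model:
        match = re.search(data_field.pattern, folder.text)
        results[data_field.name] = match.group(1) if match else None
    return results

# Usage
invoice = DocumentType("Invoice", [DataField("Invoice Number", r"Invoice Number\s+(\d+)")])
folder = BatchFolder("ACME Corp Invoice Number 1042")
try:
    extract(folder)
except ValueError as err:
    print(err)  # unclassified folder: classify it before extraction

folder.document_type = invoice  # e.g. the result of the Classify activity
print(extract(folder))  # {'Invoice Number': '1042'}
```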
