Training Batch (Concept): Difference between revisions
Following these steps assumes you already have a content model created with Lexical set as the Classification Method and the appropriate Text Feature Extractor selected. In the example content model, this property is set to Words(Stemmed).
File:Training Batch02.PNG
Revision as of 16:38, 16 April 2020
The Training Set batch is a more convenient way to work with all of the samples a Content Model has been trained against.
A Content Model and an accompanying set of Batches can be found by following this link and downloading the provided file. Downloading the file is not required to understand this article, but it can be helpful for following along with the steps. The file was exported from, and is meant for use in, Grooper 2.9.
About
While developing and training classification for a Grooper Content Model, it can be challenging to keep track of all of the samples you have trained TF-IDF against. In previous versions, each trained sample was stored under its content type in the Grooper Design Studio node tree. In 2.9, the trained samples are stored both under each content type and in the Training Set batch.
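To make the idea of "samples trained against TF-IDF" concrete, here is a minimal, generic sketch of TF-IDF classification in Python. It is not Grooper's implementation or API. The function names, the sample texts, and the whitespace tokenizer are all assumptions made for illustration. Each content type's trained samples are pooled, weighted by TF-IDF, and a new document is assigned to the content type with the highest cosine similarity.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real text feature extractor (for example, one that stems words) does more.
    return text.lower().split()


def train(samples):
    """samples maps a content type name to a list of trained sample texts."""
    # Pool every sample of a content type into one bag of words.
    type_counts = {
        name: Counter(tok for text in texts for tok in tokenize(text))
        for name, texts in samples.items()
    }
    n_types = len(type_counts)
    # Document frequency: in how many content types does each term appear?
    df = Counter()
    for counts in type_counts.values():
        df.update(counts.keys())
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_types) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {}
    for name, counts in type_counts.items():
        total = sum(counts.values())
        vectors[name] = {t: (c / total) * idf[t] for t, c in counts.items()}
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, idf, vectors):
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    # Terms never seen during training carry no weight.
    doc = {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}
    scores = {name: cosine(doc, vec) for name, vec in vectors.items()}
    return max(scores, key=scores.get), scores


# Made-up training samples for two hypothetical content types.
samples = {
    "Invoice": ["invoice number total amount due", "invoice date amount due remit payment"],
    "Purchase Order": ["purchase order number ship to quantity", "purchase order item quantity unit price"],
}
idf, vectors = train(samples)
label, scores = classify("amount due on this invoice", idf, vectors)
print(label, scores)
```

In this sketch, every additional trained sample changes the weights for its content type, which is why being able to review all trained samples in one place is useful.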
How To
Following is an example of how to perform TF-IDF classification that creates the Training Set batch. In the example content model, there are five different content types from three different batches.
Note: Some of the tabs in this tutorial are longer than the others. Please scroll to the bottom of each step's tab before going to the next step.
Prerequisites
Train Content Types
Seeing the Results
As you train your content types, you will see a Training Set batch begin to populate under the Local Resources folder.
File:Training Batch04.PNG
It is worth noting that the same result could have been achieved by making another extractor, setting it up for OMR, and having the Value Extractor Data Types for each Data Field reference a third element. Overrides would not be necessary in that case. This example, however, was enough to demonstrate the feature. As with many things in Grooper, there isn't always a right or wrong way, but there is often a best practice. In this case, making the third extractor would be the better approach.
A simpler, perhaps more common, example of where Data Element Overrides come in handy is the visibility of Data Elements. One of the properties of a Data Element is the Visible property, which defaults to True. Imagine a Data Model that has five Data Fields and a Content Model that has three Document Types. Document1 uses Data Fields 1-3, Document2 uses Data Fields 2-4, and Document3 uses Data Fields 3-5. In Data Review, you want to simplify the reviewer's job so that they do not have to concern themselves with fields that are not relevant. To accomplish this, you could use Data Element Overrides on each of these hypothetical Document Types and set the Visible property to False on every field that Document Type does not need. Only the relevant Data Fields would then be visible during review.
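The visibility scenario above can be modeled in a few lines of Python. This is an illustrative sketch only and does not reflect how Grooper stores or applies overrides. The class names, the `overrides` dictionary, and the `effective_visible` helper are hypothetical. It shows the general pattern of a per-Document-Type override taking precedence over a Data Field's default property value.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    name: str
    visible: bool = True  # mirrors the Visible property defaulting to True


@dataclass
class DocumentType:
    name: str
    # Hypothetical override store: field name -> {property name: overridden value}
    overrides: dict = field(default_factory=dict)


def effective_visible(doc_type, data_field):
    # An override on the Document Type wins; otherwise use the field's own value.
    return doc_type.overrides.get(data_field.name, {}).get("visible", data_field.visible)


def visible_fields(doc_type, data_model):
    return [f.name for f in data_model if effective_visible(doc_type, f)]


# Five Data Fields shared by three Document Types, as in the example above.
data_model = [DataField(f"Field{i}") for i in range(1, 6)]
used = {"Document1": {1, 2, 3}, "Document2": {2, 3, 4}, "Document3": {3, 4, 5}}

# Each Document Type overrides Visible to False on the fields it does not use.
doc_types = {
    name: DocumentType(
        name,
        {f"Field{i}": {"visible": False} for i in range(1, 6) if i not in nums},
    )
    for name, nums in used.items()
}

for name, doc_type in doc_types.items():
    print(name, visible_fields(doc_type, data_model))
```

The Data Model itself is never modified in this sketch. Each Document Type carries only its own differences, which is the appeal of overrides over maintaining separate copies of the fields.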
Version Differences
Versions prior to Grooper 2.9 had an initial concept version of overrides in the Data Element Profiles tab located on the Content Model or Document Type. These profiles allowed modification of only a limited number of Data Element properties, whereas in Grooper 2.9 all properties can be overridden.
Where Did Zonal Properties Go?
All the zonal extraction properties are now set directly on the Data Element.