2023:Training Batch (Concept): Difference between revisions
| Line 9: | Line 9: | ||
[[File:Training | [[File:2023 Training Batch 01.png|thumb|275px|This is a snippet of the '''Grooper Design Studio UI''' showing the '''Training Set''' batch.]] | ||
<blockquote style="font-size:14pt"> | <blockquote style="font-size:14pt"> | ||
The '''Training Set''' batch is a more convenient way to work with all of the samples a Content Model has been trained against | The '''Training Set''' batch is a more convenient way to work with all of the samples a Content Model has been trained against. | ||
</blockquote> | </blockquote> | ||
</p><br/> | </p><br/> | ||
| Line 37: | Line 37: | ||
| style="padding:25px; vertical-align:top" | | | style="padding:25px; vertical-align:top" | | ||
These steps assume you already have a content model created with '''Lexical''' set as the '''Classification Method''' and the appropriate '''Text Feature Extractor''' selected. In the example content model, this property is set to '''Words(Stemmed)'''. | These steps assume you already have a content model created with '''Lexical''' set as the '''Classification Method''' and the appropriate '''Text Feature Extractor''' selected. In the example content model, this property is set to '''Words(Stemmed)'''. | ||
|| [[File:Training | || [[File:2023 Training Batch 02.png]] | ||
|} | |} | ||
</tab> | </tab> | ||
| Line 44: | Line 44: | ||
{| class="wikitable" | {| class="wikitable" | ||
| style="padding:25px; | | | style="padding:25px; | | ||
# You will need to create a '''Batch Process''' with a "Classify" '''Batch Process Step'''. | |||
# Go to the "Classification Tester" tab. | |||
# Right click on the folder you wish to train and hover over "Classification". | |||
|| [[File: | # Click on "Train As..." to train the document. | ||
|| [[File:2023 Training Batch 03.png|1000px]] | |||
|- | |- | ||
| style="padding:25px; | | | style="padding:25px; | | ||
#<li value=5> Repeat these steps for the remaining '''Content Types'''. In the example '''Content Model''' provided, train all five '''Content Types''' from all three example batches. | |||
</tab> | </tab> | ||
<tab name="Review the Training Set batch" style="margin:25px"> | <tab name="Review the Training Set batch" style="margin:25px"> | ||
| Line 56: | Line 57: | ||
{| | {| | ||
| style="padding:25px; vertical-align:top" | | | style="padding:25px; vertical-align:top" | | ||
As you train your content types | As you train your content types, '''Form Types''' will start to appear under the '''Document Type''' in the node tree. | ||
A Grooper | A Grooper designer can review and keep track of all of the documents that have been used for '''TF-IDF''' Classification training. As the development cycle of Classification continues and more content types are trained, the Grooper designer now has a single place to review, test, and perform regression testing for Classification. | ||
|| [[File:Training | || [[File:2023 Training Batch 04.png]] | ||
|} | |} | ||
</tab> | </tab> | ||
Revision as of 08:32, 18 October 2023
|
WIP |
This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion. |

The Training Set batch is a more convenient way to work with all of the samples a Content Model has been trained against.
A Content Model and accompanying set of Batches can be downloaded here. Downloading the file is not required to understand this article, but it can help you follow along with the steps. This file was exported from and is meant for use in Grooper 2.9.
About
During the development and training of TF-IDF Classification in a Grooper Content Model, it can be challenging to keep track of all of the samples that are used during training. In previous versions, each trained sample was stored under each content type in the Grooper Design Studio node tree. In 2.9, the trained samples are stored both under each content type and in the Training Set batch.
How To
|
The following example shows how to perform TF-IDF classification training, which creates the Training Set batch. The example content model has five different content types from three different batches. |
| ! | Some of the tabs in this tutorial are longer than the others. Please scroll to the bottom of each step's tab before going to the next step. |
Prerequisites
Train Content Types
Review the Training Set batch
It is important to understand that the Training Set is not tied to the actual TF-IDF Weightings that are associated with the Content Type or Content Category. Purging the training from a Content Model does not delete any of the documents in the Training Set. Conversely, deleting a document from the Training Set does not remove or purge any TF-IDF Weightings from a Content Type or Content Category.
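The independence described above can be pictured with a small conceptual sketch. This is not Grooper code and does not reflect how Grooper stores anything internally. The class `ContentModelSketch`, its methods, and the sample names and weights are all hypothetical, chosen only to show two stores that are filled together during training but cleared separately.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentModelSketch:
    """Hypothetical stand-in for a Content Model; not a Grooper API."""

    # Samples kept for review, playing the role of the Training Set batch.
    training_set: List[str] = field(default_factory=list)
    # Learned weightings, keyed by content type name.
    weightings: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def train(self, document: str, content_type: str, weights: Dict[str, float]) -> None:
        # Training records the sample and stores weightings in separate places.
        self.training_set.append(document)
        self.weightings[content_type] = weights

    def purge_training(self) -> None:
        # Clears the weightings only; the stored samples are untouched.
        self.weightings.clear()

    def delete_sample(self, document: str) -> None:
        # Removes one sample only; the weightings are untouched.
        self.training_set.remove(document)


model = ContentModelSketch()
# Made-up sample names and weights, for illustration only.
model.train("invoice_001", "Invoice", {"total": 0.8})
model.train("invoice_002", "Invoice", {"total": 0.7, "due": 0.3})

# Deleting a sample leaves the weightings in place.
model.delete_sample("invoice_001")
weights_after_delete = dict(model.weightings)

# Purging the training leaves the remaining samples in place.
model.purge_training()
samples_after_purge = list(model.training_set)
```

In this sketch, `weights_after_delete` still holds the Invoice weightings after a sample is deleted, and `samples_after_purge` still lists the remaining sample after the purge, which mirrors the behavior the paragraph describes.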
Version Differences
Versions prior to Grooper 2.9 do not automatically generate a Training Set batch in the Local Resources folder.


