2023:Training Batch (Concept): Difference between revisions

From Grooper Wiki
No edit summary
No edit summary
Tag: Manual revert
 
(20 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{AutoVersion}}


[[File:2023 Training Batch 01.png|thumb|275px|This is a snippet of the '''Grooper Design Studio UI''' showing the '''Training Set''' batch.]]
[[File:2023 Training Batch 01.png|thumb|275px|This is a snippet of the '''Grooper Design Studio UI''' showing the '''Training Set''' batch.]]


<blockquote style="font-size:14pt">
<blockquote>{{#lst:Glossary|Training Batch}}</blockquote>
The '''Training Set''' batch is a more convenient way to work with all of the samples a Content Model has been trained against.
 
</blockquote>
{|class="download-box"
</p><br/>
|
A '''Content Model''' and accompanying set of '''Batches''' can be downloaded '''[[:Media:Training_Batch_Example.zip|here.]]'''  It is not required to download to understand this article, but can be helpful because it can be used to follow along with the steps in this article. ''This file was exported from and meant for use in Grooper 2.9''
[[File:Asset 22@4x.png]]
|
You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023).  The first contains a '''Project''' with resources used in examples throughout this article. The second contains one or more '''Batches''' of sample documents.
* [[Media:2023 Wiki Training-Batch Project.zip]]
* [[Media:2023 Wiki Training-Batch Batch.zip]]
|}


==About==
==About==
Line 15: Line 21:


==How To==
==How To==
{|
{|
| style="padding:25px; vertical-align:top" |
| style="padding:25px; vertical-align:top" |
Line 20: Line 27:
|}
|}


{|cellpadding="10" cellspacing="5"
{|class="attn-box"
|-style="background-color:#f89420; color:white"
|
|style="font-size:14pt"|'''!'''||Some of the tabs in this tutorial are longer than the others.  Please scroll to the bottom of each step's tab before going to the next step.
&#9888;
|
Some of the tabs in this tutorial are longer than the others.  Please scroll to the bottom of each step's tab before going to the next step.
|}
|}


Line 28: Line 37:
<tab name="Prerequisites" style="margin:25px">
<tab name="Prerequisites" style="margin:25px">
====Prerequisites====
====Prerequisites====
{|
{|cellpadding=10 cellspacing=5
| style="padding:25px; vertical-align:top" |
|valign=top style="width:40%"|
These steps assume you already have a '''Content Model''' configured with '''Lexical''' as the '''Classification Method''' and the appropriate '''Text Feature Extractor''' selected.  In the example '''Content Model''', the '''Text Feature Extractor''' is set to '''Words(Stemmed)'''.
These steps assume you already have a '''Content Model''' configured with '''Lexical''' as the '''Classification Method''' and the appropriate '''Text Feature Extractor''' selected.  In the example '''Content Model''', the '''Text Feature Extractor''' is set to '''Words(Stemmed)'''.
|| [[File:2023 Training Batch 02.png]]
|| [[File:2023 Training Batch 02.png]]
Line 36: Line 45:
<tab name="Train Content Types" style="margin:25px">
<tab name="Train Content Types" style="margin:25px">
====Train Content Types====
====Train Content Types====
{| class="wikitable"
{|cellpadding=10 cellspacing=5
| style="padding:25px; |
|valign=top style="width:40%"|
# You will need to create a '''Batch Process''' with a "Classify" '''Batch Process Step'''.  
# You will need to create a '''Batch Process''' with a "Classify" '''Batch Process Step'''.  
# Go to the "Classification Tester" tab.
# Go to the "Classification Tester" tab.
# Right-click on the folder you wish to train and hover over "Classification".
# Right-click on the folder you wish to train and hover over "Classification".
# Click on "Train As..." to train the document.
# Click on "Train As..." to train the document.
|| [[File:2023 Training Batch 03.png|1000px]]
 
|-
Repeat these steps for the remaining '''Content Types'''.  In the example '''Content Model''' provided, train all five '''Content Types''' from all three example batches.
| style="padding:25px; |
|
#<li value=5> Repeat these steps for the remaining '''Content Types'''.  In the example '''Content Model''' provided, train all five '''Content Types''' from all three example batches.
[[File:2023 Training Batch 03.png]]
|}
</tab>
</tab>
<tab name="Review the Training Set batch" style="margin:25px">
<tab name="Review the Training Set batch" style="margin:25px">
====Review the Training Set batch====
====Review the Training Set batch====
{|
{|cellpadding=10 cellspacing=5
| style="padding:25px; vertical-align:top" |
|valign=top style="width:40%"|
As you train your content types '''Form Types''' will start to appear under the '''Document Type''' in the node tree.  
As you train your content types you will see a '''Training Set''' batch begin to populate under the '''Local Resources''' folder.<br/>
A Grooper designer can review and keep track of all of the documents that have been used for '''TF-IDF''' Classification training.  As the development cycle of Classification continues and more content types are trained, the Grooper designer now has a single place to review, test, and perform regression testing for Classification.
A Grooper Designer can review and keep track of all of the documents that have been used for ''TF-IDF'' Classification training.  As the development cycle of Classification continues and more content types are trained, the Grooper Designer now has a single place to review, test, and perform regression testing for Classification.


|| [[File:2023 Training Batch 04.png]]
|| [[File:2023 Training Batch 04.png]]
Line 59: Line 69:
</tabs>
</tabs>
<br/>
<br/>
{|class="attn-box"
|-
|⚠
|
It is important to understand that the '''Training Set''' is not tied to the actual '''TF-IDF Weightings''' that are associated with the '''Content Type''' or '''Content Category'''.  Purging the training from a '''Content Model''' does not delete any or all of the documents in the '''Training Set'''.  Conversely, deleting a document from the '''Training Set''' does not remove or purge any '''TF-IDF Weightings''' from a '''Content Type''' or '''Content Category'''.
It is important to understand that the '''Training Set''' is not tied to the actual '''TF-IDF Weightings''' that are associated with the '''Content Type''' or '''Content Category'''.  Purging the training from a '''Content Model''' does not delete any or all of the documents in the '''Training Set'''.  Conversely, deleting a document from the '''Training Set''' does not remove or purge any '''TF-IDF Weightings''' from a '''Content Type''' or '''Content Category'''.
|}
<br/>
<br/>
==Version Differences==
Versions prior to '''Grooper 2.9''' do not automatically generate a '''Training Set''' batch in the '''Local Resources''' folder.
[[Category:Articles]]
[[Category:Version 2.90]]

Latest revision as of 10:44, 22 November 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

This is a snippet of the Grooper Design Studio UI showing the Training Set batch.

The Training Batch is a special Batch created when training document examples using the Lexical classification method. The Training Batch serves two purposes: (1) It holds all previously trained Batch Folders. Designers can go to this Batch to view these documents and copy and paste them into other Batches if needed. (2) Batch Folders in the Training Batch are used to re-train the Content Model's classification data when the Rebuild Training command is executed.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

During the development and training of TF-IDF Classification in a Grooper Content Model, it can be challenging to keep track of all of the samples that are used during training. In previous versions, each trained sample was stored under each content type in the Grooper Design Studio node tree. In 2.9, the trained samples are stored both under each content type and in the Training Set batch.


How To

The following example shows how to train TF-IDF classification, which creates the Training Set batch. The example content model has five content types, with samples drawn from three batches.

Some of the tabs in this tutorial are longer than the others. Please scroll to the bottom of each step's tab before going to the next step.

Prerequisites

These steps assume you already have a Content Model configured with Lexical as the Classification Method and the appropriate Text Feature Extractor selected. In the example Content Model, the Text Feature Extractor is set to Words(Stemmed).
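
For readers unfamiliar with the idea behind lexical training, the following is a minimal Python sketch of TF-IDF weighting over stemmed word features. It is not Grooper's implementation: the function names, the crude suffix-stripping stand-in for a stemmer, and the sample texts are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

def stem(word):
    # Hypothetical stand-in for a real stemmer: strips a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def features(text):
    # Lowercase, split into alphabetic words, and stem each one.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def tfidf_weights(samples):
    """samples maps a content type name to the text of its trained documents."""
    counts = {name: Counter(features(text)) for name, text in samples.items()}
    n_types = len(counts)
    # Frequency across types: in how many content types does each feature appear?
    df = Counter(f for c in counts.values() for f in c)
    weights = {}
    for name, c in counts.items():
        total = sum(c.values())
        weights[name] = {
            f: (n / total) * math.log(n_types / df[f]) for f, n in c.items()
        }
    return weights

# Made-up sample texts, one per content type.
samples = {
    "Invoice": "invoice number and invoice total",
    "Letter": "dear customer, the letter and regards",
}
weights = tfidf_weights(samples)
```

In this toy setup, a word that appears in every content type (such as "and") receives a weight of zero, while words concentrated in one content type receive higher weights, which is the general intuition behind using TF-IDF for classification.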

Train Content Types

  1. You will need to create a Batch Process with a "Classify" Batch Process Step.
  2. Go to the "Classification Tester" tab.
  3. Right-click on the folder you wish to train and hover over "Classification".
  4. Click on "Train As..." to train the document.

Repeat these steps for the remaining Content Types. In the example Content Model provided, train all five Content Types from all three example batches.

Review the Training Set batch

As you train your content types you will see a Training Set batch begin to populate under the Local Resources folder.
A Grooper Designer can review and keep track of all of the documents that have been used for TF-IDF Classification training. As the development cycle of Classification continues and more content types are trained, the Grooper Designer now has a single place to review, test, and perform regression testing for Classification.


It is important to understand that the Training Set is not tied to the actual TF-IDF Weightings that are associated with the Content Type or Content Category. Purging the training from a Content Model does not delete any or all of the documents in the Training Set. Conversely, deleting a document from the Training Set does not remove or purge any TF-IDF Weightings from a Content Type or Content Category.
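
The independence described above can be pictured with a toy model. The sketch below is purely conceptual and assumes nothing about Grooper's internals: the `ContentModel` class and its method names are hypothetical, and the "weightings" are simple word counts standing in for TF-IDF data. It shows two separate stores, where clearing one leaves the other intact, and a rebuild step that regenerates weightings from the stored samples.

```python
from collections import Counter

class ContentModel:
    """Toy model for illustration; hypothetical, not Grooper's API."""

    def __init__(self):
        self.training_set = []   # list of (content_type, text) samples
        self.weightings = {}     # content_type -> Counter of word counts

    def _add_weights(self, content_type, text):
        counter = self.weightings.setdefault(content_type, Counter())
        counter.update(text.lower().split())

    def train_as(self, content_type, text):
        # Training stores the sample and updates the weightings.
        self.training_set.append((content_type, text))
        self._add_weights(content_type, text)

    def purge_training(self):
        # Clears weightings only; stored samples are untouched.
        self.weightings.clear()

    def delete_sample(self, index):
        # Removes a stored sample only; weightings are untouched.
        del self.training_set[index]

    def rebuild_training(self):
        # Regenerates weightings from whatever samples remain.
        self.weightings = {}
        for content_type, text in self.training_set:
            self._add_weights(content_type, text)

model = ContentModel()
model.train_as("Invoice", "invoice total due")
model.train_as("Letter", "dear customer")

model.purge_training()
samples_after_purge = len(model.training_set)          # samples survive the purge

model.rebuild_training()
model.delete_sample(1)
letter_still_weighted = "Letter" in model.weightings   # weightings survive the deletion
```

In this model, the deleted sample only stops influencing the weightings once a rebuild is run again, which mirrors why the two stores should be reviewed and maintained separately.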