TF-IDF (Concept): Difference between revisions
Revision as of 16:37, 27 December 2019
TF-IDF (Term Frequency-Inverse Document Frequency) is how Grooper uses machine learning for document classification and data extraction. TF-IDF is designed to reflect how important a word is to a document within a set of documents. The importance value (or weighting) of a word increases proportionally to the number of times it appears in the document (Term Frequency). The weighting is offset by the number of documents in the set containing the word (Inverse Document Frequency). Words that appear broadly across the set are less distinctive identifiers, so their weighting is reduced.
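The weighting described above can be sketched in a few lines of Python. This is a minimal illustration of the general TF-IDF formula, not Grooper's internal implementation; the tiny corpus and the plain tf × log(N/df) variant are assumptions for the example (real libraries often apply smoothing or normalization).

```python
import math
from collections import Counter

# A tiny stand-in corpus (illustrative only).
docs = [
    "invoice invoice number total due",
    "purchase order number total",
    "invoice past due total",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: how many documents contain each term.
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tf_idf(term, tokens):
    # Term Frequency: the term's share of this document's words.
    tf = tokens.count(term) / len(tokens)
    # Inverse Document Frequency: terms spread across more
    # documents score lower; a term in every document gets
    # idf = log(1) = 0, i.e. no weight at all.
    idf = math.log(n_docs / df[term])
    return tf * idf

# "total" appears in all three documents, so its weight is zero;
# "invoice" is repeated in the first document and appears in only
# two of three documents, so it weighs more there than "number".
print(tf_idf("total", tokenized[0]))    # 0.0
print(tf_idf("invoice", tokenized[0]))
print(tf_idf("number", tokenized[0]))
```

Note how the two factors pull in opposite directions: repetition inside a document raises a term's score, while ubiquity across the set drives it toward zero, which is exactly the "offset" the paragraph describes.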