TF-IDF (Concept)

TF-IDF (Term Frequency-Inverse Document Frequency) is how Grooper uses machine learning for document classification and data extraction. TF-IDF is designed to reflect how important a word is to a document or a set of documents. The importance value (or weighting) of a word increases proportionally to the number of times it appears in the document (Term Frequency). The weighting is offset by the number of documents in the set containing the word (Inverse Document Frequency). Some words appear more generally across documents and hence are less unique identifiers. So, their weighting is lessened.