Unstructured (Pagination): Difference between revisions

From Grooper Wiki
Tag: New redirect
[[Document Type|Document Types]] with Unstructured pagination are used to separate documents that have an unstructured form and a variable length during [[ESP Auto Separation|ESP Separation]]. Unstructured documents differ from structured ones in that the semantic information, while present, is not always found in the same location or presented in the same way from one document to the next. Contracts are an example of unstructured documents. Contracts of the same type (an Oil and Gas Lease Agreement, for example) share similarities, but how and where the information is presented varies from one contract to another. Clauses may appear in different locations or be omitted altogether. One contract may be one page long, another three pages, and another twenty, yet all are the same kind of contract.
#REDIRECT [[ESP Auto Separation#Unstructured]]
 
During separation, a new folder is created when a page matches the first page of a trained example (stored as a [[Form Type]] of the [[Document Type]]). The document then extends to include all subsequent pages that meet the minimum similarity confidence for any trained example page of the [[Document Type]]. However, if EPI (embedded page information) is present (obtained from an EPI Extractor), the document instead extends to the point indicated by the extracted page numbering.
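
The separation rule described above can be illustrated with a minimal Python sketch. This is not Grooper's implementation or API. The <code>Page</code> class, its score fields, the <code>min_confidence</code> default, and the treatment of EPI as a total page count read from the first page are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical similarity (0..1) to the FIRST page of any trained example.
    first_page_score: float
    # Hypothetical best similarity (0..1) to ANY trained example page.
    any_page_score: float
    # Hypothetical EPI value: total page count of the document, if extracted.
    epi_total: Optional[int] = None


def separate(pages: List[Page], min_confidence: float = 0.8) -> List[List[int]]:
    """Group page indexes into documents using the rule sketched above."""
    documents: List[List[int]] = []
    current: Optional[List[int]] = None
    remaining_epi = 0  # pages still owed to the current document by EPI

    for index, page in enumerate(pages):
        if current is not None and remaining_epi > 0:
            # EPI takes priority: extend to the page count it indicates.
            current.append(index)
            remaining_epi -= 1
        elif page.first_page_score >= min_confidence:
            # A first-page match starts a new folder.
            current = [index]
            documents.append(current)
            if page.epi_total is not None:
                remaining_epi = max(page.epi_total - 1, 0)
        elif current is not None and page.any_page_score >= min_confidence:
            # Similar enough to a trained example page: extend the document.
            current.append(index)
        else:
            # Below the threshold: the current document ends here.
            current = None

    return documents
```

In this sketch, a page that falls below the threshold closes the current document and is left ungrouped. How unmatched pages are actually handled is not stated above, so that choice is also an assumption.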

Revision as of 16:30, 29 October 2020