2023:Import Mode (Property): Difference between revisions


Revision as of 09:27, 10 May 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

WIP

This article is a work-in-progress and may abruptly stop in the middle of a section.

FYI

"Sparse Import" redirects here. How you configure an Import Provider's Import Mode determines if documents are imported sparsely.

Set Import Mode to Sparse to perform a sparse import.

Glossary

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Box: Box is a connection option for CMIS Connections. It connects Grooper to the Box content management system for import and export operations.

CMIS Connection: CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Import: CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

CMIS Repository: CMIS Repository nodes provide document access in external storage platforms through a CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

CMIS Repositories are created as a child node of a CMIS Connection using the "Import Repository" command.

CMIS: CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata, as well as programmatic control of the platform, over the internet.

Data Field: Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a Data Model and/or Data Sections.

Data Fields are frequently referred to simply as "fields".

Data Model: Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) define data extraction logic and settings for how data is reviewed in a Data Viewer.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Import Descendants: Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files from a CMIS Repository folder location, including any files in any sub-folders (i.e. all "descendant" files).

Import Mode and Document Linking:

Import Provider: Import Providers enable Grooper to import file-based content from numerous sources, including Windows file systems, SFTP file systems, mail servers and various content management systems (CMS). An Import Provider is selected and configured when configuring "Import Jobs". Import Jobs are submitted in one of two ways:

By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.

In both cases, an Import Provider is selected and configured using the "Provider" property.

Import Query Results: Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Query Results imports files from a CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

Repository: A "repository" is a general term in computer science referring to where files and/or data are stored and managed. In Grooper, the term "repository" may refer to:

PRIMARILY a Grooper Repository. This is most commonly what people are referring to when they simply say "repository".
Less commonly a CMIS Repository

About

A Side Note on Importing in General

Forget about Import Modes for a second. How do you even import documents into Grooper at all? You import documents into a Grooper environment using an Import Provider.

The simplest way to import in Grooper is to "Submit a new import job" from the "Imports" page.

The Import Providers highlighted in turquoise below are considered "Legacy" Import Providers and should only be used for older configurations that have been kept through upgrading, or in very specific circumstances where CMIS Connections do not provide the desired connectivity. The properties for connecting to systems using these Import Providers are set on the "Import Job" rather than on a specific object in Grooper, therefore their settings are not re-usable.

It is considered best practice to use the Import Providers highlighted in red below. These leverage CMIS Repository objects for their connection configuration, which gives them the most functionality and makes them the most developed means of importing in Grooper. Of the two shown, Import Query Results should be your first choice, as it is even more fully featured than the Import Descendants option. However, Import Query Results can only be leveraged by "query-able", indexed content systems. Import Descendants should only be used in cases where the content system connected to by your CMIS Connection is not "query-able".

There are a myriad of articles related to "CMIS" here on the Grooper Wiki, but the ones most related to this topic would be:

Ad-Hoc Import Jobs vs Automated Import Jobs

Starting an "Import Job" from the "Imports" page is considered ad-hoc: the "Import Job" runs only once. You can manually repeat an "Import Job" that has not been cleared from the list of completed "Import Jobs" on the "Imports" page, but this is not an automated procedure.

If you want an "Import Job" to repeat indefinitely on a schedule, you need to use an Import Watcher service established in Grooper Config. The configuration of an "Import Job" on an Import Watcher service is identical to what is pictured above for an ad-hoc import, but the Import Watcher service itself has built-in scheduling configuration that automates the starting of "Import Jobs".

What is an Import Mode?

Documents being imported (i.e. files in an external storage platform) carry two important sets of information:

Content - The file itself, such as a PDF.
Properties - Metadata associated with the file originating from the source storage platform. This can be as basic as the file's name or something more custom like fields in a Box.com metadata template.

There are three Import Modes in Grooper:

Full - This mode fully imports each file as a Batch Folder in the Batch. Both their content and their properties are loaded into the Grooper Filestore upon import.
- Because the files are fully copied from the source into a Grooper environment, this is the slowest of the three Import Modes.
- Import speed can be further impacted by network traffic required to copy the files associated with each document from their original source to the Grooper Filestore.
Sparse - The Sparse Import Mode loads a file's properties as it does in Full mode. However, instead of fully importing the document's content, it creates a link between Grooper and the content at the import source.
- When Grooper needs to access the document's content, it travels the link attached to the Batch Folder to retrieve the attached file from the import source.
- Because import operations must run single-threaded in Grooper, skipping the content copy can greatly reduce the time it takes to import large document sets.
- If needed, the content can also be loaded into the Grooper Filestore in parallel using the Execute activity.
LinkOnly - This mode only creates the appropriate object in Grooper, linking to both the content of the document and its properties rather than loading either.
- This is by far the fastest of the three Import Modes. Only a Batch Folder and a link to the source document are created for each imported file.
- Like Sparse imports, both the content and the properties of the document can be loaded in parallel using the Execute activity.
  - Please note, these properties must map to Grooper Data Fields in a Data Model. As such, the documents must be classified first before their properties can be loaded.
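
As a rough illustration, the three modes can be summarized by what each one loads at import time. The sketch below is not Grooper's API: the `ImportMode`, `ImportPlan` and `plan_for` names are hypothetical and only restate the descriptions above in Python.

```python
from dataclasses import dataclass
from enum import Enum


class ImportMode(Enum):
    """The three Import Modes described above (illustrative model only)."""
    FULL = "Full"
    SPARSE = "Sparse"
    LINK_ONLY = "LinkOnly"


@dataclass(frozen=True)
class ImportPlan:
    """Hypothetical summary of what happens at import time for one file."""
    copies_content: bool    # content is copied into the Grooper Filestore
    loads_properties: bool  # properties are loaded onto the Batch Folder


def plan_for(mode: ImportMode) -> ImportPlan:
    """Return what is loaded at import; anything not loaded is only linked."""
    if mode is ImportMode.FULL:
        return ImportPlan(copies_content=True, loads_properties=True)
    if mode is ImportMode.SPARSE:
        return ImportPlan(copies_content=False, loads_properties=True)
    # LinkOnly: a Batch Folder and a link are created, nothing is loaded yet.
    return ImportPlan(copies_content=False, loads_properties=False)


for mode in ImportMode:
    print(mode.value, plan_for(mode))
```

Whatever a mode does not load at import remains reachable through the link and, per the notes above, can be loaded later with the Execute activity.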

How Does Sparse Import Save Time?

Consider what is occurring when importing a document into Grooper using the Full Import Mode:

An object is made in Grooper representing a document (a Batch Folder in a Batch).
- This takes little to no time, as all that occurs is that a row representing the document object is added to the TreeNode table of the Grooper Database.
The properties of the document are loaded onto the created document object.
- This is only slightly more time consuming than creating an object in Grooper as some data is being copied through the network.
The content associated with the document is copied from its source location to the Grooper Filestore and associated with the created object.
- This is by far the most time-consuming portion of the process, as the content of a document can vary wildly in storage size. A fully electronic document will usually be relatively small, though this depends on the document itself. Documents that use images to represent their pages, however, can range from small to gargantuan depending on the number of pages, the color depth of the images, the resolution, etc.
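
The relative weight of these three steps can be sketched as back-of-the-envelope arithmetic. The per-step timings below are made-up placeholders, not measurements from Grooper, and the `StepCosts` and `import_milliseconds` names are hypothetical. Costs are summed per document because import operations run single-threaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Assumed per-document cost of each import step, in milliseconds."""
    create_node_ms: int      # add a row for the Batch Folder
    load_properties_ms: int  # copy the file's metadata over the network
    copy_content_ms: int     # copy the file itself into the Filestore


def import_milliseconds(doc_count: int, costs: StepCosts, copy_content: bool = True) -> int:
    """Total import time when documents are imported one after another."""
    per_doc = costs.create_node_ms + costs.load_properties_ms
    if copy_content:
        per_doc += costs.copy_content_ms
    return doc_count * per_doc


# Placeholder numbers chosen only to show the shape of the calculation.
costs = StepCosts(create_node_ms=10, load_properties_ms=50, copy_content_ms=2000)

with_copy = import_milliseconds(10_000, costs)
without_copy = import_milliseconds(10_000, costs, copy_content=False)
print(f"with content copy:    {with_copy / 1000:.0f} s")
print(f"without content copy: {without_copy / 1000:.0f} s")
```

With these assumed figures, the content copy accounts for 2,000 of the 2,060 ms spent on each document, so leaving it out of the import shrinks the total from 20,600 s to 600 s. Real ratios depend on file sizes and network conditions.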

A Sparse import only does the first two tasks listed above and simply links to the document's contents where they exist in their system of origin. This vastly speeds up the import process. The main thing to consider about a Sparse import is that while the document is being processed in Grooper, the original document should not be moved from its original source, or the link between the document object in Grooper and the original file will be broken. Once processing in Grooper is complete, the document can be moved. Alternatively, when using the Sparse Import Mode, you can simultaneously import and move the document at the time of import via configuration of the Import Provider. The object made in Grooper will then point to the new location of the document while still only existing as a link.
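
The link behavior can be sketched with a small stand-in object. This is a conceptual model, not Grooper's implementation: `SparseDocument`, `can_read` and `demo` are hypothetical names, and a local temporary file plays the role of the import source.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class SparseDocument:
    """Hypothetical stand-in for a Batch Folder created by a Sparse import."""
    source_path: Path                               # link to the content at the source
    properties: dict = field(default_factory=dict)  # loaded at import time
    _content: Optional[bytes] = None                # filled only when first needed

    def content(self) -> bytes:
        """Follow the link on first access; fails if the source file was moved."""
        if self._content is None:
            self._content = self.source_path.read_bytes()
        return self._content


def can_read(doc: SparseDocument) -> bool:
    """Report whether the document's link still resolves to content."""
    try:
        doc.content()
        return True
    except FileNotFoundError:
        return False


def demo() -> tuple[bool, bool]:
    """Return (readable before the source is moved, readable after)."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "invoice.pdf"
        source.write_bytes(b"placeholder content")

        before = can_read(SparseDocument(source_path=source))

        doc = SparseDocument(source_path=source, properties={"name": source.name})
        source.rename(Path(tmp) / "moved.pdf")  # source moved during processing
        after = can_read(doc)
    return before, after


print(demo())
```

In this model, a document whose content has not yet been retrieved loses access once the source file moves, which is why the source should stay in place until processing is finished.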

What Is a Document Link?
