Import Mode and Document Linking (Concept)


This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Import Mode and Document Linking refers to how the Import Mode property is used. This property determines whether an imported document keeps a link to its original file and whether a copy of the file is made on import.

FYI

"Sparse Import" redirects here. How you configure an Import Provider's Import Mode determines if documents are imported sparsely.

  • Set Import Mode to Sparse to perform a sparse import.

About

A Side Note on Importing in General

Forget about Import Modes for a second. How do you even import documents into Grooper at all? You import documents into a Grooper environment using an Import Provider.

The simplest way to import in Grooper is to "Submit a new import job" from the "Imports" page.

The "legacy" Import Providers listed below should only be used for older configurations that have been kept through upgrading, or in very specific circumstances where CMIS Connections do not provide the desired connectivity. The properties for connecting to systems with these Import Providers are set on the "Import Job" rather than on a specific object in Grooper, so their settings are not reusable.

It is considered best practice to use the "best practice" Import Providers listed below. These leverage CMIS Repository objects for their connection configuration, which gives them the most functionality and makes them the most developed means of importing in Grooper. Of the two, Import Query Results should be your first choice, as it is even more fully featured than Import Descendants. However, Import Query Results can only be used with "query-able", indexed content systems. Import Descendants should only be used when the content system your CMIS Connection connects to is not "query-able".

There are a myriad of articles related to "CMIS" here on the Grooper Wiki, but the ones most related to this topic would be:


  1. Start by going to the Imports page.
  2. Click the "Add" button (indicated by a + icon) in the top right of the screen to submit an Import job.
  3. The "Submit Import Job" window will open.
  4. A description of the Import job must be entered into the Description property.
  5. Click the drop-down for the Provider property and select an Import Provider.
    • The following are considered "legacy" Import Providers and should only be used in very specific circumstances:
      • File System Import
      • FTP Import
      • Mail Import
      • OPEX Import
      • SFTP Import
      • Test Batch
    • The following are considered "best practice" Import Providers as they leverage CMIS Repository objects which provide the most functionality and support:
      • Import Descendants
      • Import Query Results
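The provider guidance above boils down to a small decision rule. The Python below is a minimal sketch of that rule, not anything Grooper exposes: `ProviderKind`, `PROVIDERS`, and `recommend_provider` are hypothetical names, and the two boolean inputs assume you already know whether a CMIS Connection can reach your content system and whether that system is query-able.

```python
from enum import Enum
from typing import Optional


class ProviderKind(Enum):
    LEGACY = "legacy"
    BEST_PRACTICE = "best practice"


# Import Provider names and groupings taken from the list above.
PROVIDERS = {
    "File System Import": ProviderKind.LEGACY,
    "FTP Import": ProviderKind.LEGACY,
    "Mail Import": ProviderKind.LEGACY,
    "OPEX Import": ProviderKind.LEGACY,
    "SFTP Import": ProviderKind.LEGACY,
    "Test Batch": ProviderKind.LEGACY,
    "Import Descendants": ProviderKind.BEST_PRACTICE,
    "Import Query Results": ProviderKind.BEST_PRACTICE,
}


def recommend_provider(cmis_can_connect: bool, is_queryable: bool) -> Optional[str]:
    """Hypothetical helper encoding the guidance above (not a Grooper API).

    Returns None when no CMIS Connection provides the needed connectivity,
    meaning a legacy provider matching the source system has to be chosen.
    """
    if not cmis_can_connect:
        return None
    # Import Query Results is preferred, but only works with query-able, indexed systems.
    return "Import Query Results" if is_queryable else "Import Descendants"
```

For example, `recommend_provider(True, False)` returns `"Import Descendants"`, matching the rule that Import Descendants is the fallback for content systems that cannot be queried.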

Ad-Hoc Import Jobs vs Automated Import Jobs

Starting an "Import Job" from the "Imports" page is considered ad-hoc: the "Import Job" runs only once. You can manually repeat an "Import Job" that you have not cleared from the list of completed "Import Jobs" on the "Imports" page, but doing so is still a manual step, not an automated procedure.

If you want an "Import Job" to repeat indefinitely on a schedule, you need to use an Import Watcher service established in Grooper Command Console. An "Import Job" is configured on an Import Watcher service the same way as in the ad-hoc import described above, but the Import Watcher service itself has built-in scheduling configuration that automates starting "Import Jobs".
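The difference between the two is running the same job once versus re-running it on an interval. In Grooper the schedule is configured on the Import Watcher service in Grooper Command Console, not written in code, so the Python below is only a hypothetical illustration: `run_ad_hoc`, `run_on_schedule`, and `max_runs` are invented names. A real watcher keeps running indefinitely; `max_runs` exists only so the sketch terminates.

```python
import time
from typing import Callable


def run_ad_hoc(import_job: Callable[[], None]) -> None:
    """Ad-hoc: the Import Job runs once, when it is submitted."""
    import_job()


def run_on_schedule(import_job: Callable[[], None], interval_seconds: float,
                    max_runs: int) -> int:
    """Rough stand-in for an Import Watcher: re-run the same job on a fixed interval.

    Returns the number of runs performed. max_runs only bounds this sketch.
    """
    runs = 0
    while runs < max_runs:
        import_job()
        runs += 1
        if runs < max_runs:
            time.sleep(interval_seconds)  # wait until the next scheduled run
    return runs
```

The job itself is identical in both cases; only the thing that decides *when* to start it changes, which mirrors how the same "Import Job" configuration is reused by the Import Watcher.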

What is an Import Mode?

Documents being imported (i.e. files in an external storage platform) contain two important sets of information:

  • Content - The file itself, such as a PDF.
  • Properties - Metadata associated with the file originating from the source storage platform. This can be as basic as the file's name or something more custom like fields in a Box.com metadata template.

There are three Import Modes in Grooper:

  1. Full - This mode fully imports each file as a Batch Folder in the Batch. Both its content and its properties are loaded into the Grooper Filestore upon import.
    • Because the files are fully copied from the source into a Grooper environment, this is the slowest of the three Import Modes.
    • Import speed can be further impacted by network traffic required to copy the files associated with each document from their original source to the Grooper Filestore.
  2. Sparse - The Sparse Import Mode loads a file's properties just as Full mode does. However, instead of fully importing the document's content, it creates a link between Grooper and the content at the import source.
    • When Grooper needs to access the document's content, it travels the link attached to the Batch Folder to retrieve the attached file from the import source.
    • Because import operations must run single-threaded in Grooper, this can greatly reduce the time it takes to import large document sets.
    • If needed, the content can also be loaded into the Grooper Filestore in parallel using the Execute activity.
  3. LinkOnly - This mode only creates the corresponding object in Grooper and links to both the document's content and its properties.
    • This is by far the fastest of the three Import Modes. Only a Batch Folder and a link to the source document are created for each imported file.
    • Like Sparse imports, both the content and the properties of the document can be loaded in parallel using the Execute activity.
      • Please note, these properties must map to Grooper Data Fields in a Data Model. As such, the documents must be classified first before their properties can be loaded.
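To make the differences concrete, here is a minimal Python model of what each Import Mode puts into the Batch Folder at import time. It is a sketch under stated assumptions, not Grooper's implementation: `SourceFile`, `BatchFolder`, `import_file`, and `read_content` are hypothetical, the `repo://` URI in the usage is made up, and the model assumes a Full import keeps no link, since the article only describes links for Sparse and LinkOnly. Loading LinkOnly properties later (after classification) is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class ImportMode(Enum):
    FULL = "Full"
    SPARSE = "Sparse"
    LINK_ONLY = "LinkOnly"


@dataclass
class SourceFile:
    """A file in the external storage platform: its content plus source metadata."""
    uri: str
    content: bytes
    properties: Dict[str, str]


@dataclass
class BatchFolder:
    """Hypothetical stand-in for the Batch Folder created on import."""
    link: Optional[str] = None                   # pointer back to the import source
    content: Optional[bytes] = None              # set only if copied into the Filestore
    properties: Optional[Dict[str, str]] = None  # set only if loaded at import time


def import_file(src: SourceFile, mode: ImportMode) -> BatchFolder:
    folder = BatchFolder()
    if mode is ImportMode.FULL:
        # Full: copy both content and properties (assumption: no link is kept).
        folder.content = src.content
        folder.properties = dict(src.properties)
    elif mode is ImportMode.SPARSE:
        # Sparse: load properties now, link to the content at the source.
        folder.properties = dict(src.properties)
        folder.link = src.uri
    else:
        # LinkOnly: only the folder and the link; content and properties load later.
        folder.link = src.uri
    return folder


def read_content(folder: BatchFolder, fetch: Callable[[str], bytes]) -> bytes:
    """Return the content, traveling the link to the import source if it was not copied."""
    if folder.content is not None:
        return folder.content
    if folder.link is None:
        raise ValueError("folder has neither content nor a link")
    return fetch(folder.link)
```

For a file `SourceFile("repo://docs/a.pdf", b"%PDF", {"name": "a.pdf"})`, a Sparse import yields a folder whose `properties` are filled in but whose `content` is `None`, so `read_content` has to call `fetch` on the link, just as Grooper travels the link back to the import source.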

How Does Sparse Import Save Time?

Consider what is occurring when importing a document into Grooper using the Full Import Mode:

  1. An object is made in Grooper representing a document (a Batch Folder in a Batch).
    • This takes little to no time, as all that occurs is that a row representing the document object is added to the TreeNode table of the Grooper Database in SQL.
  2. The properties of the document are loaded onto the created document object.
    • This is only slightly more time consuming than creating an object in Grooper as some data is being copied through the network.
  3. The content associated with the document is copied from its source location to the Grooper Filestore and associated with the created object.
    • This is by far the most time-consuming portion of the process, as the content of a document can vary wildly in storage size. A fully electronic document will usually be relatively small, though this still depends on the size of the document itself. Documents that use images to represent their pages, however, can range from small to gargantuan depending on the number of pages, the color depth and resolution of the images, etc.
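These three steps can be turned into a back-of-envelope estimate. The Python below is a hypothetical sketch: every default cost (per-object time, per-properties time, network bandwidth) is an invented placeholder rather than a measured Grooper figure, and it assumes the single-threaded import described earlier, so per-document costs simply add up. The mode names mirror the three Import Modes.

```python
def estimate_import_seconds(num_docs: int, avg_content_mb: float, mode: str,
                            object_s: float = 0.01, properties_s: float = 0.02,
                            bandwidth_mb_s: float = 50.0) -> float:
    """Rough single-threaded import time; all default costs are made-up placeholders."""
    if mode not in ("Full", "Sparse", "LinkOnly"):
        raise ValueError(f"unknown Import Mode: {mode}")
    per_doc = object_s                               # step 1: create the Batch Folder row
    if mode in ("Full", "Sparse"):
        per_doc += properties_s                      # step 2: load the document's properties
    if mode == "Full":
        per_doc += avg_content_mb / bandwidth_mb_s   # step 3: copy content to the Filestore
    return num_docs * per_doc
```

With these placeholder numbers, 10,000 documents averaging 5 MB come out to about 1,300 seconds for Full, 300 for Sparse, and 100 for LinkOnly. The exact values mean nothing, but the shape does: step 3 scales with content size and dominates as soon as documents get large.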

A Sparse import only performs the first two tasks listed above and simply links to the document's content where it exists in its system of origin. This vastly speeds up the import process. The main thing to consider about a Sparse import is that, while the document is being processed in Grooper, the original file should not be moved from its source location; otherwise, the link between the document object in Grooper and the original file will be broken. Once processing in Grooper is complete, the document can be moved. Alternatively, when using the Sparse Import Mode, you can import and move the document simultaneously by configuring the Import Provider. The object created in Grooper will then point to the document's new location while still existing only as a link.
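The rule about not moving source files can be demonstrated with a local file standing in for the import source. This is a hypothetical Python sketch: a Grooper link points into a content system through a CMIS Connection, not at a local path, and `sparse_import`, `link_is_valid`, and `demo` are invented names. It contrasts a file moved during processing (broken link) with a file moved as part of the import (link follows the file).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def sparse_import(path: Path, move_to: Optional[Path] = None) -> Path:
    """Hypothetical stand-in for a sparse import: returns the link the folder would keep.

    If move_to is given, the file is moved as part of the import (as an Import
    Provider can be configured to do), and the link records the new location.
    """
    if move_to is not None:
        move_to.parent.mkdir(parents=True, exist_ok=True)
        path = Path(shutil.move(str(path), str(move_to)))
    return path


def link_is_valid(link: Path) -> bool:
    """A link only works while the file still exists where the link points."""
    return link.exists()


def demo() -> Tuple[bool, bool]:
    """Return (link valid after a mid-processing move, link valid after import-and-move)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inbox = root / "inbox"
        inbox.mkdir()

        # Case 1: sparse import, then the source file is moved while still in processing.
        first = inbox / "a.pdf"
        first.write_bytes(b"%PDF")
        link_a = sparse_import(first)
        shutil.move(str(first), str(root / "moved-a.pdf"))
        broken = link_is_valid(link_a)

        # Case 2: the file is moved as part of the import, so the link follows it.
        second = inbox / "b.pdf"
        second.write_bytes(b"%PDF")
        link_b = sparse_import(second, move_to=root / "archive" / "b.pdf")
        followed = link_is_valid(link_b)
    return broken, followed


if __name__ == "__main__":
    print(demo())  # (False, True)
```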