Import Mode (Property)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Import Mode is a configurable property for CMIS Import Providers. It controls how file content is loaded into a Grooper Repository during an Import Job, and it is key to setting up a "Sparse" import in Grooper.

FYI

"Sparse Import" redirects here. Sparse imports are executed by setting the Import Mode to "Sparse".

About

To understand Grooper's "Import Modes", you should also know about "Document Links".

What is a "Document Link"?

Document Links are the main type of "Content Link" in Grooper. They are created whenever documents are imported or exported.

  • Import links are added to Batch Folders whenever they are created by Import Jobs.
  • Export links are added to Batch Folders after they are exported by the Export activity.


Most users are concerned with links added to a document (Batch Folder) on import.

Files are imported into Grooper by Import Providers like "Import Descendants" and "Import Query Results". When the Import Provider runs, the following occurs:

  1. A Batch is created according to the Batch Creation settings.
  2. A Batch Folder is created for each file that is imported.
  3. A link is attached to that Batch Folder.
    • If imported by Import Descendants or Import Query Results, this will be a "CMIS Document Link".
    • If imported by File System Import, this will be a "File System Link".
  4. File content/data is associated with the Batch Folder according to the Import Provider's "Import Mode".
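The first three steps above can be sketched as a small data model. This is only an illustration, not Grooper's API: the class names, the `run_import` function, the batch name, and the file paths are all hypothetical, and only the provider-to-link-type mapping comes from the list above.

```python
from dataclasses import dataclass, field
from typing import List

# Link type created by each Import Provider named in the article.
LINK_KIND_BY_PROVIDER = {
    "Import Descendants": "CMIS Document Link",
    "Import Query Results": "CMIS Document Link",
    "File System Import": "File System Link",
}


@dataclass
class DocumentLink:
    kind: str         # "CMIS Document Link" or "File System Link"
    source_path: str  # where the original file lives


@dataclass
class BatchFolder:
    name: str
    link: DocumentLink


@dataclass
class Batch:
    name: str
    folders: List[BatchFolder] = field(default_factory=list)


def run_import(provider: str, source_files: List[str]) -> Batch:
    """Mimic steps 1-3: one Batch, one Batch Folder per file, one link per folder."""
    if provider not in LINK_KIND_BY_PROVIDER:
        raise ValueError(f"Provider not covered by this sketch: {provider}")
    kind = LINK_KIND_BY_PROVIDER[provider]
    batch = Batch(name="Imported Batch")                    # step 1
    for path in source_files:
        file_name = path.rsplit("/", 1)[-1]
        link = DocumentLink(kind=kind, source_path=path)    # step 3
        batch.folders.append(BatchFolder(file_name, link))  # step 2
    return batch


batch = run_import("Import Descendants", ["invoices/a.pdf", "invoices/b.pdf"])
```

Step 4 (what content is stored alongside the link) depends on the Import Mode and is deliberately left out of this sketch.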


Document Links are Grooper's connection to the source file. This link allows Grooper to speed up overall import operations with the "Sparse" mode. For a sparsely imported document, only a link to the file is stored in Grooper, rather than the file itself. Links also allow Grooper to update a source file with content processed by Grooper.

  • Links can be removed with the "Remove Link" command.


What is an "Import Mode"?

The Import Provider's "Import Mode" controls how file content and the data associated with that file (its properties and any metadata field values in a content management system) are imported into Grooper. In practice, it affects the overall speed of the import process.

Files can be imported using one of three "Import Modes":

  • Copy - The files are fully copied to the Grooper Repository on import.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
  • Sparse - Only the file properties and mapped metadata are copied to the Grooper Repository.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
    • The file content is not copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder. The file can be copied to the Grooper File Store with the "CMIS Document Link > Load" command.
  • Link Only (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • File content and mapped properties/metadata must be brought into Grooper with the "CMIS Document Link > Load" command.
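The differences between the three modes can be summarized in a small lookup. This is a sketch based only on the list above; the enum, the function, and the dictionary keys are hypothetical names chosen for illustration and do not correspond to Grooper objects.

```python
from enum import Enum


class ImportMode(Enum):
    COPY = "Copy"
    SPARSE = "Sparse"
    LINK_ONLY = "Link Only"


def stored_on_import(mode: ImportMode, has_import_behavior: bool = True) -> dict:
    """Summarize what is present in Grooper right after import, before any Load."""
    return {
        # All three modes record file properties on the Document Link.
        "link_properties": True,
        # Only Copy places the file itself in the Grooper File Store.
        "file_content": mode is ImportMode.COPY,
        # Copy and Sparse map metadata to Data Fields when an Import Behavior is used.
        "mapped_metadata": has_import_behavior and mode is not ImportMode.LINK_ONLY,
    }


for mode in ImportMode:
    print(mode.value, stored_on_import(mode))
```

Anything reported as False here for Sparse or Link Only is what the "CMIS Document Link > Load" command would bring in later.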


Sparse imports serve two functions:

  • Primarily, they are used to speed up the overall import operation. This is actually a two-step process.
    1. Enable the "Sparse" Import Mode when configuring the Import Provider.
    2. Have the first step in the Batch Process fully copy the files into the Grooper Repository (using the Execute activity and the "CMIS Document Link > Load" command).
  • They can also be used to avoid file duplication between the import source location and the Grooper File Store.
    • A fully copied import creates a copy of the file in the Grooper File Store. A sparsely imported document is fully usable in Grooper, but no such copy exists in the File Store.
    • Instead, Grooper travels the link every time it needs to access the file (for example, when the file's image is pulled up in the Document Viewer).
    • While this saves storage across the two systems (Grooper and the import source), it does not save processing time. Every time Grooper needs to access the document to view it, execute a command, or run an activity, it takes some time to travel the Document Link and fetch the file. Depending on latency, it may be preferable to load the file into Grooper even though this duplicates it. The file can always be removed from Grooper with a Dispose step at the end of a Batch Process.
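The storage-versus-latency tradeoff can be made concrete with some rough arithmetic. The function below is a simplified model, not a measurement: the name `total_fetch_seconds`, the access count, and the fetch time are made-up values, and local read time from the File Store is treated as zero.

```python
def total_fetch_seconds(accesses: int, fetch_seconds: float, loaded: bool) -> float:
    """Time spent pulling file content from the source system.

    loaded=False: the document stays sparse, so every access travels the link.
    loaded=True:  the content is loaded into the File Store once; later reads
                  are local and assumed to cost nothing in this model.
    """
    if accesses <= 0:
        return 0.0
    return fetch_seconds if loaded else accesses * fetch_seconds


# Hypothetical document touched 10 times, with 0.5 seconds per remote fetch.
sparse_cost = total_fetch_seconds(10, 0.5, loaded=False)
loaded_cost = total_fetch_seconds(10, 0.5, loaded=True)
```

Under these assumptions the sparse document pays the fetch cost on every access, while the loaded document pays it once. The gap grows with the number of activities and views that touch the file.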


How does using "Sparse" speed up import?

It increases the parallelism of the overall import operation.

Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.

When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.

  • By only importing a link to the file content, Sparse mode dramatically reduces the time it takes to get a usable document into Grooper.
  • To take full advantage of your system's resources, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
    • Be Aware: The "Load" command has three modes: (1) Content, (2) Properties, and (3) Full. For "Properties" and "Full" to work appropriately, the Batch Folders must be classified on import and use an Import Behavior to map the properties.
  • The end result is the same as if you had used the "Copy" mode, but the file loading runs multi-threaded.
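The pattern described above (a cheap single-threaded import followed by a parallel load) can be imitated with a thread pool. This is an analogy only: Grooper does not run Python, and `fetch_from_source`, `sparse_import`, `load`, `copy_import`, the worker count of 8, and the file paths are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_from_source(path: str) -> bytes:
    """Stand-in for the slow part: pulling file bytes from the source system."""
    return f"content of {path}".encode()


def copy_import(paths):
    """Single-threaded Copy import: every file is fetched inside the import loop."""
    return [{"link": p, "content": fetch_from_source(p)} for p in paths]


def sparse_import(paths):
    """Single-threaded Sparse import: only a link is recorded per document."""
    return [{"link": p, "content": None} for p in paths]


def load(folder):
    """Stand-in for the Load command: follow the link and store the content."""
    folder["content"] = fetch_from_source(folder["link"])
    return folder


paths = [f"docs/file_{i}.pdf" for i in range(100)]

folders = sparse_import(paths)                   # import step: one thread, little work
with ThreadPoolExecutor(max_workers=8) as pool:  # first Batch Process step: many threads
    folders = list(pool.map(load, folders))
```

After the parallel load, `folders` holds the same links and content that `copy_import(paths)` would have produced. The only difference is that the slow fetches were spread across several workers instead of running one at a time.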


How do I select an Import Mode?

You select an Import Mode when configuring an Import Provider. The setting is exposed in one of two ways, depending on the Import Provider you select:

  • When selecting "Import Query Results" or "Import Descendants": You will be able to configure the "Import Mode" property.
    • Your choices will be "Copy", "Sparse", or "Link Only".
  • When selecting all other Import Providers: You will be able to configure the "Sparse Import" property. This can be True or False.
    • True will perform a "Sparse" import. False will perform a "Copy" import.
    • There is no "Link Only" option for these providers.
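The relationship between the two properties can be expressed as a small helper. This is a sketch of the rule stated above, not Grooper code: `effective_import_mode` and its parameters are hypothetical names, and only the provider names and property values come from this article.

```python
from typing import Optional

# Providers that expose the three-valued "Import Mode" property.
IMPORT_MODE_PROVIDERS = {"Import Query Results", "Import Descendants"}
IMPORT_MODES = {"Copy", "Sparse", "Link Only"}


def effective_import_mode(provider: str,
                          import_mode: Optional[str] = None,
                          sparse_import: Optional[bool] = None) -> str:
    """Return the import mode implied by whichever property the provider exposes."""
    if provider in IMPORT_MODE_PROVIDERS:
        if import_mode not in IMPORT_MODES:
            raise ValueError(f"Import Mode must be one of {sorted(IMPORT_MODES)}")
        return import_mode
    # All other providers expose a True/False "Sparse Import" property instead.
    if sparse_import is None:
        raise ValueError("Sparse Import must be set to True or False")
    return "Sparse" if sparse_import else "Copy"
```

For example, `effective_import_mode("File System Import", sparse_import=True)` resolves to "Sparse", while "Link Only" can only be reached through one of the two CMIS providers.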