Import Descendants (Import Provider)

From Grooper Wiki

Revision as of 16:35, 27 May 2025

This is a redirect page.

Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files from a CMIS Repository folder location, including any files in any sub-folders (i.e. all "descendant" files).

For information on Import Descendants visit the following resources:

About

"Import Descendants" is one of the CMIS Import providers in Grooper. It is used to import files from CMIS Repositories for Batch processing in Grooper. It will import files from the folder structure of an on-premises or cloud-based document storage platform.

  • While less common, Import Descendants can also import folders from CMIS Repositories. However, since importing files is most common, we focus on importing files in this article.


Just like any other Import Provider, Import Descendants is used to submit "Import Jobs". Import Jobs are how Grooper brings in files from a storage location for processing. For example, it's how PDFs from a Windows folder get into Grooper or messages from an email inbox get into Grooper. When an Import Job runs, Grooper first creates a Batch and then creates a Batch Folder for each imported file. A copy of the file is attached to the Batch Folder. This becomes the Batch Folder's "attachment" and is used when applying activities like "Split Pages".

  • When files are imported into Grooper, a link to that file is stored on the Batch Folder. This link maintains a connection between the file's source location and the document in Grooper. This link also makes "Sparse" imports possible. See below for more.
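
The relationship between an Import Job, a Batch, and its Batch Folders can be sketched in a few lines of Python. This is only a conceptual model, not Grooper's API: the class names, the "run_import_job" function, and the dictionary of source files are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BatchFolder:
    name: str
    source_link: str             # link back to the file's source location
    attachment: Optional[bytes]  # copy of the file content, if one was made


@dataclass
class Batch:
    name: str
    folders: List[BatchFolder] = field(default_factory=list)


def run_import_job(batch_name: str, source_files: Dict[str, bytes]) -> Batch:
    """Create one Batch, then one Batch Folder per imported file."""
    batch = Batch(batch_name)
    for path, content in source_files.items():
        file_name = path.rsplit("/", 1)[-1]
        batch.folders.append(
            BatchFolder(name=file_name, source_link=path, attachment=content)
        )
    return batch
```

Each Batch Folder keeps both the attachment and the link to the source location, which is what later makes "Sparse" imports possible.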


Import Jobs are submitted in one of two ways:

  • By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
  • From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.

In both cases, "Import Descendants" can be selected and configured using the "Provider" property.


Similarities and differences between Import Query Results and Import Descendants

Overall, "Import Descendants" is a "simpler" version of "Import Query Results".

  • We advise using Import Query Results over Import Descendants when possible.
    • Import Query Results can do everything Import Descendants can do and more.
    • Import Query Results has more robust file filtering capabilities. This allows for more targeted, selective imports.
    • Import Query Results is newer (and better maintained) than Import Descendants.
    • There are only a handful of scenarios where Import Descendants must be used over Import Query Results.


Similarities

  • Both providers import files from a CMIS Repository.
  • Both providers have the same Batch Creation settings.
  • Both providers are capable of "Sparse" imports by changing the "Import Mode" to "Sparse".
  • Both providers can dispose of files on import (using the "Delete Item", "Move To Folder", or "Update Properties" options).

Differences

The biggest difference is in how the providers determine which files are imported (import criteria).

  • Import Descendants will import all files from a target location. This includes all files in all subfolders if present. You can, however, set a "Base Folder" within the CMIS Repository.
  • Import Query Results will import files that match a CMIS Query. This is a specialized query language based on SQL syntax. This gives you many more options for import conditions, using a "WHERE" clause in the query. CMIS Queries also give you the capability to restrict imports to a folder location without importing files in subfolders (This is something Import Descendants cannot do).
  • Import Descendants does have an "Import Filter" it can use to set import conditions. It also uses a SQL-like syntax. However, it is not as advanced as the CMIS Queries that Import Query Results uses.
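
The difference in import criteria can be illustrated with a small Python sketch. It does not use Grooper or a real CMIS Repository. It simply treats a repository as a list of file paths, and the function names are hypothetical. The first function selects every "descendant" of a base folder. The second shows the folder-only selection that, per the notes above, requires a CMIS Query.

```python
from pathlib import PurePosixPath
from typing import List


def select_descendants(paths: List[str], base_folder: str) -> List[str]:
    """All files under base_folder, at any depth (Import Descendants style)."""
    base = PurePosixPath(base_folder)
    return [p for p in paths if base in PurePosixPath(p).parents]


def select_folder_only(paths: List[str], base_folder: str) -> List[str]:
    """Only files directly inside base_folder (needs a query-based import)."""
    base = PurePosixPath(base_folder)
    return [p for p in paths if PurePosixPath(p).parent == base]


repository = [
    "/invoices/a.pdf",
    "/invoices/2024/b.pdf",
    "/invoices/2024/q1/c.pdf",
    "/contracts/d.pdf",
]
print(select_descendants(repository, "/invoices"))  # three files
print(select_folder_only(repository, "/invoices"))  # one file
```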


CMIS Repositories that can only use Import Descendants

Certain CMIS Bindings are not queryable using CMIS Queries. Because of this, certain CMIS Repositories cannot utilize Import Query Results. The following CMIS Repositories must use Import Descendants to import file content:

  • FTP
  • SFTP
  • NTFS (only if the directory has not been indexed by the Windows Search service or the Windows Search service is not running)
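
The rule in this list can be written as a small decision function. This is a sketch of the logic as described here, not something Grooper exposes. The function name and the "windows_search_available" flag are assumptions made for the example.

```python
def must_use_import_descendants(binding: str,
                                windows_search_available: bool = True) -> bool:
    """Return True when the CMIS Binding cannot be queried with CMIS Queries."""
    if binding in {"FTP", "SFTP"}:
        return True
    if binding == "NTFS":
        # NTFS is only queryable when the directory is indexed by a
        # running Windows Search service.
        return not windows_search_available
    return False
```

For any other binding the function returns False, meaning Import Query Results remains an option.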


Example Import Descendants configuration

Because Import Descendants imports from a CMIS Repository, you can import from numerous storage platforms determined by the "CMIS Binding" used. These CMIS Bindings include:

  • NTFS to connect to Windows folders
  • FTP to connect to FTP directories
  • SFTP to connect to SFTP directories
  • Exchange to connect to Outlook inboxes
  • SharePoint to connect to SharePoint sites (and document libraries)
  • OneDrive to connect to OneDrive drives
  • Box to connect to Box accounts
  • AppXtender to connect to AppEnhancer applications

Regardless of the platform, you configure Import Descendants largely the same. Just pick a CMIS Repository and a base folder in that repository.

Example: Submitting Import Descendants from the Imports Page

A step-by-step walkthrough is available at https://app.supademo.com/demo/cm8d4b2wt1f812ugqwtbi17bu

  1. Go to the Imports Page.
  2. Press the "New Import Job" button.
  3. This brings up the "Submit Import Job" editor.
  4. Enter a description in the Description property (This is required).
  5. Open the "Provider" dropdown (Press the "☰" button).
  6. Select "Import Descendants" from the dropdown list.
  7. Expand the Provider settings to configure it.
  8. Open the "Repository" node selector (Press the "☰" button).
  9. Select the CMIS Repository you wish to import from.
  10. Configure the Import Mode property as needed.
  11. Configure the Batch Creation settings, as needed.
  12. Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
  13. Configure any remaining Import Descendants properties, as needed.
  14. Press the "Submit" button when finished.
  15. Your Import Watcher service will pick up and execute the Import Job.


Settings shared between Import Descendants and Import Query Results

The following properties/settings are part of configuring either Import Descendants or Import Query Results.

  • Import Mode - Configuring this property allows you to perform "Sparse" imports to speed up the time it takes to ingest large numbers of files.
  • Batch Creation settings - These settings control how Batches are created on import, including which Batch Process is used.
  • File disposition options - Optionally, you can "dispose" of the source file after it has been imported into Grooper. These settings allow you to delete the file, move it or update one of its properties.

Import Mode (and "Sparse" imports)

What is an "Import Mode"?

The Import Provider's "Import Mode" controls how file content and data associated with that file (its properties and any metadata field values in a content management system) are imported into Grooper. Practically speaking, this has an effect on the overall speed of the import process.

Files can be imported using one of three "Import Modes":

  • Copy - The files are fully copied to the Grooper Repository on import.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
  • Sparse - Only the file properties and mapped metadata are copied to the Grooper Repository.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
    • The file content is not copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder. The file can be copied to the Grooper File Store with the "CMIS Document Link > Load" command.
  • Link Only (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • File content and mapped properties/metadata must be brought into Grooper with the "CMIS Document Link > Load" command.
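
As a quick reference, the three modes can be summarized in code. The following Python sketch only restates the list above. The enum, the function, and the labels in the returned set are hypothetical and are not part of Grooper.

```python
from enum import Enum
from typing import Set


class ImportMode(Enum):
    COPY = "Copy"
    SPARSE = "Sparse"
    LINK_ONLY = "Link Only"


def stored_on_import(mode: ImportMode, has_import_behavior: bool) -> Set[str]:
    """What ends up in the Grooper Repository at import time for each mode."""
    stored = {"document link with file properties"}  # stored in every mode
    if mode is ImportMode.COPY:
        stored.add("file content")
    if mode in (ImportMode.COPY, ImportMode.SPARSE) and has_import_behavior:
        stored.add("mapped data fields")
    return stored
```

Anything missing from the returned set would have to be brought in later with the "CMIS Document Link > Load" command.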


Sparse imports serve two functions:

  • Primarily, they are used to speed up the overall import operation. This is a two-step process.
    1. Enable the "Sparse" Import Mode when configuring the Import Provider.
    2. Have the first step in the Batch Process fully copy the files into the Grooper Repository (using the Execute activity and the "CMIS Document Link > Load" command).
  • They can also be used to avoid file duplication between the import source location and the Grooper File Store.
    • A fully copied import creates a copy of the file in the Grooper File Store. A sparsely imported document is fully usable in Grooper, but no such copy exists in the File Store.
    • Instead, Grooper travels the link every time it needs to access the file (Example: When the file's image is pulled up in the Document Viewer).
    • While this does save on storage between two systems (Grooper and the import source), it does not save on processing time. Every time Grooper needs to access the document to view it, execute a command, or run an activity, it will take some time to travel the document link and fetch the document. Depending on latency, it may be preferable to load the file into Grooper even if it does duplicate the file (The file can always be removed from Grooper with a Dispose step at the end of a Batch Process too).


How does using "Sparse" speed up import?

It increases the parallelism of the overall import operation.

Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.

When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.

  • By only importing a link to the file content, Sparse mode dramatically speeds up the time it takes to get a usable document into Grooper.
  • To take full advantage of your system's resources, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
    • Be Aware: The "Load" command has three modes: (1) Content, (2) Properties, and (3) Full. For "Properties" and "Full" to work appropriately, the Batch Folders must be classified on import and use an Import Behavior to map the properties.
  • The end result is the overall import operation will be as if you had used the "Copy" mode. But it will be done in a way that runs multi-threaded.
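
The idea behind this pattern can be sketched in plain Python. This is an analogy, not Grooper code: "sparse_import" stands in for the single-threaded Import Job that only records links, and "load_all" stands in for a first Batch Process step that loads content on several threads. All names, and the dictionary used as a fake source repository, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def sparse_import(source: Dict[str, bytes]) -> List[dict]:
    """Single-threaded step: record a link per file, copy no content."""
    return [{"link": link, "content": None} for link in source]


def load_all(folders: List[dict], source: Dict[str, bytes],
             workers: int = 4) -> List[dict]:
    """Multi-threaded step: follow each link and copy the content in."""
    def load(folder: dict) -> dict:
        folder["content"] = source[folder["link"]]  # stand-in for a slow fetch
        return folder

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so folders stay in import order.
        return list(pool.map(load, folders))
```

The cheap step stays serial and the expensive copy is spread across workers, which is the same trade the "Sparse" mode plus "Load" command makes.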


Batch Creation settings

The Batch Creation settings allow you to define which Batch Process you wish to use to process the imported files.

You must configure the "Starting Step" property to assign the Batch Process.

  • Use the property's dropdown editor to select a Batch Process Step from a list of Batch Processes.
  • Only published Batch Processes will appear in this list.


Other notable Batch Creation properties:

  • "Start Paused" - This determines if the Batch starts in a paused state or not. If "False", the first step's tasks will be automatically submitted to Activity Processing services. If "True", you will have to manually start the Batch.
  • "Max Items Per Batch" - The default is 2500, meaning each Batch will have a maximum of 2500 Batch Folders before creating a new Batch on import. For users who want more Batches with fewer documents, lower this number.
  • "Organize By Date" - This will organize Batches into subfolders in the Production branch in Grooper according to the year / month / day the Batch was created.
  • "Priority" and "Increment Priority" - Controls the task processing priority for the Batch. "Increment Priority" is useful when submitting large user-directed imports from the Imports Page to ensure the first Batch created is the first that is fully processed by Activity Processing services.
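
To make the effect of "Max Items Per Batch" concrete, here is a minimal Python sketch of the splitting behavior described above. The function name is hypothetical, and the sketch assumes files are simply grouped in import order.

```python
from typing import List, Sequence


def split_into_batches(items: Sequence[str],
                       max_items_per_batch: int = 2500) -> List[List[str]]:
    """Group imported items into Batches of at most max_items_per_batch."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    return [
        list(items[start:start + max_items_per_batch])
        for start in range(0, len(items), max_items_per_batch)
    ]


files = [f"file_{n}.pdf" for n in range(6000)]
print([len(batch) for batch in split_into_batches(files)])  # [2500, 2500, 1000]
```

With the default of 2500, an import of 6000 files would yield three Batches. Lowering the value yields more, smaller Batches.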


File disposition settings

The "Disposition" settings allow you to do something with the source files after importing them into Grooper. This is important when using an Import Watcher service to schedule imports. If you do not configure a Disposition property, the imported file will remain in the same state after the Import Job completes. This can cause the Import Watcher to attempt to import the same file over and over again.
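
The repeated-import problem is easy to see in a toy simulation. The Python sketch below is not how the Import Watcher is implemented. It is a hypothetical "poll" function over two in-memory lists, showing why a file that is left untouched gets picked up again on the next cycle.

```python
from typing import List


def poll(inbox: List[str], processed: List[str], move_after_import: bool) -> List[str]:
    """One simulated polling cycle: import everything currently in the inbox."""
    imported = list(inbox)
    if move_after_import:
        # Disposition: move the imported files out of the watched folder.
        processed.extend(inbox)
        inbox.clear()
    return imported
```

Called twice without a disposition, the function returns the same file both times. With the move enabled, the second call returns nothing.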

There are three "Disposition" options:

  • "Delete Item" - Turning this to "True" will simply delete the source file after the Import Job completes.
    • You may only configure this option if the "Import Mode" is set to "Copy".
  • "Move To Folder" - This will move the files to a different folder in the CMIS Repository after the Import Job completes.
  • "Update Properties" - This will update one or more file properties after the Import Job completes. Properties are updated by listing them as "key-value pairs" in the "Update Properties" list editor where key=value.
    • Examples:
      • Archive=false - Sets the archive attribute on each imported file to "false".
      • IsRead=true - Marks each imported email message as read.
      • Status=PENDING - Sets a "Status" field on a document in an AppEnhancer application (assuming there is a "Status" field in the application).
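
The "key=value" format can be illustrated with a short Python sketch that turns such lines into a dictionary. This is only an illustration of the format. It is not Grooper's parser, the function name is hypothetical, and the choice to trim whitespace and reject lines without "=" is an assumption.

```python
from typing import Dict, Iterable


def parse_update_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'key=value' lines into a dictionary of property updates."""
    properties: Dict[str, str] = {}
    for line in lines:
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"Expected key=value, got: {line!r}")
        properties[key.strip()] = value.strip()
    return properties


print(parse_update_properties(["Archive=false", "IsRead=true", "Status=PENDING"]))
```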