2023:Import Mode (Property)

From Grooper Wiki

Latest revision as of 14:11, 27 May 2025

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


WIP

This article is a work-in-progress and may abruptly stop in the middle of a section.

FYI

"Sparse Import" redirects here. How you configure an Import Provider's Import Mode determines if documents are imported sparsely.

  • Set Import Mode to Sparse to perform a sparse import.

About

A Side Note on Importing in General

Forget about Import Modes for a second. How do you even import documents into Grooper at all? You import documents into a Grooper environment using an Import Provider.

The simplest way to import in Grooper is to "Submit a new import job" from the "Imports" page.

The Import Providers highlighted in turquoise below are considered "Legacy" Import Providers and should only be used for older configurations that have been kept through upgrading, or in very specific circumstances where CMIS Connections do not provide the desired connectivity. The properties for connecting to systems using these Import Providers are set on the "Import Job" rather than on a specific object in Grooper, therefore their settings are not re-usable.

It is considered best practice to use the Import Providers highlighted in red below. These leverage CMIS Repository objects for their connection configuration, which gives them the most functionality and makes them the most developed means of importing in Grooper. Of the two shown, Import Query Results should be your first choice as it is even more fully featured than the Import Descendants option. However, Import Query Results can only be leveraged by "query-able", indexed content systems. Import Descendants should only be used in cases where the content system connected to by your CMIS Connection is not "query-able".

There are a myriad of articles related to "CMIS" here on the Grooper Wiki, but the ones most related to this topic would be:

  • CMIS Connection
  • CMIS Import

Ad-Hoc Import Jobs vs Automated Import Jobs

Starting an "Import Job" from the "Imports" page is considered ad-hoc, meaning the "Import Job" only happens once. While you can manually repeat an "Import Job" that you have not cleared from the list of completed "Import Jobs" on the "Imports" page, doing so is not an automated procedure.

If you want an "Import Job" to repeat indefinitely based on some kind of schedule, you would need to use an Import Watcher service established in Grooper Config. The configuration of an "Import Job" on an Import Watcher service is identical to what is pictured above in an ad-hoc import, but the Import Watcher service itself has built-in scheduling configuration that automates the starting of "Import Jobs".

What is an Import Mode?

Documents being imported (i.e. files in an external storage platform) contain two important sets of information:

  • Content - The file itself, such as a PDF.
  • Properties - Metadata associated with the file originating from the source storage platform. This can be as basic as the file's name or something more custom like fields in a Box.com metadata template.

There are three Import Modes in Grooper:

  1. Full - This mode fully imports each file as a Batch Folder in the Batch. Both its content and its properties are loaded into the Grooper Filestore upon import.
    • Because the files are fully copied from the source into a Grooper environment, this is the slowest of the three Import Modes.
    • Import speed can be further impacted by network traffic required to copy the files associated with each document from their original source to the Grooper Filestore.
  2. Sparse - The Sparse Import Mode loads a file's properties as it does in Full mode. However, instead of fully importing the document's content, a link between Grooper and the content at the import source is created.
    • When Grooper needs to access the document's content, it follows the link attached to the Batch Folder to retrieve the file from the import source.
    • Because import operations must run single-threaded in Grooper, this can greatly reduce the time it takes to import large document sets.
    • If needed, the content can also be loaded into the Grooper Filestore in parallel using the Execute activity.
  3. LinkOnly - This mode only creates an appropriate object in Grooper and only links to both the content of the document and its properties.
    • This is by far the fastest of the three Import Modes. Only a Batch Folder and a link to the source document are created for each imported file.
    • As with Sparse imports, the document's content can be loaded in parallel using the Execute activity. In LinkOnly mode, its properties can be loaded this way as well.
      • Please note, these properties must map to Grooper Data Fields in a Data Model. As such, the documents must be classified first before their properties can be loaded.
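
The difference between the three modes comes down to what is present on the document object immediately after import. The following is a minimal Python sketch of that idea only. It is not Grooper's API: the names `SourceFile`, `BatchFolder`, and `import_file` are hypothetical, and the sketch does not model whether a Full import also keeps a link, since that is not stated above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImportMode(Enum):
    FULL = "Full"
    SPARSE = "Sparse"
    LINK_ONLY = "LinkOnly"


@dataclass
class SourceFile:
    """Hypothetical stand-in for a file in an external storage platform."""
    location: str
    content: bytes
    properties: dict


@dataclass
class BatchFolder:
    """Hypothetical model of what one imported document holds after import."""
    link: Optional[str] = None          # pointer back to the import source
    properties: Optional[dict] = None   # None means "not loaded yet"
    content: Optional[bytes] = None     # None means "not copied yet"


def import_file(source: SourceFile, mode: ImportMode) -> BatchFolder:
    """Return the document object each mode would leave behind (illustrative)."""
    folder = BatchFolder()
    if mode is ImportMode.FULL:
        # Full: properties and content are both copied at import time.
        # (A link is not modeled here; the article does not say if one is kept.)
        folder.properties = dict(source.properties)
        folder.content = source.content
    elif mode is ImportMode.SPARSE:
        # Sparse: properties are copied, content stays behind a link.
        folder.link = source.location
        folder.properties = dict(source.properties)
    else:
        # LinkOnly: nothing is copied, only the link is recorded.
        folder.link = source.location
    return folder
```

Calling `import_file` on the same hypothetical `SourceFile` with each mode shows the trade-off directly: the less that is copied at import time, the more that must be fetched later through the link.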

How Does Sparse Import Save Time?

Consider what is occurring when importing a document into Grooper using the Full Import Mode:

  1. An object is made in Grooper representing a document (a Batch Folder in a Batch).
    • This takes little to no time, as all that happens is that a row is added in SQL to the TreeNode table of the Grooper Database to represent the document object.
  2. The properties of the document are loaded onto the created document object.
    • This is only slightly more time consuming than creating an object in Grooper as some data is being copied through the network.
  3. The content associated with the document is copied from its source location to the Grooper Filestore and associated with the created object.
    • This is by far the most time-consuming portion of the process, as the content of a document can vary wildly in storage size. A document that is fully electronic in nature will be relatively small, though this depends on the length of the document itself. Documents that use images to represent their pages, however, can range from small to gargantuan depending on the number of pages, the color depth of the images, the resolution, etc.
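
The three steps above can be turned into a rough back-of-the-envelope estimate. The Python sketch below is illustrative only: the function name and every timing figure (row insert time, property load time, file size, network throughput) are assumptions made up for the example, not measurements of Grooper. It assumes the import runs single-threaded, so per-document costs simply add up.

```python
def estimate_import_seconds(
    mode: str,
    doc_count: int,
    avg_content_mb: float,
    throughput_mb_per_s: float,
    row_insert_s: float = 0.005,    # assumed cost of step 1 per document
    property_load_s: float = 0.02,  # assumed cost of step 2 per document
) -> float:
    """Estimate total import time for "Full", "Sparse", or "LinkOnly"."""
    if mode not in ("Full", "Sparse", "LinkOnly"):
        raise ValueError(f"unknown import mode: {mode}")

    per_doc = row_insert_s  # step 1 happens in every mode
    if mode in ("Full", "Sparse"):
        per_doc += property_load_s  # step 2: properties copied at import
    if mode == "Full":
        per_doc += avg_content_mb / throughput_mb_per_s  # step 3: content copy

    # Single-threaded import: documents are handled one after another.
    return doc_count * per_doc


if __name__ == "__main__":
    for mode in ("Full", "Sparse", "LinkOnly"):
        seconds = estimate_import_seconds(mode, 10_000, 2.0, 50.0)
        print(f"{mode}: about {seconds:.0f} s")
```

With these made-up numbers (10,000 documents averaging 2 MB over a 50 MB/s connection), the estimate comes to roughly 650 seconds for Full, 250 for Sparse, and 50 for LinkOnly. The gap widens as the files get larger, since only the Full estimate grows with content size.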

A Sparse import only does the first two tasks listed above and simply links to the document's content where it exists in its system of origin. This vastly speeds up the import process. The main thing to consider about a Sparse import is that, while the document is being processed in Grooper, the original file should not be moved from its source location, or the link between the document object in Grooper and the original file will be broken. Once processing in Grooper is complete, the file can be moved. Alternatively, at the time of import, you can simultaneously import and move the document via configuration of the Import Provider. The object made in Grooper will then point to the new location of the document while still only existing as a link.
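
The behavior described above (content fetched through a link on demand, a link that breaks if the source file is moved, and a move performed at import time) can be sketched in Python as follows. Everything here is hypothetical and for illustration only: `SourceRepository`, `SparseDocument`, `sparse_import`, and `load_all` are invented names, the repository is an in-memory dictionary, and the thread pool merely stands in for the idea of loading content in parallel after import.

```python
from concurrent.futures import ThreadPoolExecutor


class BrokenLinkError(Exception):
    """Raised when a link no longer points at a file in the source."""


class SourceRepository:
    """Hypothetical in-memory stand-in for an external storage platform."""

    def __init__(self):
        self._files = {}

    def put(self, path, content):
        self._files[path] = content

    def move(self, old_path, new_path):
        self._files[new_path] = self._files.pop(old_path)

    def read(self, path):
        if path not in self._files:
            raise BrokenLinkError(f"nothing found at {path}")
        return self._files[path]


class SparseDocument:
    """Hypothetical document object that holds a link instead of content."""

    def __init__(self, link):
        self.link = link
        self.content = None  # not copied at import time

    def load_content(self, repo):
        # Follow the link only when the content is actually needed.
        if self.content is None:
            self.content = repo.read(self.link)
        return self.content


def sparse_import(repo, path, move_to=None):
    """Create a linked document, optionally moving the source file first."""
    if move_to is not None:
        repo.move(path, move_to)
        path = move_to  # the link points at the new location
    return SparseDocument(link=path)


def load_all(repo, documents, workers=4):
    """Load content for many linked documents in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda doc: doc.load_content(repo), documents))
```

In this sketch, a document imported with `move_to` set still loads correctly because its link was written after the move, whereas calling `repo.move` on a file after it has been imported causes the next `load_content` call to raise `BrokenLinkError`.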

What Is a Document Link?