File Store (Object)

From Grooper Wiki
Revision as of 10:06, 29 October 2024 by Dgreenwood

This article is a stub. It contains minimal information on the topic and should be expanded.

File Store objects define a storage location within Grooper where file content associated with nodes is saved. They are crucial for managing the content that forms the basis of Grooper's processing tasks, allowing for the storage and retrieval of documents, images, and other "files". Not every object in Grooper will have files connected to it, but if it does, those files are stored in the location defined by this object.

The File Store is a critical part of a Grooper Repository's infrastructure. A Grooper Repository is composed of two things:

  • A Grooper Database - This stores nodes and their property configurations.
  • A Grooper File Store - This stores any files associated with those nodes.

As an object in the Node Tree, the File Store is mostly just a folder path location in a file share. However, the File Store's importance in Grooper processing cannot be overstated. Any time Grooper needs access to file content associated with a node, it follows the path defined in a File Store node to locate, modify, and return that content as needed.

  • BE AWARE: A File Store can be any folder you have write access to, but it is best practice to use a fully qualified UNC path.


Many Grooper Repositories will only have one File Store node, which is created when the Grooper Repository is first initialized. This is the "Primary" File Store node, created in the File Stores folder node. By default, it is set as the Grooper Repository's "Active File Store" (using the property of the same name on the Grooper Root node).


Circumstances where a second (or more) File Store node needs to be created are rare. Examples include:

  • A new File Store node may be created for the Dispose Batch activity, which can be configured to move content to a different File Store. This allows users to offload processed content to lower-tier archival storage but still access it in a Grooper Repository.
  • The current File Store runs out of space. In this scenario, a new File Store could be created and set as the Active File Store. The old File Store would then serve as archival storage.


Files stored in the File Store include:

  • Images for Batch Page nodes.
  • Imported files (PDFs, TIFFs, etc.) attached to Batch Folder nodes.
  • Files Grooper generates for a node, such as a "Grooper.DocumentData.json" file generated for a Batch Folder by the Extract activity.

FYI

If you select an object in the Node Tree, go to the "Advanced" tab, and then open the "Files" tab underneath it, each file listed there is stored in a File Store.

About

The File Store in Grooper is a file share in a Windows environment. It houses the files associated with nodes in Grooper that have information that would otherwise be inefficient to store in a cell in a database table.

The Grooper File Store exists at a user-specified location. This should always be a network path (UNC path). If a File Store is given a local path, computers connecting to that repository remotely will not be able to access it. To set up a Grooper Repository so that other computers can connect to it, make sure you reference the File Store using a UNC path!
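As a rough illustration of the UNC recommendation above, the following Python sketch checks whether a candidate File Store path is a UNC path before it is entered in Grooper. The function name and the example paths are hypothetical and are not part of Grooper. The check relies only on the standard library's `PureWindowsPath`, so it can be run on any operating system.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if `path` looks like a Windows UNC path (server plus share)."""
    # For a UNC path, pathlib reports the "drive" as the server and share,
    # which begins with two backslashes. A local path reports e.g. "C:".
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


# Hypothetical example paths, for illustration only.
print(is_unc_path(r"\\fileserver\Grooper\FileStore"))  # UNC path
print(is_unc_path(r"C:\Grooper\FileStore"))            # local path
```

This is a simple syntactic check. It does not confirm that the share exists or that the Grooper service account can write to it.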

The File Store contains three levels of directories. A File Store entry exists on disk as a file in the lowest level with a .grp extension (e.g. 00 > 00 > 00 > 00000000-0000-0000-0000-000000000000.grp). Each lowest-level folder holds a maximum of 256 files, at which point a new folder is created at that level. Once a level contains 256 folders, a new folder is created at the level above. With 256 folders at each of the three levels and 256 files per lowest-level folder, the Grooper File Store has a limit of 256 ^ 4 = 4,294,967,296 files stored on disk.
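The capacity arithmetic above can be checked with a short Python sketch. The limits of 256 and the three directory levels come from this article. The `folder_for_index` helper is hypothetical: it assumes folders are filled sequentially and named with two-digit hexadecimal values (suggested by the "00 > 00 > 00" example), which this article does not confirm.

```python
FOLDERS_PER_LEVEL = 256  # from the article: up to 256 folders per level
FILES_PER_FOLDER = 256   # from the article: up to 256 files per lowest-level folder
LEVELS = 3               # from the article: three levels of directories


def max_files() -> int:
    """Total number of files the directory layout can hold."""
    return FOLDERS_PER_LEVEL ** LEVELS * FILES_PER_FOLDER


def folder_for_index(index: int) -> str:
    """Hypothetical: folder path for the n-th stored file (0-based).

    Assumes sequential filling and two-digit hex folder names.
    """
    if not 0 <= index < max_files():
        raise ValueError("index is outside the File Store capacity")
    folder_number = index // FILES_PER_FOLDER
    parts = []
    for _ in range(LEVELS):
        # Peel off one base-256 digit per directory level, lowest level first.
        parts.append(f"{folder_number % FOLDERS_PER_LEVEL:02X}")
        folder_number //= FOLDERS_PER_LEVEL
    return "\\".join(reversed(parts))


print(max_files())            # 4294967296
print(folder_for_index(0))    # 00\00\00
print(folder_for_index(256))  # 00\00\01
```

Under these assumptions, the 257th file (index 256) is the first one to land in a second lowest-level folder.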

While the File Store entries are all given .grp extensions, the contents of each file are unaltered from their "actual" form. If you navigate, for example, to the GRP file associated with a PDF imported using full import, you can open it and view it with a PDF viewer. The files in the File Store are intentionally obfuscated to prevent users from interacting with them outside of Grooper, as they are essentially "Grooper-internal" objects.
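Because the contents of a .grp file are unaltered, its underlying format can usually be recognized from its leading bytes. The Python sketch below is one way to do that for troubleshooting. It is not a Grooper feature: the function names are hypothetical, and only a few well-known file signatures are listed. It opens files read-only, in keeping with the point that these files should not be modified outside of Grooper.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
}


def sniff_bytes(header: bytes) -> str:
    """Guess a file format from the first bytes of its content."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file (read-only) and guess its format."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))


print(sniff_bytes(b"%PDF-1.7\n"))  # PDF
```

Text-based files, such as JSON data, have no fixed signature and are reported as "unknown" by this sketch.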

Although the majority of files in the File Store relate to Batch objects (a page's image or imported files), some files are the result of other "in-Grooper" processes, such as layout data, OCR character data, extracted index data, and more.

Information on migrating a File Store

New files, such as scanned or imported images of Batch Pages, are always written to the "Active File Store". This is a property set on the Grooper Root in the Node Tree. It will default to the File Store created when the Grooper Repository was initialized, unless otherwise specified.

If you need to "migrate" or "back up and restore" a File Store, please visit this article:

Glossary

Batch Page: Batch Page objects represent individual pages within a Batch. The Batch Page object is the most granular unit in the hierarchy of Batch Objects in Grooper.

  • Batch Pages are frequently referred to simply as "pages".

File Store: File Store objects define a storage location within Grooper where file content associated with nodes is saved. They are crucial for managing the content that forms the basis of Grooper's processing tasks, allowing for the storage and retrieval of documents, images, and other "files". Not every object in Grooper will have files connected to it, but if it does, those files are stored in the location defined by this object.

Grooper Repository: A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Node Tree: The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Root: The Root node object represents the topmost element of the Grooper Repository. It serves as the starting point from which all other objects branch out. It is the anchor point for all other structures within the repository and a necessary element for the organization and linkage of all other objects within Grooper.