Microfiche Processing (Concept)

From Grooper Wiki

Grooper provides a robust set of tools and activities specifically designed for processing microfiche cards. These capabilities enable users to efficiently convert microfiche images into high-quality, usable digital documents. The workflow is built around specialized activities and image processing (IP) commands that address the unique challenges of microfiche, such as tiled image assembly, frame detection, cropping, and artifact removal.

  • Grooper's microfiche processing components were developed for and tested using Mekel brand scanners. Please contact support (support@bisok.com) to verify compatibility with other microfiche scanners.

What is microfiche?

Example microfiche card.

Microfiche is a flat sheet of film containing microreproductions of documents, typically used for archiving and compact storage of large volumes of paper records. Each sheet can hold many pages of information in a reduced format, which can be viewed or digitized using specialized equipment. Microfiche was widely used in libraries, government, and business for document preservation and retrieval before the advent of digital storage solutions.

Overview of microfiche processing components

The following Activities and IP Commands are relevant to Grooper's microfiche processing capabilities.

Activities

The following activities are specifically designed to process microfiche cards. These activities prepare images on a fiche card for later steps in a Batch Process. Their end goal is to clip each frame out of the fiche card. Each frame then becomes a single page in a Batch.

  • Initialize Card – Organizes and assembles the raw image tiles from a scanned microfiche card.
  • Detect Frames – Detects the locations of individual document frames within each strip of the fiche card.
  • Clip Frames – Crops out each detected frame as a separate page for downstream processing.

IP Commands

While these IP Commands have applications outside of microfiche processing, they are often included in an IP Profile that cleans up microfiche scans.

  • Extract Page (IP Command) – Extracts and de-warps individual pages from carrier images, correcting for skew or perspective.
  • Scratch Removal (IP Command) – Removes scratches and streaks from film-based images, improving image quality for OCR and data extraction.

Microfiche Activities

Initialize Card

The Initialize Card activity is the first step in microfiche processing. It performs two main functions:

  • Sorting and organizing tiles: It uses a configurable regular expression to parse row and column numbers from tile filenames, then sorts and organizes the raw image tiles into subfolders by strip and tile position.
  • Preview assembly: It assembles a low-resolution preview of the fiche card surface from the contents of the "previews" subfolder, providing a visual reference for subsequent processing and review.

The "Ordering Pattern" property is used to extract row and column information from filenames. For example:

(?<Row>[A-Z])\d\d-(?<Column>\d+)\.jpg$

This pattern ensures that each tile is placed in the correct position on the card.
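
The following is a minimal sketch of how a pattern like this could drive tile sorting. It is not Grooper's implementation. The example pattern writes named groups as `(?<Name>...)`; Python's `re` module spells them `(?P<Name>...)`. The function name `organize_tiles` and the tile filenames are hypothetical, chosen only to match the example pattern.

```python
import re
from collections import defaultdict

# Python spelling of the example Ordering Pattern: (?<Name>...) becomes (?P<Name>...).
ORDERING_PATTERN = re.compile(r"(?P<Row>[A-Z])\d\d-(?P<Column>\d+)\.jpg$")


def organize_tiles(filenames):
    """Group tile filenames by row letter and sort each row by column number."""
    strips = defaultdict(list)
    for name in filenames:
        match = ORDERING_PATTERN.search(name)
        if match is None:
            continue  # Skip files that do not match the tile naming scheme.
        # Convert the column to an integer so that 2 sorts before 10.
        strips[match.group("Row")].append((int(match.group("Column")), name))
    return {
        row: [name for _, name in sorted(tiles)]
        for row, tiles in sorted(strips.items())
    }


# Hypothetical filenames, for illustration only.
tiles = ["B01-2.jpg", "A01-10.jpg", "A01-2.jpg", "B01-1.jpg", "notes.txt"]
for row, names in organize_tiles(tiles).items():
    print(row, names)
```

Parsing the column as a number rather than comparing it as text keeps tiles in physical order even when the filenames are not zero-padded.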

Detect Frames

The Detect Frames activity analyzes each strip of the fiche card to locate the boundaries of individual document frames. Its main features include:

  • Low-resolution preview: Generates a preview image of the strip for fast processing and review.
  • Frame detection: Uses binarization and gutter detection to identify the grid of frames on the strip, even when frames are missing or partially obscured.
  • Flagging for review: Strips with detection issues (such as missing or low-intensity frames) are automatically flagged for human review.
  • Configurable detection: Properties such as "Processing Resolution," "Binarization," "Minimum Vertical Length," and "Page Size Range" allow fine-tuning for different fiche layouts and image qualities.

The detected frame information is saved for use in downstream activities.
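
To illustrate the general idea of binarization followed by gutter detection, here is a simplified sketch. It is not Grooper's algorithm. It assumes a grayscale strip stored as a list of pixel rows, assumes frames are brighter than the gutters between them, and searches for gutters in one direction only. The names `binarize`, `detect_frames`, and `min_gutter_width` are hypothetical and do not correspond to Grooper properties.

```python
def binarize(strip, threshold):
    """Mark pixels brighter than the threshold as content (1), the rest as background (0)."""
    return [[1 if px > threshold else 0 for px in row] for row in strip]


def detect_frames(strip, threshold=128, min_gutter_width=2):
    """Return (start, end) column ranges of frames; end is exclusive.

    A gutter is a run of at least min_gutter_width columns with no content pixels.
    """
    binary = binarize(strip, threshold)
    width = len(binary[0])
    # Column profile: how many content pixels each column contains.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    frames, start, gap = [], None, 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x  # First content column of a new frame.
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gutter_width:
                # The gutter is wide enough: close the frame where the gap began.
                frames.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        frames.append((start, width - gap))
    return frames


# Tiny synthetic strip: two bright frames separated by a dark gutter.
strip = [
    [0, 0, 200, 210, 0, 0, 0, 220, 230, 0],
    [0, 0, 190, 205, 0, 0, 0, 215, 225, 0],
]
print(detect_frames(strip))
```

Requiring a minimum gutter width keeps a thin dark line inside a frame from splitting it in two, which is the kind of trade-off the activity's tuning properties exist to control.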

Clip Frames

The Clip Frames activity crops out each detected frame from the fiche strip and saves it as a separate page. Key features include:

  • Frame cropping: Uses the frame locations detected by the previous activity to extract each frame as an individual image.
  • Tile rotation and padding: Supports rotation correction and configurable padding around each frame to ensure clean crops.
  • Image compression: Allows specification of compression settings for the output images.
  • Error flagging: Flags the Batch Folder if any frames are missing after extraction.

This activity produces a set of clean, individual page images ready for further processing, such as OCR or data extraction.
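
The cropping-with-padding step can be sketched as follows. This is an illustration under stated assumptions, not Grooper's code: the image is a list of pixel rows, a frame is a `(left, top, right, bottom)` box with exclusive right and bottom edges, and the names `padded_box` and `clip_frame` are hypothetical. Rotation and compression are left out.

```python
def padded_box(frame, padding, image_width, image_height):
    """Expand a (left, top, right, bottom) frame by padding, clamped to the image bounds."""
    left, top, right, bottom = frame
    return (
        max(left - padding, 0),
        max(top - padding, 0),
        min(right + padding, image_width),
        min(bottom + padding, image_height),
    )


def clip_frame(image, frame, padding=0):
    """Crop a frame, plus optional padding, out of a row-major pixel grid."""
    left, top, right, bottom = padded_box(frame, padding, len(image[0]), len(image))
    return [row[left:right] for row in image[top:bottom]]


# Synthetic 4-row by 6-column image whose pixel values encode their position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
for row in clip_frame(image, (2, 1, 4, 3), padding=1):
    print(row)
```

Clamping the padded box to the image bounds means frames near the edge of a strip still crop cleanly instead of failing or wrapping around.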

Microfiche-related IP Commands

Extract Page IP Command

The Extract Page IP Command is designed to extract a single page from a larger carrier image, such as a flatbed scan or camera capture. It works by:

  • Edge detection: Locates the four edges of the page, even if the page is skewed or subject to perspective distortion.
  • De-warping: Applies a warp operation to produce a rectangular, de-skewed image of the page.
  • Binarization and diagnostics: Offers configurable binarization and diagnostic outputs to fine-tune extraction for different backgrounds and page conditions.

This command is especially useful for microfiche frames that contain a single page surrounded by a visible border.
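
As a rough illustration of the geometry behind de-warping, the sketch below takes four page corners that have already been detected and derives the size of the output rectangle and the skew of the top edge. A perspective warp would then map the four corners onto that rectangle. The function name `page_geometry` and the corner ordering are assumptions made for this example, not part of Grooper.

```python
import math


def page_geometry(corners):
    """Return (width, height, skew_degrees) for the de-warped output rectangle.

    corners is ordered top-left, top-right, bottom-right, bottom-left,
    in image coordinates (x to the right, y downward).
    """
    tl, tr, br, bl = corners
    # Average opposite edges so mild perspective distortion is split evenly.
    width = round((math.dist(tl, tr) + math.dist(bl, br)) / 2)
    height = round((math.dist(tl, bl) + math.dist(tr, br)) / 2)
    # Skew: angle of the top edge relative to the horizontal axis.
    skew = math.degrees(math.atan2(tr[1] - tl[1], tr[0] - tl[0]))
    return width, height, skew


# An axis-aligned page and the same kind of page rotated by 45 degrees.
print(page_geometry([(0, 0), (100, 0), (100, 50), (0, 50)]))
print(page_geometry([(0, 0), (4, 4), (2, 6), (-2, 2)]))
```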

Scratch Removal IP Command

The Scratch Removal IP Command is specialized for film-based media, such as microfiche, microfilm, and slides. It automatically detects and removes scratches, which appear as high-intensity white lines or streaks in scanned images. Features include:

  • Automatic detection: Analyzes the image histogram and applies a sensitivity threshold to identify likely scratches.
  • Mask refinement: Filters out non-scratch features based on maximum thickness and minimum weight criteria.
  • Configurable dropout: Applies a dropout method to remove or suppress detected scratches while preserving content.
  • Diagnostics: Generates a "Scratch Mask" image and logs for review and parameter tuning.

Removing scratches improves the quality of the final images and enhances the accuracy of OCR and data extraction.
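
The detect, refine, and drop-out sequence can be sketched in simplified form. This is not Grooper's algorithm, and the following are assumptions: the brightness threshold is passed in directly rather than derived from the histogram, only thin vertical scratches are handled, the minimum weight filter is omitted, and the dropout fills each masked pixel with the mean of its nearest unmasked neighbors in the same row. The names `scratch_mask`, `drop_out`, and `max_thickness` are hypothetical.

```python
def scratch_mask(image, threshold, max_thickness=2):
    """Mark horizontal runs of bright pixels that are no wider than max_thickness."""
    mask = [[False] * len(row) for row in image]
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x] < threshold:
                x += 1
                continue
            end = x
            while end < len(row) and row[end] >= threshold:
                end += 1
            # Wide bright runs are treated as content, not scratches.
            if end - x <= max_thickness:
                for i in range(x, end):
                    mask[y][i] = True
            x = end
    return mask


def drop_out(image, mask):
    """Replace masked pixels with the mean of the nearest unmasked pixels in the row."""
    result = [row[:] for row in image]
    for y, row in enumerate(image):
        for x in range(len(row)):
            if not mask[y][x]:
                continue
            left = next((row[i] for i in range(x - 1, -1, -1) if not mask[y][i]), None)
            right = next((row[i] for i in range(x + 1, len(row)) if not mask[y][i]), None)
            neighbours = [v for v in (left, right) if v is not None]
            if neighbours:
                result[y][x] = sum(neighbours) // len(neighbours)
    return result


# Synthetic image: a 1-pixel scratch at column 2 and a 3-pixel bright feature at columns 5-7.
image = [
    [40, 42, 255, 44, 46, 250, 251, 252, 48],
    [41, 43, 254, 45, 47, 250, 251, 252, 49],
]
mask = scratch_mask(image, threshold=250, max_thickness=2)
for row in drop_out(image, mask):
    print(row)
```

In this example the thin streak is filled in from its neighbors, while the wider bright feature exceeds the thickness limit and is left untouched. That is the purpose of the mask refinement step: preserve content that merely happens to be bright.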

Microfiche and the Review step: the Fiche Strip Viewer

The Fiche Strip Viewer is a specialized user interface component in Grooper designed for reviewing and validating frame detection results on scanned strips of microfiche cards. It is intended to be used after running the Detect Frames activity, providing operators with a visual and interactive way to confirm or correct the locations of detected frames before further processing.

The Fiche Strip Viewer is implemented as a type of "Task View", which can be included as a tab in the Review activity. Its main functions are:

  • Visual review of frame detection: Displays the scanned strip image along with overlays or highlights indicating the detected frames.
  • Navigation and validation: Allows users to quickly navigate between frames, especially focusing on those that may be invalid or require attention.
  • Correction and confirmation: Enables manual adjustment or confirmation of frame boundaries, ensuring that only valid frames proceed to the next processing steps.

Example Microfiche Batch Process workflow

Below is an example of how these activities and IP commands can be combined in a Batch Process to fully automate microfiche card processing:

  • This is just an example. Your process may include more or fewer steps.
  1. Microfiche images are ingested via a scanner or other import mechanism.
  2. Initialize Card – Organizes raw tiles and creates a preview image.
  3. Detect Frames – Locates the frames on each strip and flags any detection issues.
  4. Review – Optionally use the Fiche Strip Viewer to review the Initialize Card and Detect Frames results before Clip Frames executes.
  5. Clip Frames – Crops out each frame as a separate page.
  6. Image Processing – Cleans up each page image by removing scratches and extracting the page region. Extract Page and Scratch Removal would be included in the IP Profile executed by this step.
  7. Recognize – Performs OCR on the cleaned page images.
  8. Separate – Separates pages into documents.
  9. Classify – Assigns Document Types to each document.
  10. Extract – Extracts fields (i.e. the document's Data Model) from the recognized text.
  11. Export – Outputs the processed documents and data to the desired destination.

This sequence ensures that each microfiche card is converted into a set of high-quality, searchable digital documents, with minimal manual intervention.