2.80:Microfiche Processing (Concept)

Revision as of 09:53, 27 August 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.

In version 2.80, Grooper added capabilities to process scans from microfiche scanners. Grooper’s microfiche processes have several advantages over typical microfiche scanning. First, new Batch Processing Activities detect document frames and digitally cut the documents from the fiche card. This allows the microfiche scanner to run at the fastest possible setting while Grooper does the work of getting the documents off the card. Second, 2.80 adds some microfiche-specific image processing capabilities on top of Grooper’s impressive image cleanup operations. The result is faster end-to-end microfiche processing with far superior image quality compared to anything else on the market.

What is microfiche?

Microfiche is a flat piece of film, called a card, containing scaled-down reproductions of documents. These documents can be viewed through a microfiche reader, which magnifies them to readable proportions. The purpose of microfiche is to store a large number of documents in a small amount of space while providing access to the documents without distributing the originals. Another medium checks all those boxes: digital. Furthermore, microfiche is intended to be a permanent archive. However, film degrades over time, and every time it’s handled, the film is in danger of being scratched or otherwise damaged. Also, the number of people who can access documents on microfiche is limited to the number of copies of that card on hand. Digitizing microfiche cards resolves both of these limitations.

Grooper's microfiche processing ability was originally developed for Mekel-brand microfiche scanners. Because of the nature of microfiche, it is likely that these capabilities are generalizable to other scanners, and possibly other film-based media.

The activities in this process make a distinction between:

  • frames, which are individual document images on fiche cards, and
  • tiles, which are sections of the microfiche card generated by microfiche scanners.

The main thing to remember is that tiles are wider than frames, and tile edges do not line up with frame edges. As such, tiles do not necessarily contain full frames. In other words, a tile might not (and probably won't) have a full document image on it. Grooper's microfiche processing stitches those tiles back together and extracts the frames, generating individual document images from the full card.
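
As a rough illustration of the stitching idea (not Grooper's actual implementation, which this article does not describe), the sketch below treats each tile as a small grayscale grid, joins the tiles left to right, and crops one frame by pixel column. The function names, the tiny two-row images, and the frame coordinates are all hypothetical.

```python
# Hypothetical sketch: an image is a list of rows, each row a list of pixel values.

def stitch_tiles(tiles):
    """Join tiles left to right into one strip. All tiles must share a height."""
    height = len(tiles[0])
    if any(len(tile) != height for tile in tiles):
        raise ValueError("all tiles must have the same height")
    # Concatenate row r of every tile to build row r of the strip.
    return [sum((tile[r] for tile in tiles), []) for r in range(height)]


def crop_frame(strip, left, right):
    """Return columns [left, right) of the strip as a new image."""
    return [row[left:right] for row in strip]


# Two tiles, each 5 pixels wide. A 4-pixel-wide frame (value 7) straddles
# the boundary between them, so neither tile holds the full frame.
tile_a = [[0, 0, 0, 7, 7],
          [0, 0, 0, 7, 7]]
tile_b = [[7, 7, 0, 0, 0],
          [7, 7, 0, 0, 0]]

strip = stitch_tiles([tile_a, tile_b])
frame = crop_frame(strip, 3, 7)  # assumed frame position, chosen by hand here
print(frame)
```

Only after the tiles are joined does the frame exist as one contiguous region, which is why stitching has to come before clipping.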

Steps

An example of a Batch Process for processing microfiche cards.

Microfiche processing happens in six main steps:

1. Full microfiche cards are imported into Grooper, either via a microfiche scanner or other import operation.

2. Run the Initialize Card activity. Microfiche scans are organized into folders by full card. A low-resolution preview image of the full card is also generated.

  • The preview image may be OCR'd (via the Recognize activity), have data extracted from it, and be reviewed just like any other document image.

3. Run the Detect Frames activity. Individual frames surrounding documents on the card are detected.

  • A Review activity can be configured at this point for review and correction using the Fiche Strip Viewer.

4. Run the Clip Frames activity. The documents are clipped from the detected frames, generating one image per page on the fiche card.

  • The Remove Level activity is often used at this point as a "cleanup" activity for the batch structure. This activity removes one or more folder levels in the batch. For example, it can be used to remove the initial folder level created by the Initialize Card activity (removing the low-resolution preview of the full card on that folder at the same time).

5. Film-specific image processing commands, such as Extract Page and Scratch Removal, and other IP commands, such as Contrast Stretch, are applied to the clipped images to prepare them for other Grooper activities.

6. And, it’s off to the Grooper races. These images are now document images just like any other as far as Grooper is concerned. You can get OCR data off them, separate the images into document folders, classify them, and extract any data you want.
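
To make steps 3 and 4 more concrete, here is a minimal sketch of one common way to locate frames on a strip: sum each pixel column and treat each run of "bright" columns as a frame, then cut one image per run. This is an assumption for illustration only. The article does not describe how Detect Frames or Clip Frames work internally, and `detect_frames`, `clip_frames`, and the threshold value are hypothetical.

```python
# Hypothetical sketch: an image is a list of rows, each row a list of pixel values.

def detect_frames(strip, threshold):
    """Return (left, right) column ranges whose column sums exceed threshold."""
    width = len(strip[0])
    column_sums = [sum(row[c] for row in strip) for c in range(width)]
    frames = []
    start = None  # left edge of the frame currently being scanned, if any
    for c, total in enumerate(column_sums):
        if total > threshold and start is None:
            start = c                    # entering a frame
        elif total <= threshold and start is not None:
            frames.append((start, c))    # leaving a frame
            start = None
    if start is not None:
        frames.append((start, width))    # frame runs to the strip's right edge
    return frames


def clip_frames(strip, frames):
    """Cut one image out of the strip for each detected frame."""
    return [[row[left:right] for row in strip] for left, right in frames]


# A toy strip holding two frames separated by dark gaps.
strip = [
    [0, 9, 9, 0, 0, 8, 8, 8, 0],
    [0, 9, 9, 0, 0, 8, 8, 8, 0],
]
frames = detect_frames(strip, threshold=4)
pages = clip_frames(strip, frames)
print(frames)
print(len(pages))
```

Keeping detection and clipping as separate functions mirrors the process above, where a Review step can correct the detected frames before anything is clipped.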

Microfiche Activities

  • Initialize Card
  • Detect Frames
  • Clip Frames

Microfiche Related IP Commands

Although not limited to microfiche processing, these commands were developed specifically for it in version 2.80.

  • Extract Page
  • Scratch Removal

Glossary

AND: AND is a Collation Provider option for Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Clip Frames: Clip Frames is a specialized Activity for processing microfiche in Grooper. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Detect Frames: Detect Frames is a specialized Activity for processing microfiche in Grooper. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Extract Page: Extract Page is an IP Command that removes an image from a carrier image while simultaneously removing any image warping or skewing.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Initialize Card: Initialize Card is a specialized Activity for processing microfiche in Grooper. It prepares and configures microfiche card images for further processing.

IP Command: IP Commands specify an image processing (IP) operation (such as image cleanup, format conversion or feature detection) and are used to construct IP Steps in an IP Profile. IP Commands are configured using an IP Step's Command property.

Microfiche Processing: Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.

OCR: OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.

Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Scratch Removal: Scratch Removal is an IP Command that detects and removes or repairs scratches from film-based images.