Initialize Card

From Grooper Wiki
Revision as of 12:32, 26 December 2019 by Configadmin (talk | contribs)
Property panel for Initialize Card.

Initialize Card is a batch activity which prepares microfiche scans for further processing.

This is the first step in Grooper’s microfiche processing. Documents on a microfiche card are scaled down images in a matrix of rows and columns on a flat sheet of film. Fiche cards are scanned as a series of smaller tiles, each tile forming part of the whole sheet of film. The Initialize Card activity takes imported tile images and gets them ready for further steps.

About

First, the Initialize Card activity sorts the tiles and organizes them into subfolders, one per horizontal strip. So, if a fiche card has 7 strips with 22 tile images in each strip, you will end up with 7 folders, each containing 22 images. Second, it stitches these individual tiles into a low-resolution preview of the entire fiche card. This preview image is placed on the microfiche card's folder, giving users a visual reference for the whole card.


Fiche graphic.png


You may notice a tile doesn't always contain a whole document, or may contain a whole document and portions of another. That's OK. This is only the first part of getting documents off a microfiche card. After the card is initialized, the scans will need to run through the Detect Frames and Clip Frames activities. Once Initialize Card, Detect Frames, and Clip Frames have all run, the tiles have been stitched together, the individual pages have been detected on the card, and each page has been clipped from the card and saved as its own page.


First the activity organizes strips into folders with image tiles inside. Second, a low resolution preview file of the microfiche card is created.
Fiche Tree.png
Fische Preview.png


Version Differences

Initialize Card is a brand new activity in Grooper 2.80. In previous versions, documents from microfiche scans were pre-processed using software local to the scanner, which significantly slowed the rate at which cards could be scanned. Grooper's microfiche capabilities allow the scanner to run at full speed while Grooper pre-processes the scans independently. Furthermore, while microfiche scanners do have some image cleanup capabilities, they are nowhere near as robust as Grooper's Image Processing activities. The result is end-to-end microfiche document processing with a faster workflow and cleaner images, resulting in more accurate OCR data.

Use Cases

Initialize Card is specifically designed for document processing from microfiche scans. Any set of documents stored on microfiche can take advantage of this activity. Grooper does not currently have support for other microforms (such as microfilm). However, now that the groundwork for microform processing is laid, this functionality will be easier to add in the future. If you'd like to see more microform processing added into Grooper, please let us know!

Microfiche is often used for archiving documents. Court records are one example of documents that may be stored on microfiche. Government agencies often have archiving regulations that stipulate documents must be preserved on microform. Microfiche is usually the preferred microform because indexing documents on it is easier than on other forms such as microfilm. (Microfilm is usually preferred for documents like books and newspapers, as you can simulate turning a page by scrolling through the film.)

How To: Configure the Activity

Before you begin
The microfiche card must first be scanned directly into Grooper or otherwise imported.

Set the Ordering Pattern property

This is a regular expression pattern that reads the tile image filenames as the scanner named them. The captured values are used to sort the tiles by row and column number.

The default ordering pattern reads filenames from Mekel brand scanners. Their scans come in as tiles of the microfiche with filenames ending like "A01-01.jpg". This would correspond to the tile in the first row and the first column. The regular expression looks for any uppercase letter ("[A-Z]"), followed by two digits ("\d\d"), followed by a hyphen ("-"), followed by one or more digits ("\d+"), followed by the .jpg file extension at the end of the filename ("\.jpg$"). The <Row> group captures the row, which in this case is labeled with sequential letters from A to Z (captured by the regular expression "[A-Z]"). The <Column> group captures the column number, which in this case is a sequential number (captured by the regular expression "\d+").
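To make the pattern concrete, here is a minimal sketch of the same expression applied to a few filenames in Python. It is only an illustration, not Grooper's implementation. Python writes named groups as (?P<Row>...) rather than the (?<Row>...) form shown in this article, and the helper name parse_tile_name and the "card_" filename prefix are made up for the example.

```python
import re

# Python spells named groups (?P<name>...); the default pattern in this article
# uses the (?<name>...) form. The rest of the pattern is unchanged.
ORDERING_PATTERN = re.compile(r"(?P<Row>[A-Z])\d\d-(?P<Column>\d+)\.jpg$")


def parse_tile_name(filename):
    """Return (row_letter, column_number), or None if the name does not match."""
    match = ORDERING_PATTERN.search(filename)
    if match is None:
        return None
    return match.group("Row"), int(match.group("Column"))


print(parse_tile_name("card_A01-01.jpg"))  # ('A', 1)
print(parse_tile_name("card_C01-17.jpg"))  # ('C', 17)
print(parse_tile_name("notes.txt"))        # None
```

Because the pattern is anchored only at the end of the filename, whatever prefix the scanner puts in front of the row letter does not affect the match.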


Define the number of strips and tiles

Set the number of horizontal strips of image tiles forming the fiche card using the Strip Count property. Set how many tiles make up each strip using the Tile Count property. These settings must match the number of strips and tiles the microfiche scanner produced when the card was scanned.
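If you are unsure whether a scanned card matches these two settings, a quick check outside Grooper can help. The sketch below is a hypothetical helper, not something Grooper provides: it takes (row, column) pairs that have already been read from the tile filenames and reports any strip whose tile count differs from the expected values. The defaults of 7 and 22 are the property defaults listed in this article.

```python
from collections import Counter


def check_card_layout(tiles, strip_count=7, tile_count=22):
    """Compare parsed tiles to the expected layout and return a list of problems.

    tiles: iterable of (row, column) pairs, one per tile image.
    An empty list means the card matches the expected strip and tile counts.
    """
    problems = []
    per_row = Counter(row for row, _ in tiles)
    if len(per_row) != strip_count:
        problems.append(f"expected {strip_count} strips, found {len(per_row)}")
    for row in sorted(per_row):
        if per_row[row] != tile_count:
            problems.append(
                f"strip {row}: expected {tile_count} tiles, found {per_row[row]}"
            )
    return problems


# A complete 7 x 22 card, then the same card with one tile missing from strip C.
full_card = [(row, col) for row in "ABCDEFG" for col in range(1, 23)]
print(check_card_layout(full_card))
print(check_card_layout([t for t in full_card if t != ("C", 5)]))
```

The first call prints an empty list and the second reports that strip C has 21 tiles instead of 22.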




Set the Vertical Overscan property

Here you can control the amount of “vertical overscan” included in each strip. This value will come from the vertical overscan setting on the scanner itself. During scanning, the scanner adds a set length above and below each scanned strip of microfiche, so that each strip slightly overlaps the ones around it. This ensures there are no small gaps between images when they are put back together. However, Grooper must know that value in order to stitch the images back together properly.
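As a rough illustration of why the value matters, the sketch below converts an overscan length in inches to pixels and computes how tall a strip is once the overscan band is removed from both its top and bottom edges. This article does not describe how Grooper uses the value internally, so trimming both edges is an assumption, and the 300 DPI resolution and 1200-pixel strip height are made-up example numbers. The 0.1in overscan is the default listed in this article.

```python
def overscan_pixels(overscan_inches, dpi):
    """Convert an overscan length in inches to whole pixels at the given resolution."""
    return round(overscan_inches * dpi)


def trimmed_strip_height(strip_height_px, overscan_inches, dpi):
    """Height of a strip after removing the overscan band from its top and bottom."""
    return strip_height_px - 2 * overscan_pixels(overscan_inches, dpi)


# Assumed example values: 0.1in overscan, 300 DPI scan, 1200 px tall strip image.
print(overscan_pixels(0.1, 300))             # 30
print(trimmed_strip_height(1200, 0.1, 300))  # 1140
```

If the configured overscan were larger or smaller than what the scanner actually applied, the trimmed strips would either lose image content or still overlap when placed edge to edge.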




Run the activity

At this point, Grooper has what it needs to organize tile images into strip folders within a batch. You may run this activity as part of a Batch Process, or manually, using an Apply Activity command on a batch. The top strip will be the first folder, with all its component tile images inside. The second strip will be the second folder, the third strip will be the third folder, and so on.
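The resulting folder order can be pictured with a short sketch. The code below is not Grooper code: it groups Mekel-style filenames by row letter and sorts each group by column number, returning one list per strip. It assumes row A is the top strip, and the name plan_strip_folders is hypothetical.

```python
import re
from collections import defaultdict

# Same default pattern as in this article, written with Python's (?P<name>...) syntax.
TILE_NAME = re.compile(r"(?P<Row>[A-Z])\d\d-(?P<Column>\d+)\.jpg$")


def plan_strip_folders(filenames):
    """Group tile filenames by row letter and return a list of strips, row A first.

    Each strip is a list of filenames sorted by column number. Names that do
    not match the pattern are skipped.
    """
    strips = defaultdict(list)
    for name in filenames:
        match = TILE_NAME.search(name)
        if match:
            strips[match.group("Row")].append((int(match.group("Column")), name))
    return [[name for _, name in sorted(strips[row])] for row in sorted(strips)]


names = ["B01-02.jpg", "A01-10.jpg", "A01-2.jpg", "B01-01.jpg", "readme.txt"]
for index, strip in enumerate(plan_strip_folders(names), start=1):
    print(f"Folder {index}: {strip}")
```

Sorting on the column number as an integer, rather than on the filename text, keeps tile 2 ahead of tile 10 even when the scanner does not zero-pad the column.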

Property Details

Property | Default Value | Information
General Properties
Ordering Pattern | (?<Row>[A-Z])\d\d-(?<Column>\d+)\.jpg$ | Here, you write a regular expression pattern to determine row and column numbers from image filenames. The filenames themselves come from the microfiche scanner. You will need to determine the scanner's naming convention in order to figure out how it names tile images according to their row and column position. In the case of the default expression, the row group (?<Row>[A-Z]) looks for an A to Z character ([A-Z]) before two digits and a hyphen (\d\d-). The column group (?<Column>\d+) looks for one or more digits (\d+) before the .jpg file extension at the end of the filename (\.jpg$). Folders are created for each strip A to Z, and each strip's tiles are placed in the folders in numerical order.
Strip Count | 7 | This is where you set the number of strips expected for each microfiche card. Set the same strip count here as was used when the card was scanned.
Tile Count | 22 | This is the number of tiles expected for each strip, that is, however many individual tile images make up a strip.
Overscan Size | 0.1in | Here you can control the amount of “vertical overscan” included in each strip. This value will come from the vertical overscan setting on the scanner itself. During scanning, the scanner adds a set length above and below each scanned strip of microfiche, so that each strip slightly overlaps the ones around it. This ensures there are no small gaps between images when they are put back together. However, Grooper must know that value in order to stitch the images back together properly.