Scratch Removal

From Grooper Wiki

The Scratch Removal IP Command detects and removes or repairs scratches from film-based images.

This IP Command is designed to work on scans of film-based images, such as microform documents. Any time film is handled, it is in danger of being scratched. If there are documents on that film, those scratches can render text unreadable. The Scratch Removal command removes those scratches and repairs text on the page when possible. Scratches on film let more light through when scanned, so they appear as bright spots on the image. The command creates a dropout mask targeting pixels brighter than the calculated "white point" of the image and removes them.
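The thresholding idea can be sketched in a few lines of Python. This is only an illustration, not Grooper's implementation. The article does not say how the white point is calculated, so the sketch assumes it is the most common gray level in the image. The function names `estimate_white_point` and `bright_pixel_mask` are hypothetical.

```python
from collections import Counter


def estimate_white_point(image):
    """Assumed heuristic: treat the most common gray level as the white point."""
    counts = Counter(value for row in image for value in row)
    return counts.most_common(1)[0][0]


def bright_pixel_mask(image, white_point):
    """Mark (True) every pixel strictly brighter than the white point."""
    return [[value > white_point for value in row] for row in image]


# Grayscale sample: 200 = page background, 30 = text, 255 = scratch.
image = [
    [200, 200, 255, 200, 200],
    [200,  30, 255,  30, 200],
    [200,  30, 200, 255, 200],
    [200, 200, 200, 255, 200],
]
white_point = estimate_white_point(image)
mask = bright_pixel_mask(image, white_point)
```

In this sample only the four scratch pixels end up in the mask. The text and background pixels are left alone because they are not brighter than the estimated white point.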


Figure (Scratch removal.png): the original scanned image, the dropout mask of all the scratches detected, and the final, cleaned-up output.

Version Differences

Scratch Removal is a new command in version 2.80. Prior to this version, repairing scratches would have been difficult. Other image cleanup operations, such as Blob Removal and Speck Removal, target black pixels rather than white ones. These commands work well for scanned paper images but would not work for scanned film images. Scratches could possibly be removed through a combination of IP Commands, including inverting the image to turn the white pixels black and creating a dropout mask that way. However, that approach would be more complicated, more computationally expensive, and ultimately less successful than Scratch Removal.

Use Cases

Scratch Removal was developed specifically for Grooper's microfiche processing capabilities added in version 2.80. However, this command could be used to repair scans from any film-based media.

Documents on a microfiche card.

How To: Add the command to an IP Profile

Before you begin

This guide assumes you've created an IP Profile and have a Test Batch ready to configure the Scratch Removal command.

Add Scratch Removal to your IP Profile

1. Navigate to your IP Profile in the "IP Profiles" folder in the "Global Resources" folder in the Node Tree.

2. Press the "Add" button to add a new IP Command to your IP Profile.


3. Select the "Feature Removal" heading. Then, select "Scratch Removal".


Using the IP diagnostics panel

Before we begin, let's look at some of the diagnostics available to us. This will be our testing image. So far, only the default properties of the command are being used.


The IP diagnostics panel to the left of a selected page or document shows all the steps in an IP Profile as folders with diagnostic images inside. We only have one step, Scratch Removal. There are two extremely useful diagnostic images we will look at: "Scratch Mask" and "Dropout Mask".


"Scratch Mask" shows every scratch Grooper detected. Scratches are shown as black pixels.


"Dropout Mask" shows which of those detected scratches are going to be removed. You can see some of the scratches detected are not included in our dropout mask. We can fix that once we configure the command's properties.


Select "Output Image" to see the result. You can see here only the scratches included in the "Dropout Mask" were removed from the image.


Configuring scratch detection

The first property we can configure is "Sensitivity". This controls how aggressively scratches will be detected. The default is 20%. The higher the value, the more scratches will be detected. However, the goal is to produce a mask containing only scratches. As seen in the "Scratch Mask", there's a lot of pixel noise around some of the blobs on the page, and even around the text. Lowering the sensitivity all the way down to 1% removes most of that pixel noise while still keeping the scratches in the mask. You will need to experiment with this property to find the right balance for your document set. Other properties can also help target scratches over text if you need to keep the sensitivity higher.

Sensitivity set to 20%.
Sensitivity set to 1%.

Next, "Maximum Thickness" determines the largest thickness of what should be considered a scratch. The default thickness is 3pt. As you can see below, the "Scratch Mask" shows scratches not present in the "Dropout Mask".

Scratch Mask
Dropout mask

By increasing this property to "5pt", we will capture the thicker scratches on the page, as seen in the dropout mask below. Again, the maximum thickness you set here will depend on what you wish to include or exclude as a scratch for your document set.
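A thickness limit of this kind can be sketched as follows. The sketch rests on two assumptions that the article does not confirm: that "pt" means a typographic point (1/72 inch), and that a blob's thickness can be approximated by the shorter side of its bounding box. The names `points_to_pixels`, `blob_thickness`, `keep_thin_blobs`, and `rectangle` are hypothetical.

```python
def points_to_pixels(points, dpi):
    """Convert typographic points (1/72 inch) to pixels at a given resolution."""
    return points * dpi / 72.0


def blob_thickness(blob):
    """Assumed measure: the shorter side of the blob's bounding box, in pixels."""
    rows = [r for r, _ in blob]
    cols = [c for _, c in blob]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return min(height, width)


def keep_thin_blobs(blobs, max_thickness_pt, dpi):
    """Keep only blobs thin enough to be treated as scratches."""
    limit = points_to_pixels(max_thickness_pt, dpi)
    return [blob for blob in blobs if blob_thickness(blob) <= limit]


def rectangle(height, width):
    """Helper for the demo: pixel coordinates of a solid rectangle."""
    return [(r, c) for r in range(height) for c in range(width)]


# At 72 dpi one point equals one pixel, which keeps the demo easy to follow.
blobs = [rectangle(10, 2), rectangle(10, 4), rectangle(8, 8)]
kept_at_3pt = keep_thin_blobs(blobs, max_thickness_pt=3, dpi=72)
kept_at_5pt = keep_thin_blobs(blobs, max_thickness_pt=5, dpi=72)
```

At 3pt only the 2-pixel-wide blob qualifies. Raising the limit to 5pt also admits the 4-pixel-wide one, while the 8 by 8 block is still excluded. A bounding box overestimates the thickness of diagonal scratches, so a real implementation would need a better measure.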


You can also set the minimum pixel count of a blob by setting the "Minimum Weight" property. Our example has a lot of "noise" around the text. Noise is random variation of brightness in an image. The result here is an almost imperceptible halo of brighter pixels around the text. However, it's obvious when you look at it in the diagnostic panel. Luckily for us, these "noisy" pixels are only one or two pixels wide. The images below show the difference between a minimum weight of one pixel, two pixels, and three pixels.

Minimum Weight = 1px
Minimum Weight = 2px
Minimum Weight = 3px
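The effect of a minimum weight can be sketched by grouping mask pixels into blobs and discarding the blobs with too few pixels. This is an illustration under assumptions, not Grooper's code: it treats diagonally touching pixels as part of the same blob (8-connectivity), and the names `find_blobs` and `drop_light_blobs` are hypothetical.

```python
def find_blobs(mask):
    """Group True pixels into 8-connected blobs, each a list of (row, col)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:  # iterative flood fill
                y, x = stack.pop()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            blobs.append(blob)
    return blobs


def drop_light_blobs(mask, minimum_weight):
    """Keep only blobs that contain at least `minimum_weight` pixels."""
    kept = [[False] * len(row) for row in mask]
    for blob in find_blobs(mask):
        if len(blob) >= minimum_weight:
            for y, x in blob:
                kept[y][x] = True
    return kept


rows = [
    ".#..#..",  # column 1 is a 4-pixel scratch; (0, 4) is 1-pixel noise
    ".#.....",
    ".#.....",
    ".#..##.",  # (3, 4) and (3, 5) form a 2-pixel noise blob
]
mask = [[ch == "#" for ch in row] for row in rows]
remaining = {
    weight: sum(sum(row) for row in drop_light_blobs(mask, weight))
    for weight in (1, 2, 3)
}
```

With a minimum weight of 1 all seven pixels stay in the mask. A weight of 2 drops the single-pixel speck, and a weight of 3 leaves only the four-pixel scratch.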

Configuring scratch removal

The "Dropout Method" property controls how the dropout mask is removed from the image. There are two different methods for resolving scratches: "Fill" and "Inpaint". "Fill" simply removes the scratches by coloring them in to match the background color. "Inpaint" attempts to repair the image by using known pixel information around the dropped-out scratches to fill them in. The Inpaint method is designed to match removed pixels to a colored or complex background. Student transcripts are a great example. They are often printed on paper with some kind of patterned background.
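The "Fill" behavior described above amounts to overwriting every masked pixel with a single color. A minimal sketch, assuming a grayscale image stored as nested lists and a caller-supplied background value (the function name `fill_dropout` is hypothetical):

```python
def fill_dropout(image, mask, background):
    """Replace every masked pixel with one flat background value."""
    return [
        [background if masked else value
         for value, masked in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# 200 = background, 30 = text stroke, 255 = vertical scratch.
image = [
    [200, 255, 200],
    [ 30, 255,  30],
    [200, 255, 200],
]
mask = [[value == 255 for value in row] for row in image]
filled = fill_dropout(image, mask, background=200)
```

In the middle row the scratch crossed a text stroke, and the fill leaves a background-colored gap there. That kind of gap is what "Inpaint" tries to repair from the surrounding pixels.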

Figure: dropped out using "Fill" and dropped out using "Inpaint".

Below you can see a zoomed-in portion showing the difference between using "Fill" and "Inpaint". The default settings were used here.

Filled Inpainted

The Inpaint method also has two different methods of filling pixels: "Telea" and "NavierStokes". "Telea" restores pixels by approximating the value of each removed pixel from the values of the pixels around it. More or less, if 75% of the pixels around it are white and 25% are black, the pixel would become white. The area of known pixels is called a "neighborhood". Housing demographics offer a useful comparison. Say you know the household income of every house on a block but one. 75% of them fall into an "upper class" income bracket and 25% fall into "upper middle class". While that one house's income level could be upper middle class (or even lower), most of the houses on the block are upper class, so it's safer to assume it is upper class as well.

"NavierStokes" uses equations from fluid dynamics to fill in pixels the same way a fluid would fill a void. Imagine pixel colors bleeding into the empty space the same way a liquid would fill a gap. If you had a grey liquid and a black liquid filling in a gap, they would compete to fill the space in certain ways. If there's less of the black liquid than grey around the gap, ultimately more of the gap will be filled by grey liquid. Furthermore, the black liquid will pool in the parts of the gap closest to concentrations of black liquid. Filling in pixels works much the same way. First, if there are more grey pixels around the empty space, more of that void is going to be filled by grey pixels. Second, if a black pixel is right next to the empty space, at least part of that space should be filled by black pixels. The goal is to fill in the empty space with at least some black pixels. As well as matching the background color, this can also help restore pixels lost from the text beside the scratch. However, for our example, there was little difference between the "Fill" and "Inpaint" methods.

You can also control the "Inpaint Radius". This property specifies how large an area around the dropped-out pixels Grooper "looks at" when deciding how to fill them in. You can really see the difference between "Telea" and "NavierStokes" when configuring this property. "Telea" uses a weighted sum of approximations from pixels in the neighborhood to restore pixels. Increasing the Inpaint Radius increases the size of the neighborhood around the pixel to be filled. If we increase the Inpaint Radius to "25px", that much larger radius is going to include more black pixels. You can see the result has some of those scratches filled in with more black than grey. However, since "NavierStokes" uses fluid dynamics, this darkening is less pronounced. With a larger radius, there's more "fluid" to draw from. But at some point, the void of pixels is filled and the "flow" of pixels into the void should stop.
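The neighborhood idea, and the way a larger radius changes the result, can be illustrated with a deliberately simplified sketch. It is not the Telea algorithm and not Grooper's implementation. It assumes a plain, unweighted average of the known pixels inside a square window, and the name `inpaint_by_neighborhood` is hypothetical.

```python
def inpaint_by_neighborhood(image, mask, radius):
    """Fill each masked pixel with the average of the known pixels within
    `radius` of it (square window, unweighted: a simplifying assumption)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            known = [
                image[y][x]
                for y in range(max(0, r - radius), min(height, r + radius + 1))
                for x in range(max(0, c - radius), min(width, c + radius + 1))
                if not mask[y][x]
            ]
            if known:  # leave the pixel unchanged if nothing is known nearby
                result[r][c] = round(sum(known) / len(known))
    return result


# Case 1: six white (255) and two black (0) neighbors around one removed pixel.
block = [[255, 255, 255], [0, 255, 255], [0, 255, 255]]
block_mask = [[False, False, False], [False, True, False], [False, False, False]]
repaired = inpaint_by_neighborhood(block, block_mask, radius=1)

# Case 2: a gap in grey background (200) with black text (0) a few pixels away.
strip = [[0, 0, 200, 200, 255, 200, 200, 0, 0]]
strip_mask = [[i == 4 for i in range(9)]]
small_radius = inpaint_by_neighborhood(strip, strip_mask, radius=1)[0][4]
large_radius = inpaint_by_neighborhood(strip, strip_mask, radius=4)[0][4]
```

In the first case the removed pixel becomes a light grey that leans strongly toward white, matching the 75%/25% intuition. In the second case the small radius sees only grey and fills with grey, while the large radius reaches the black text and produces a noticeably darker fill.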

Telea with a 25px Inpaint Radius NavierStokes with a 25px Inpaint Radius

Last, both "Fill" and "Inpaint" methods have a "Mask Dilation Factor" property. This dilates or erodes the mask applied to the image. This is very easily shown using the "Fill" method seen below. A positive dilation will add pixels to the filled dropout mask. A negative dilation will remove them.
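Dilating and eroding a mask can be sketched with a 3x3 neighborhood applied repeatedly. This sketch assumes the factor counts one-pixel growth or shrink steps. The article does not state the unit Grooper uses, and the names `_step` and `adjust_mask` are hypothetical.

```python
def _step(mask, grow):
    """One pass with a 3x3 window: dilate if `grow` is True, otherwise erode."""
    height, width = len(mask), len(mask[0])
    out = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                mask[y][x]
                for y in range(max(0, r - 1), min(height, r + 2))
                for x in range(max(0, c - 1), min(width, c + 2))
            ]
            # Dilation: any neighbor set. Erosion: every neighbor set.
            out[r][c] = any(window) if grow else all(window)
    return out


def adjust_mask(mask, factor):
    """Positive factor grows the mask, negative shrinks it, zero leaves it."""
    for _ in range(abs(factor)):
        mask = _step(mask, grow=factor > 0)
    return mask


def count(mask):
    return sum(sum(row) for row in mask)


# A 3x3 block of mask pixels centered in a 7x7 image.
mask = [[2 <= r <= 4 and 2 <= c <= 4 for c in range(7)] for r in range(7)]
grown = adjust_mask(mask, 1)
shrunk = adjust_mask(mask, -1)
```

Here a factor of 1 grows the 9-pixel block to 25 pixels, and a factor of -1 shrinks it to its single center pixel. A positive value is useful when a faint edge of a scratch survives the dropout, and a negative value when the mask eats into nearby text.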

No dilation Dilation of "5" Dilation of "-2"

Property Details

Property | Default Value | Information
Sensitivity | 20% | Controls how “aggressively” scratches are detected. The larger the percentage, the more scratches are detected. The goal is to drop out scratches without dropping out text.
Maximum Thickness | 3pt | The maximum thickness of a blob (a collection of contiguous pixels) to be considered a scratch. The thickness should be large enough to drop out the scratches but not so large that text is lost.
Minimum Weight | 1px | The minimum number of pixels a blob can have to be considered a scratch. This setting helps you control how small a blob can be and still be dropped out, so that small features such as pixel noise around text are left alone.
Dropout Method | Fill | Can be either “Fill” or “Inpaint”. “Fill” simply replaces the dropped-out pixels with a given color (it defaults to the background color of the image). “Inpaint” digitally restores the image by estimating the value of unknown pixels based on the pixels around them. This can help match a more complex background, one that might be variable or have some sort of pattern (like you often see on student transcripts). See "Configuring scratch removal" in the How To section above for more information.