Barcode Detection

From Grooper Wiki
Barcode Detection in an IP Profile

Barcode Detection detects and reads barcode data.

Barcode Detection is an IP Command used to read barcodes on documents. As an IP Command, it can be added as a step in an IP Profile used during an Image Processing activity. When run on a page object, detected barcode information is stored as part of that page's layout data (stored within its LayoutData.json file).

Version Differences

There are no major differences in how Barcode Detection is configured in versions before 2.80. There were, however, some improvements to the barcode reader itself.

Use Cases

Barcode.png

Barcodes are used in a variety of ways.  The most common usage is for product labeling, which can be used to track items in inventory or for accounting purposes.  In a healthcare setting, barcodes may be used for patient identification to access patient data, such as medical history or allergy information, or to create SOAP Notes or manage medication.  They also may be used to separate and index documents.

Grooper supports the following barcode symbologies:


Symbology Description
AustraliaPost Australia Post 4-state barcode.
Codabar Codabar is a linear barcode symbology developed in 1972 by Pitney Bowes Corp. It and its variants are also known as Codeabar, Ames Code, NW-7, Monarch, Code 2 of 7, Rationalized Codabar, ANSI/AIM BC3-1995 or USD-4.
Code11 Code 11 is a barcode symbology developed by Intermec in 1977. It is used primarily in telecommunications. The symbol can encode any length string consisting of the digits 0-9 and the dash character "-". One or two modulo-11 check digit(s) can be included.
Code128 Code 128 is a high-density barcode symbology used for alphanumeric or numeric-only barcodes. It can encode all 128 characters of ASCII and, by use of an extension character (FNC4), the Latin-1 characters defined in ISO/IEC 8859-1.
Code32 Also known as Italian Pharmacode, IMH, Codice 32 Pharmacode, Codice Farmaceutico Italiano, Radix 32 Barcode.
Code39 Code 39 is a commonly used variable length alphanumeric barcode symbology. Also known as Alpha39, Code 3 of 9, Code 3/9, Type 39, USS Code 39, or USD-3.
Code93 Code 93 is a barcode symbology designed in 1982 by Intermec to provide a higher density and data security enhancement to Code 39. It is an alphanumeric, variable length symbology. Code 93 is used primarily by Canada Post to encode supplementary delivery information. Every symbol includes two check characters.
Datamatrix A Data Matrix code is a two-dimensional matrix barcode consisting of black and white "cells" or modules arranged in either a square or rectangular pattern. The information to be encoded can be text or numeric data. Usual data size is from a few bytes up to 1556 bytes.
Ean13 An EAN-13 barcode (originally European Article Number, but now renamed International Article Number even though the abbreviation EAN has been retained) is a 13 digit (12 data and 1 check) barcoding standard which is a superset of the original 12-digit Universal Product Code (UPC) system developed in 1970 by George J. Laurer. The EAN-13 barcode is defined by the standards organization GS1.
Ean8 An EAN-8 is a barcode and is derived from the longer European Article Number (EAN-13) code. It was introduced for use on small packages where an EAN-13 barcode would be too large.
I2of5 Interleaved 2 of 5 (ITF, from Interleaved Two of Five) is a continuous two-width barcode symbology encoding digits. It is used commercially on 135 film, for ITF-14 barcodes, and on cartons of some products, while the products inside are labeled with UPC or EAN.
IntelligentMail The Intelligent Mail Barcode (IM barcode) is a 65-bar code for use on mail in the United States. The term “Intelligent Mail” refers to services offered by the United States Postal Service for domestic mail delivery. The IM barcode is intended to provide greater information and functionality than its predecessors POSTNET and PLANET. An Intelligent Mail barcode has also been referred to as a One Code Solution and a 4-State Customer Barcode, abbreviated 4CB, 4-CB or USPS4CB.
Itf14 ITF-14 (Interleaved Two of Five) is the GS1 implementation of an Interleaved 2 of 5 bar code to encode a Global Trade Item Number. ITF-14 symbols are generally used on packaging levels of a product, such as a case box of 24 cans of soup. The ITF-14 will always encode 14 digits.
MicroQr Micro QR code is a smaller version of the QR code standard for applications where symbol size is limited.
Patch Detects Patch 1, Patch 2, Patch 3, Patch 4, Patch 6, and Patch T patch code scanner sheets. The detected patch code symbology is returned as the barcode value.
Pdf417 PDF417 is a 2D (stacked linear) barcode symbology used in a variety of applications, primarily transport, identification cards, and inventory management. PDF417 is one of the formats (along with Data Matrix) that can be used to print postage accepted by the United States Postal Service. PDF417 is also selected by the airline industry's Bar Coded Boarding Pass standard (BCBP) as the 2D barcode symbology for paper boarding passes. PDF417 is the standard selected by the Department of Homeland Security as the machine readable zone technology for RealID compliant driver licenses and state issued identification cards. It is also used by FedEx on package labels.
Planet The Postal Alpha Numeric Encoding Technique (PLANET) barcode was used by the United States Postal Service to identify and track pieces of mail during delivery - the Post Office's "CONFIRM" services. It was fully superseded by Intelligent Mail Barcode by January 28, 2013.
Postnet POSTNET (Postal Numeric Encoding Technique) is a barcode symbology used by the United States Postal Service to assist in directing mail.
Plus2 2-digit supplementals associated with EAN and UPC symbology barcodes.
Plus5 5-digit supplementals associated with EAN and UPC symbology barcodes.
Qr QR code (abbreviated from Quick Response Code) is the trademark for a type of matrix barcode (or two-dimensional barcode) first designed for the automotive industry in Japan. The QR Code system became popular outside the automotive industry due to its fast readability and greater storage capacity compared to standard UPC barcodes. Applications include product tracking, item identification, time tracking, document management, and general marketing.
Rm4scc Royal Mail 4-State Customer Code is a barcode symbology used by the Royal Mail for its Cleanmail service.
Rss14 RSS-14 (Reduced Space Symbology) encodes the full 14-digit EAN.UCC item identification in a symbol that can be omnidirectionally scanned by suitably configured point-of-sale laser scanners. RSS-14 alternate formats are read as well as the regular format. The alternate formats are truncated, stacked, and stacked omnidirectional. It is recommended that the alignment of truncated and stacked symbols be close to horizontal or vertical.
RssLimited RSS Limited encodes a 14-digit EAN.UCC item identification with indicator digits of 0 or 1 in a small symbol which will not be scanned at POS. It is recommended that symbols whose height is near to the specified minimum be aligned close to horizontal or vertical.
Telepen Telepen is a name of a barcode symbology designed in 1972 in the UK to express all 128 ASCII characters.
Upca UPC-A. The Universal Product Code (UPC) is a barcode symbology widely used in the United States, Canada, the United Kingdom, Australia, New Zealand, and in other countries for tracking trade items in stores. The most common form, UPC-A, consists of 12 numerical digits, which are uniquely assigned to each trade item.
Upce UPC-E. This symbology differs from UPC-A in that it only uses a 6-digit code. To allow the use of UPC barcodes on smaller packages where a full 12-digit barcode may not fit, a 'zero-suppressed' version of UPC was developed called UPC-E, in which the number system digit and all trailing zeros in the manufacturer code and all leading zeros in the product code are suppressed. Encodations are a compressed form of UPC A. The DataString property holds the full uncompressed UPC A encodation for this symbology.
Aztec Aztec is a public domain 2D barcode symbology, formally defined by the ISO/IEC 24778:2008 standard.
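
Several of the symbologies above carry a check digit. As an illustration only, the sketch below computes the EAN-13 check digit (the "1 check" digit mentioned in the Ean13 row) using the standard GS1 weighting of alternating 1 and 3. The function names are hypothetical and are not part of Grooper; this is not how Grooper's barcode reader is implemented.

```python
def ean13_check_digit(data_digits: str) -> int:
    """Return the check digit for the 12 data digits of an EAN-13 code."""
    if len(data_digits) != 12 or not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(data_digits))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if a 13-digit string ends with the correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


if __name__ == "__main__":
    print(ean13_check_digit("400638133393"))
    print(is_valid_ean13("4006381333931"))
```

A reader that enforces this rule would reject a scan in which any single digit was misread, because the weighted sum would no longer be a multiple of 10.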

How To:  Add Barcode Detection to an IP Profile

Before you begin

This guide assumes you've created an IP Profile and have a Test Batch ready to configure the Barcode Detection command. For this example, we will configure Barcode Detection to read the QR code below. I wonder what it says...
1574804851350-113.png

Add Barcode Detection to your IP Profile

1. Navigate to your IP Profile in the "IP Profiles" folder in the "Global Resources" folder in the Node Tree.


2. Press the "Add" button to add a new IP Command to your IP Profile.
Add an ip step.png


3. Select the "Feature Detection" heading. Then, select "Barcode Detection".
1574805238857-222.png

Read the barcode

1. You must set the barcode symbology you are attempting to read on your documents.  This is done using the "Barcode Symbologies" property.  You may select multiple symbologies if you expect to encounter more than one.  You may also select "All" to search for all available symbologies.  Our example is a QR code.  We will check the "Qr" box.
1574805489325-966.png


2. It's entirely possible this is all you'll need to configure for this command to work.  However, there are some additional properties available.  The "Read Direction" property determines which direction Grooper's barcode reader searches the document for barcodes. 
For example, "South" starts at the top of the page and reads down (the reader travels south through the document).

East, South, West, North, NorthEast, SouthEast, SouthWest, and NorthWest are all supported directions. You may choose to read the document in multiple directions, which can be useful for detecting barcodes with inconsistent orientations (as in, sometimes they're on the document horizontally and sometimes vertically). However, be careful! Some barcode values change if they are read upside down. For example, a Patch T barcode becomes a Patch 3 barcode when read upside down.


3. The "Read Quality" property can also be adjusted. This setting controls the trade-off between speed and accuracy during barcode recognition. "MostAccurate" is the most accurate but slowest, repeating the search multiple times using different settings. "Fastest" is the fastest but least accurate. "Normal" is somewhere in the middle. If you have the time and computing resources available, "MostAccurate" is preferred.


4. Most often, only one barcode is expected on a page. However, if you need to read more than one barcode, adjust the "Barcodes to Read" property to the maximum you expect to find on your documents.


5. The "Scan Interval" setting controls how much of the document is scanned when looking for barcodes. It sets the interval, in lines of the image, between scans. Setting this to 1 will read every line of the image. Setting this to 10 will scan every 10th line. The lower the number, the more accurate the results will be, but the slower the processing time.
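
To get a feel for the Scan Interval trade-off, here is a minimal sketch of which image rows would be sampled at a given interval. It assumes scanning starts at row 0 and steps by the interval; the function name is hypothetical, and Grooper's actual sampling logic may differ.

```python
from typing import List


def rows_to_scan(image_height: int, scan_interval: int) -> List[int]:
    """List the row indexes sampled when every Nth line is scanned."""
    if scan_interval < 1:
        raise ValueError("scan_interval must be at least 1")
    # Assumption: sampling starts at the first row and steps by the interval.
    return list(range(0, image_height, scan_interval))


if __name__ == "__main__":
    height = 3300  # e.g. an 11 inch page scanned at 300 DPI
    for interval in (1, 5, 10):
        print(interval, len(rows_to_scan(height, interval)))
```

On a 3300-line image, an interval of 1 examines all 3300 lines, while an interval of 10 examines only 330. A short or damaged barcode that spans few lines is more likely to be missed at larger intervals.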

But before you read the barcode...

There are two "Preprocessing" properties available for Barcode Detection: Binarization Settings and Region of Interest.

1. Binarization is mandatory. Document images must be converted to black and white in order to read the barcode properly.
Barcode detection-binarization.png
Binarization converts color images to black and white by "thresholding" the image. Thresholding is the process of setting a threshold value on the pixel intensity of the original image. Pixel intensity is a pixel's "lightness" or "brightness". Essentially, once a midpoint between the most intense ("whitest") and least intense ("blackest") pixel on a page is established, lighter pixels are converted to white and darker pixels are converted to black. Or put another way, pixels with an intensity value above the threshold are converted to white, and those below the threshold are converted to black. This midpoint (or "threshold") can be set manually or found automatically by a software application. The Thresholding Method can be set to one of four options:
  • Simple - Thresholds an image to black and white using a fixed threshold value between 1 and 255.
  • Auto - Selects a threshold value automatically using Otsu's Method.
  • Adaptive - Thresholds pixels based on the intensity of pixels in the local neighborhood.
  • Dynamic - Performs adaptive thresholding, while preserving dark areas on the page.
Each method has its own set of configurable properties. For more information on binarization and these methods, visit the Binarize article.
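
The "Simple" and "Auto" methods can be sketched in a few lines. The example below works on a flat list of grayscale values (0 to 255) and uses a textbook version of Otsu's Method, which picks the threshold that maximizes the variance between the dark and light groups. The function names are hypothetical, and this is an illustration of the general technique rather than Grooper's implementation.

```python
from typing import List


def simple_binarize(pixels: List[int], threshold: int) -> List[int]:
    """Pixels above the threshold become white (255), the rest black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


def otsu_threshold(pixels: List[int]) -> int:
    """Pick the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    weight_dark = 0   # number of pixels at or below the candidate threshold
    sum_dark = 0      # sum of their intensities
    best_threshold, best_variance = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # nothing in the dark group yet
        if weight_light == 0:
            break     # everything is in the dark group; no more splits
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


if __name__ == "__main__":
    gray = [10, 12, 11, 200, 210, 205]  # dark bars next to light background
    t = otsu_threshold(gray)
    print(t, simple_binarize(gray, t))
```

A fixed threshold works well when scans are consistent. An automatically chosen one adapts to pages that are lighter or darker overall, which is why "Auto" is a reasonable starting point.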


2. Region of Interest is optional. By default, Barcode Detection will scan the entire page for barcodes. However, this setting can limit where Barcode Detection scans for barcodes. Press the ellipsis button at the end of the property to configure the scanning location.


Barcode detection - region of interest.png


This will bring up the "Edit Zone" window. Here you can use the "Selection Tool" to lasso the portion of the page where you want the barcode reader to scan for barcodes. For this example, we might want to avoid scanning the QR code at the bottom of the page.


Barcode detection-roi lassoing.png


Only barcodes falling in the selected region will be read. The barcode at the bottom of the page is not in the green rectangle. So, it will not be read.


Barcode detection - roi selection.png
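
The Region of Interest behavior amounts to a rectangle containment test. The sketch below assumes pixel coordinates with the origin at the top left and treats a barcode as readable only when its bounding box lies fully inside the region. The `Rect` class and the coordinates are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (origin at top left)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Rect") -> bool:
        """True if the other rectangle lies fully inside this one."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and other.right <= self.right
            and other.bottom <= self.bottom
        )


if __name__ == "__main__":
    region = Rect(left=0, top=0, right=800, bottom=400)           # drawn zone
    top_code = Rect(left=100, top=50, right=300, bottom=250)      # inside the zone
    bottom_code = Rect(left=100, top=900, right=300, bottom=1100)  # below the zone
    print(region.contains(top_code), region.contains(bottom_code))
```

Under this assumption, a barcode that straddles the edge of the zone would also be skipped, so it is worth drawing the zone with a little margin around the expected barcode position.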

Property Details

Property Default Value Information
General Properties
Barcode Symbologies None Here, you can set which barcode symbology you wish to read from a document. You may select multiple barcode symbologies up to and including all 28 supported by Grooper. See the Use Cases section above for all available symbologies.
Read Direction East, South, West, North This property determines which direction Grooper's barcode reader searches the document for barcodes. For example, "South" starts at the top of the page and reads down (the reader travels south through the document). East, South, West, North, NorthEast, SouthEast, SouthWest, and NorthWest are all supported directions. You may choose to read the document in multiple directions, which can be useful for detecting barcodes with inconsistent orientations (as in, sometimes they're on the document horizontally and sometimes vertically). However, be careful. Some barcode values change if they are read upside down. For example, a Patch T barcode becomes a Patch 3 barcode when read upside down.
Read Quality MostAccurate This setting controls the trade-off between speed and accuracy during barcode recognition. "MostAccurate" is the most accurate but slowest, repeating the search multiple times using different settings. "Fastest" is the fastest but least accurate. "Normal" is somewhere in the middle. If you have the time and computing resources available, "MostAccurate" is preferred.
Barcodes to Read 1 Defines the maximum number of barcodes on the page to read.
Scan Interval 5 This sets the interval, in lines of the image, between scans when looking for a barcode. Setting this to 1 will read every line of the image. Setting this to 10 will scan every 10th line. The lower the number, the more accurate the results will be, but the slower the processing time.
Preprocessing Properties
Binarization Settings Auto Binarization converts color images to black and white by "thresholding" the image. Thresholding is the process of setting a threshold value on the pixel intensity of the original image. Pixel intensity is a pixel's "lightness" or "brightness". Essentially, once a midpoint between the most intense ("whitest") and least intense ("blackest") pixel on a page is established, lighter pixels are converted to white and darker pixels are converted to black. Or put another way, pixels with an intensity value above the threshold are converted to white, and those below the threshold are converted to black. This midpoint (or "threshold") can be set manually or found automatically by a software application. The Thresholding Method can be set to one of four options:
  • Simple - Thresholds an image to black and white using a fixed threshold value between 1 and 255.
  • Auto - Selects a threshold value automatically using Otsu's Method.
  • Adaptive - Thresholds pixels based on the intensity of pixels in the local neighborhood.
  • Dynamic - Performs adaptive thresholding, while preserving dark areas on the page.

Each method has its own set of configurable properties. For more information on binarization and these methods, visit the Binarize article.

Region of Interest Defines an optional rectangular area on the page where barcodes are read. Only barcodes fully enclosed in the area drawn will be recognized. The rectangle can be drawn using the "Edit Zone" window. To access this, click on the "Region of Interest" property and press the ellipsis button at the end of the property.
Barcode Validation Options
Enforce Checksum False Checksums are used to verify the correctness of data. The idea is if the whole unit of data can be identified by a single number, a checksum, you can check to see if the parts of that unit of data add up to the checksum. For example, for the data of "CAT" the ASCII values of the letters are 67, 65, and 84. Those three numbers add up to 216. So the checksum would be assigned the number 216. Barcodes use checksums to verify the code was scanned correctly. If each segment of the barcode's data adds up to the checksum, we know the result is valid.


Some barcode symbologies use checksums. Some do not. Some have optional checksums. Some have more advanced methods of accounting for error correction. This property sets whether or not to enforce these values. If set to false, the checksum character in the barcode will just be a character in the encoding string and whatever value is read from the barcode will be accepted.

Quiet Zone Tolerance 50% The Quiet Zone is an area of free space around the barcode. These areas should be clear of graphics, text, or any other marks in order for the barcode to be read properly. Quiet Zone Tolerance defines how strictly the barcode reader respects this zone.


A setting of 0% will ignore the Quiet Zone requirement, where 100% will require it without exception. Settings closer to 100% are good for barcodes that are generally unobstructed. Settings closer to 0% can cause false positive results, but may be necessary for barcodes embedded in a document's content.

Skip All Validation False This bypasses any validation built into a barcode symbology. This can be useful for reading damaged or malformed barcodes, but their data may not be as trustworthy.
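
The "CAT" example from the Enforce Checksum property can be reproduced with a simple additive checksum. This is a sketch of the idea only: the function names are hypothetical, and real symbologies typically use more elaborate schemes (weighted sums, modulo arithmetic, or full error correction) than a plain sum.

```python
def additive_checksum(text: str) -> int:
    """Sum the ASCII values of every character in the text."""
    # Encoding to ASCII yields one integer per character; non-ASCII raises an error.
    return sum(text.encode("ascii"))


def passes_checksum(text: str, expected: int) -> bool:
    """True if the data adds up to the checksum that accompanied it."""
    return additive_checksum(text) == expected


if __name__ == "__main__":
    print(additive_checksum("CAT"))     # 67 + 65 + 84
    print(passes_checksum("CAT", 216))  # data read correctly
    print(passes_checksum("CAR", 216))  # one character misread
```

With enforcement on, the misread "CAR" would be rejected because its characters add up to 214 instead of 216. With enforcement off, whatever value was read is accepted as-is.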