Barcode Recognition in ABBYY FineReader & FlexiCapture


Recognition of Barcodes in ABBYY technologies

ABBYY technologies and products can read many different barcode types.
The Document Analysis algorithms can locate and identify barcodes on a document page; it is also possible to define (“draw”) a barcode block manually via the API.
Once the barcode region is defined or detected, it can be recognised. The API provides access to:
  • the coordinates
  • the characters
  • character confidence information
  • the start/stop symbols of the barcode type (for Code 39 barcodes, the start/stop symbol is the asterisk “*”)
The recognised barcode value can then be used, for example, for file naming.
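As an illustration of using a barcode value for file naming, here is a minimal Python sketch. It does not use any ABBYY API: the function name `barcode_to_filename`, the default `.pdf` extension, and the set of allowed characters are assumptions made for this example. It assumes the recognised value arrives as a plain string that may still carry the Code 39 start/stop asterisks.

```python
import re


def barcode_to_filename(value: str, extension: str = ".pdf") -> str:
    """Turn a recognised barcode value into a safe file name (hypothetical helper)."""
    # Remove surrounding whitespace and Code 39 start/stop asterisks, if present.
    core = value.strip().strip("*")
    # Replace every run of characters outside a conservative whitelist with "_".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", core).strip("_")
    # Fall back to a fixed name when nothing usable is left.
    return (safe or "unnamed") + extension


if __name__ == "__main__":
    print(barcode_to_filename("*INV-2024/001*"))
```

Whether the start/stop symbols are included in the returned value depends on the product and its settings, so the stripping step may be unnecessary in a given setup.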
A very common scenario is document separation based on barcodes:
    • In FlexiCapture projects, this feature is built in.
    • With FineReader Engine, developers can “cut” the page stream with custom code.
    • FineReader Server supports separation by barcodes.
    • The ABBYY Scan Station (FineReader Server & FlexiCapture) supports separation by barcodes.
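The idea of “cutting” a page stream with custom code can be sketched in a few lines of Python. This is not FineReader Engine code: the page representation (a page id paired with an optional barcode value) and the function `split_by_barcode` are assumptions for illustration, and the sketch assumes that a barcode marks the first page of a new document.

```python
from typing import List, Optional, Tuple

# Hypothetical page record: (page id, recognised barcode value or None).
Page = Tuple[str, Optional[str]]


def split_by_barcode(pages: List[Page]) -> List[List[str]]:
    """Group page ids into documents; a page with a barcode starts a new document."""
    documents: List[List[str]] = []
    for page_id, barcode in pages:
        # Start a new document on a barcoded page, or for the very first page.
        if barcode is not None or not documents:
            documents.append([])
        documents[-1].append(page_id)
    return documents


if __name__ == "__main__":
    stream = [("p1", "A"), ("p2", None), ("p3", "B"), ("p4", None), ("p5", None)]
    print(split_by_barcode(stream))
```

Pages that arrive before the first barcode are kept together as a leading document in this sketch; a real workflow might instead reject or flag them.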

Tips for working with barcodes

Barcode recognition quality depends on:

  • the barcode print quality
  • settings used in the document scanning process
  • the placement of the barcode when it is added manually

In order for the barcodes to be recognized well, follow these recommendations:

  • A barcode must be separated from other text by a fairly wide white gap.
  • Barcode size and the width of its separate bars or dots must meet the following requirements:
    • The optimal barcode height is more than 10 millimetres. A barcode should be smaller than an A4 page.
    • The barcode height must be more than twice the height of a text line.
    • For non-square barcodes, the length must be greater than the height.
    • For 1D barcodes, the thinnest bar in the barcode must be at least 3-5 pixels wide in the image.
    • For 2D barcodes, the cells should be at least 2×2 pixels; the recommended size is 4×4 pixels or more. In addition, for all 2D barcodes except PDF417, the cells should be square, because barcodes with elongated cells will most likely be recognized incorrectly.
  • Compressing images of barcodes using JPEG compression should be avoided, because it makes the barcode borders fuzzy.
  • Skewing barcodes is not recommended.
  • The grey-scale scanning mode is the best for OCR purposes.
    • Scanning in black-and-white can cause issues; if you use it, adjust the brightness setting.
    • If the barcode is “torn” or very light, decrease the brightness to make the image darker.
    • If the barcode is distorted or its parts are merged together, increase the brightness to make the image brighter.
  • Avoid printing barcodes in frames.
  • Avoid printing barcodes over a text or a picture.

Barcodes that do not fit these recommendations can still be recognized; however, the quality of recognition may be lower.
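The pixel-based recommendations above depend on the scanning resolution. The following Python sketch shows the arithmetic for converting a printed size in millimetres to image pixels. The function names and the example bar width of 0.33 mm are assumptions chosen for illustration, not values taken from ABBYY documentation.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(size_mm: float, dpi: int) -> float:
    """Convert a printed size in millimetres to pixels at the given resolution."""
    return size_mm / MM_PER_INCH * dpi


def min_dpi_for_bar(bar_width_mm: float, min_pixels: float = 3.0) -> float:
    """Lowest resolution at which a bar of the given width spans min_pixels pixels."""
    return min_pixels * MM_PER_INCH / bar_width_mm


if __name__ == "__main__":
    # Example values are assumptions: a 0.33 mm narrow bar and a 10 mm tall barcode.
    for dpi in (200, 300):
        print(dpi, "dpi:", round(mm_to_pixels(0.33, dpi), 2), "px narrow bar,",
              round(mm_to_pixels(10, dpi)), "px height")
    print("min dpi for 3 px:", round(min_dpi_for_bar(0.33)))
```

Under these assumptions, a 0.33 mm bar falls below 3 pixels at 200 dpi and exceeds 3 pixels at 300 dpi, which is why the scanning resolution matters as much as the print size.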

Predefined processing profile for fast implementation

Some products, for example ABBYY FineReader Engine (starting with version 10), provide a predefined processing profile for barcode recognition. It detects barcodes and extracts their values, while texts, pictures, and tables are not detected. This profile contains settings that are optimized for barcode recognition.

 

Barcode types supported by ABBYY technologies

Barcode Type | Description
Aztec | Aztec is a high-density two-dimensional matrix style barcode symbology that can encode up to 3832 digits, 3067 alphanumeric characters, or 1914 bytes of data. The symbol is built on a square grid with a bulls-eye pattern at its center.
Codabar | Codabar is a self-checking, variable length barcode that can encode 16 data characters. It is used primarily for numeric data, but also encodes six special characters. Codabar is useful for encoding dollar and mathematical figures because a decimal point, plus sign, and minus sign can be encoded.
Code 128 | Code 128 is an alphanumeric, very high-density, compact, variable length barcode scheme that can encode the full 128-character ASCII set. Each character is represented by three bars and three spaces totaling 11 modules. Each bar or space is one, two, three, or four modules wide; the total number of modules in the bars is even and the total number of modules in the spaces is odd. Three different start characters are used to select one of three character sets.
Code 39 | Code 39, also referred to as Code 3 of 9, is an alphanumeric, self-checking, variable length barcode that uses five black bars and four spaces to define a character. Three of the elements are wide and six are narrow.
Code 93 | Code 93 is a variable length barcode that encodes 47 characters. It is named Code 93 because every character is constructed from nine elements arranged into three bars with their adjacent spaces. Code 93 is a compressed version of Code 39 and was designed to complement Code 39.
Data Matrix | Data Matrix is a two-dimensional matrix barcode consisting of black and white modules arranged in either a square or rectangular pattern. Every Data Matrix is composed of two solid adjacent borders in an “L” shape and two other borders consisting of alternating dark and light modules. Within these borders are rows and columns of cells encoding information. A Data Matrix barcode can store up to 2335 alphanumeric characters.
EAN 8 and 13 | The European Article Numbering (EAN) system is used for products that require a country of origin. This is a fixed-length barcode used to encode either eight or thirteen characters. The first two characters identify the country of origin, the next characters are data characters, and the last character is the checksum. These barcodes may include an additional barcode to the right of the main barcode. This second barcode, which is usually not as tall as the primary barcode, is used to encode additional information for newspapers, books, and other periodicals. The supplemental barcode may encode either 2 or 5 digits of information.
IATA 2 of 5 | IATA 2 of 5 is a barcode standard designed by the IATA (International Air Transport Association). This standard is used for all boarding passes.
Industrial 2 of 5 | Industrial 2 of 5 is a numeric-only barcode that has been in use for a long time. Unlike Interleaved 2 of 5, all of the information is encoded in the bars; the spaces are fixed width and are used only to separate the bars. The code is self-checking and does not include a checksum.
Interleaved 2 of 5 | Interleaved 2 of 5 is a variable length (must be a multiple of two), high-density, self-checking, numeric barcode that uses five black bars and five white bars to define a character. Two digits are encoded in every character: one in the black bars and one in the white bars. Two of the black bars and two of the white bars are wide. The other bars are narrow.
Matrix 2 of 5 | Standard 2 of 5 is a self-checking, numeric-only barcode. Unlike Interleaved 2 of 5, all of the information is encoded in the bars; the spaces are fixed width and are used only to separate the bars. Matrix 2 of 5 is used primarily for warehouse sorting, photo finishing, and airline ticket marking.
Patch | A pattern of horizontal black bars separated by spaces. Typically, a patch code is placed near the top center of a paper document to be scanned and used as a document separator.
PDF417 | PDF417 is a variable length, two-dimensional (2D), stacked symbology that can store up to 2710 digits, 1850 printable ASCII characters, or 1108 binary characters per symbol. PDF417 is designed with selectable levels of error correction. Its high data capacity can be helpful in applications where a large amount of data must travel with a labeled document or item.
PostNet | Postnet (Postal Numeric Encoding Technique) is a fixed length symbology (5, 6, 9, or 11 characters) which uses constant bar and space width. Information is encoded by varying the bar height between two values. Postnet barcodes are placed on the lower right of envelopes or postcards, and are used to expedite the processing of mail with automatic equipment and provide reduced postage rates.
QR Code | QR Code is a two-dimensional matrix barcode. The barcode has 3 large squares (registration marks) in the corners which define the top of the barcode. The black and white squares in the area between the registration marks are the encoded data and error correction keys. QR Codes can encode over 4000 ASCII characters.
UCC-128 | This type of barcode is a 19-digit barcode with a 20th check digit, for a total of 20 digits. It is typically used for carton identification, both for internal carton numbering and on cartons shipped out to customers.
UPC-A | The UPC-A (Universal Product Code) barcode is 12 digits long, including its checksum. Each digit is represented by a seven-bit sequence, encoded by a series of alternating bars and spaces. UPC-A is used for marking products which are sold at retail in the USA.
UPC-E | The UPC-E barcode is a shortened version of the UPC-A barcode. It compresses the data characters and the checksum into six characters. This barcode is ideal for small packages because it is the smallest barcode.
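Several of the types above (EAN-8, EAN-13, UPC-A) end in a check digit. As a worked example, the following Python sketch computes it with the standard modulo-10 scheme in which the data digits are weighted 3, 1, 3, 1, ... starting from the rightmost data digit. The function names are chosen for this illustration and are not part of any ABBYY API; such a check can be useful for validating a recognised value after the fact.

```python
def gtin_check_digit(data_digits: str) -> int:
    """Check digit for EAN-8, UPC-A or EAN-13 data digits (check digit excluded)."""
    if not data_digits.isdigit():
        raise ValueError("data_digits must contain only the digits 0-9")
    # Weights alternate 3, 1, 3, 1, ... starting from the rightmost data digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(data_digits)))
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True if the last digit of the full code matches the computed check digit."""
    return (len(code) in (8, 12, 13) and code.isdigit()
            and gtin_check_digit(code[:-1]) == int(code[-1]))


if __name__ == "__main__":
    print(gtin_check_digit("400638133393"))   # EAN-13 data digits
    print(is_valid_gtin("036000291452"))      # full UPC-A code
```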

 

Barcode Requirements

General recommendation: for barcodes to be processed properly, they should comply with the appropriate barcode specification, that is, each barcode should be created in accordance with the specification for its type.

Barcode recognition quality depends on barcode print quality and scanning settings. Below are some recommendations for barcodes to be recognized correctly:

  • A barcode must be separated from other text by a fairly wide white gap.
  • Barcode size and the width of its separate bars or dots must meet the following requirements:
    • The optimal barcode height is more than 10 millimeters. A barcode should be smaller than an A4 page.
    • The barcode height must be more than twice the height of a text line.
    • For non-square barcodes, the length must be greater than the height.
    • For 1D barcodes, the thinnest bar in the barcode must be at least 3-5 pixels wide in the image.
    • For 2D barcodes, the cells should be at least 2×2 pixels; the recommended size is 4×4 pixels or more. In addition, for all 2D barcodes except PDF417, the cells should be square, because barcodes with elongated cells will most likely be recognized incorrectly.
  • We do not recommend compressing images of barcodes using JPEG compression, as it makes barcode borders fuzzy.
  • We do not recommend skewing barcodes, i.e. the angle of the barcode should be a multiple of 90 degrees relative to the horizontal axis.
  • The grayscale scanning mode is the best for OCR purposes. When scanning in black-and-white, adjust the brightness setting. If the barcode is “torn” or very light, lower the brightness to make the image darker. If the barcode is distorted or its parts are merged together, increase the brightness to make the image brighter.
  • Avoid printing barcodes in frames.
  • Avoid printing barcodes over text or a picture.

In some cases, barcodes that do not fit these recommendations can still be recognized, but the quality of recognition may be poor.

 

How to set up barcode separation on the scanning station

In this video, you will learn how to use FineReader Server to easily split documents by using barcodes.

In this scenario, a single file contains multiple documents. FineReader Server uses the barcodes on each first page to group and separate the pages into documents. Simple and efficient processing:

  1. Select your batch options and the barcode type used.
  2. Scan your document or upload from your computer. You can even take a photo from a mobile device and use predefined or manual editing to create a document from the photo.
  3. Send the file to FineReader Server and it will do the work for you.

FineReader Server automatically converts large collections of documents into searchable, sharable digital libraries. See how our server-based OCR and conversion feature converts scanned and electronic documents into PDF, PDF/A, Microsoft Word, or other formats for search, long-term retention, collaboration, or additional processing, quickly, accurately, and automatically.

 


How to split the pages flow by multiple barcode types in FineReader Server 14

In FineReader Server 14, you can split the page flow by multiple barcode types using regular expressions.

For example, the following regular expression will allow you to split pages by barcodes starting with HY and barcodes starting with GT:

(HY[0-9]+)|(GT[0-9]+)

Read more about Regular Expressions.
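To see how such an expression behaves, here is a minimal Python sketch that tests sample barcode values against it. It uses Python's `re` module rather than FineReader Server itself, and it assumes that the whole barcode value must match the expression; the exact matching semantics in FineReader Server may differ. The name `is_separator` is hypothetical.

```python
import re

# The expression from the example above: values starting with HY or GT, followed by digits.
SEPARATOR_PATTERN = re.compile(r"(HY[0-9]+)|(GT[0-9]+)")


def is_separator(barcode_value: str) -> bool:
    """True if the whole barcode value matches the separator expression (assumed semantics)."""
    return SEPARATOR_PATTERN.fullmatch(barcode_value) is not None


if __name__ == "__main__":
    for value in ("HY123", "GT7", "AB123", "HY"):
        print(value, is_separator(value))
```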

To set this option up, go to the workflow properties, choose the option Start new document after barcoded page, and type in the regular expressions for the barcodes that should split the pages.

Tip: if you choose the corresponding naming rule in the Output tab, the documents will be named according to the barcodes that were used to split them.

 
