Reading Handprint, Checkmarks, and Forms with FlexiCapture and Vantage


ICR – Intelligent Character Recognition

Intelligent Character Recognition

  • Intelligent Character Recognition (ICR) extends optical character recognition (OCR). While OCR is designed to extract machine-printed characters, ICR retrieves information entered as hand-printed characters.
  • ICR can extract hand-printed characters that are written separately, as individual characters, in areas or zones. These zones must either be specified as fixed fields of a machine-readable form or be detected automatically.

Example of a form containing hand-printed characters:

icr-form-illu.png

Important note: ICR cannot extract text written in “cursive handwriting”, as in this example:

old-handwriting-illu.png

  • In most cases, the ICR technology is linked to Field Level / Zonal Recognition and forms processing.
  • To enhance the ICR recognition accuracy, it is recommended to use meta data, for example regular expressions, dictionaries or database lookups.
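
As a rough illustration of the metadata idea above, the following C# sketch checks a raw ICR field value against a regular expression and an optional dictionary. The class IcrFieldValidator and its method are hypothetical helpers written for this example; they are not part of any ABBYY SDK, and a real project might configure such constraints in the SDK itself instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any ABBYY SDK: validates the raw ICR
// output of a single form field against a pattern and an optional dictionary.
public static class IcrFieldValidator
{
    // Returns true when the trimmed value matches the whole pattern and,
    // if a dictionary is supplied, is one of its entries (the comparison
    // follows the comparer the set was created with).
    public static bool IsPlausible(string value, string pattern, ISet<string> dictionary = null)
    {
        if (string.IsNullOrWhiteSpace(value))
            return false;

        string trimmed = value.Trim();

        // Anchor the pattern so that partial matches are rejected.
        if (!Regex.IsMatch(trimmed, "^(?:" + pattern + ")$"))
            return false;

        return dictionary == null || dictionary.Contains(trimmed);
    }
}
```

For example, a five-digit postal code field could be checked with the pattern \d{5}; a value in which the digit 0 was misread as the letter O would fail the check and could be sent to manual verification.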

ICR in ABBYY SDKs

The following ABBYY SDKs and products support ICR:

  • FineReader Engine
    Since version 12, Release 3, ICR is also included in the Linux version. Since version 12, Release 4, it is also included in the Mac version of FineReader Engine. In earlier versions, ICR was supported only in the Windows version.
  • FlexiCapture SDK – this SDK is designed for forms processing and data extraction; ICR and template matching for fixed forms are part of the default feature set. In addition, ABBYY offers this technology as a product in the form of the FlexiCapture platform.
  • Cloud OCR SDK – the ABBYY OCR service can read zones that contain hand-printed, separated characters. This online OCR service for developers does not include automated template matching, so the zones have to be defined when the tasks are uploaded to the service.

 

 

Recognizing Handprinted Text

Handwritten text can be recognized only if the characters are written separately (“handprinted text”).

Working sample:

Not working sample:

Notes

  • Not all recognition languages are available for handprint recognition. The languages which are available for handprint recognition are marked with a special comment in the List of predefined languages.
  • The coordinates of the blocks that contain handprinted text must be specified manually.

Details

Please see the details in “Help” → “Guided Tour” → “Advanced Techniques” → “Recognizing Handprinted Text”.

 

Checkmark Recognition (Optical Mark Recognition – OMR)

What is a checkmark?

A checkmark field is an element on a form. It is usually rectangular and is therefore often called a “check box”. In this element, the user of the form makes a mark to indicate an opinion, decision or selection: a check/tick, an X, a large dot, inking over, or similar.

  • The ABBYY SDKs FineReader Engine and FlexiCapture SDK contain technology for recognizing checkmarks and are therefore able to read and process them. The process of extracting information from checkmarks is called “Optical Mark Recognition” (OMR).
  • ABBYY’s OMR technology recognizes different types of checkmarks:
    • simple checkmarks
    • grouped checkmarks
    • model checkmarks
    • and even checkmarks which were later corrected by hand
  • ABBYY OMR delivers an accuracy rate of up to 99.995%.

Checkmarks examples:

checkmark_sample02.png
checkmark_rectangle.png

Technical Implementation of OMR

The ABBYY layout analysis and the underlying recognition technology work with different block types, for example:

  • Text
  • Pictures
  • Tables
  • Barcodes
  • Checkmark blocks and a Checkmark Group object

checkmark_group.png

The state of a checkmark can be

  • Selected
  • Not selected
  • Corrected (the checkmark was selected but was corrected later)

To get good recognition results, image preprocessing can be applied to the checkmark area, for example:

  • InvertImage
  • MirrorImage
  • …etc.

ABBYY FineReader Engine supports different checkmark types:

  • Square
checkmark-type-square.png
  • Empty
checkmark-type-empty.png
  • Circle
checkmark-type-circle.png
  • Custom
checkmark-type-custom.png

Detection of Checkmarks on a page

Checkmark areas cannot be detected automatically by the ABBYY document analyzer. Therefore, developers have to define the checkmark area in code and then apply recognition to it.

Typically, checkmarks are found on forms. Both ABBYY FineReader Engine and the data capture products of the FlexiCapture portfolio can extract values from checkmarks. If there are many different variants of forms, we recommend the ABBYY FlexiCapture product line, which offers sophisticated data extraction algorithms on top of other capabilities such as document separation and classification of forms. The FlexiCapture products are available as a ready-made solution (FlexiCapture) and as a development kit (FlexiCapture SDK).

ABBYY FineReader Engine

The following checkmark block properties are available in FineReader Engine:

  • CheckmarkState
    Specifies the state of the checkmark block.
  • CheckmarkType
    Specifies the checkmark type used for recognition.
  • ImageProcessingParams
    Provides access to the set of properties affecting image preprocessing inside the checkmark block.
  • IsCorrectionEnabled
    If this property is set to TRUE, the checkmark block can be selected and then corrected. The default value is FALSE.
  • IsSuspicious
    If this property is TRUE, the checkmark was recognized uncertainly.
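
The sketch below shows one way an application might turn these properties into a decision. The types MarkState, MarkResult and MarkInterpreter are stand-ins invented for this example so that it compiles without the SDK; they only mirror the property names listed above and are not the FineReader Engine types. Treating a corrected checkmark as "not selected" is an assumption of the sketch.

```csharp
using System;

// Stand-in types for illustration only; NOT the FineReader Engine types.
public enum MarkState { NotSelected, Selected, Corrected }

public sealed class MarkResult
{
    public MarkState State { get; set; }
    public bool IsSuspicious { get; set; }
}

public static class MarkInterpreter
{
    // Returns true or false for a confident answer, or null when the field
    // should be routed to manual verification.
    public static bool? Interpret(MarkResult result)
    {
        if (result == null)
            throw new ArgumentNullException(nameof(result));

        // An uncertain recognition result is never decided automatically.
        if (result.IsSuspicious)
            return null;

        switch (result.State)
        {
            case MarkState.Selected:
                return true;
            case MarkState.Corrected:
                // Assumption: a mark that was selected and later corrected
                // counts as "not selected".
                return false;
            default:
                return false;
        }
    }
}
```

Returning a nullable value keeps the "needs review" case separate from a confident "not selected", which is usually what a verification step needs.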

FlexiCapture & FlexiCapture SDK

The FlexiCapture product line offers different options to deal with different form types:

  • Fixed Forms – for processing structured forms of the same type. Here the form template (single- or multi-page) is matched, and the pre-defined checkmark areas from the setup are then processed.

  • FlexiLayouts – for processing different documents, even if they do not share the same structure. This is possible through a sophisticated “free form” approach: the checkmark areas do not need to be defined as fixed areas; instead, they can be defined in relation to other areas of the document, for example near key elements or keywords. This approach is not tied to fixed coordinates – the technology can detect the relevant areas on its own.

The FlexiCapture product line provides a GUI for defining the areas from which data should be extracted:

Document Template Editor – detection of checkmarks:

fc_fixed_form_templateeditor.png

Illustration of the Data export settings in FlexiCapture:

fc-checkmark-group-export-options.png

OMR in other ABBYY Products

  • ABBYY Cloud OCR SDK, the cloud-based document processing service for software applications, can also process checkmarks when the region is defined and submitted with the image (snippet). Details can be found in the online API documentation for processCheckmarkField.
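
To make the idea of "submitting the region with the image" concrete, here is a minimal C# sketch that only builds the request URL. The query parameter names region, checkmarkType and correctionAllowed, and the value format "left,top,right,bottom", are written from memory of the processCheckmarkField documentation and should be treated as assumptions to verify there. CheckmarkRequest is a hypothetical helper, and the base URL is a placeholder supplied by the caller.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; parameter names are assumptions to be checked
// against the processCheckmarkField API documentation.
public static class CheckmarkRequest
{
    // Builds the URL for a checkmark request on one rectangular region.
    public static string BuildUrl(string baseUrl, int left, int top, int right, int bottom,
                                  string checkmarkType = "empty", bool correctionAllowed = false)
    {
        if (right <= left || bottom <= top)
            throw new ArgumentException("Region must have positive width and height.");

        // Format the coordinates culture-independently as "left,top,right,bottom".
        string region = string.Format(CultureInfo.InvariantCulture,
                                      "{0},{1},{2},{3}", left, top, right, bottom);

        return baseUrl.TrimEnd('/') + "/processCheckmarkField"
             + "?region=" + Uri.EscapeDataString(region)
             + "&checkmarkType=" + Uri.EscapeDataString(checkmarkType)
             + "&correctionAllowed=" + (correctionAllowed ? "true" : "false");
    }
}
```

The image itself would then be sent as the body of an authenticated request to this URL; that part is omitted here because it depends on the account and the HTTP client in use.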

 

Field Level / Zonal Recognition (OCR, ICR)

What is Field Level or Zonal OCR

Zonal OCR, or Field Level recognition, is a special recognition scenario in which only small areas of text are recognized, and the results are usually passed on to other applications. Because the extracted information is used in further business processes, the quality of text recognition is very important. The “region of interest” containing the relevant text is known in advance, and in this scenario there are typically only a few different document variants.

Difference between Document Conversion scenario and Field Level/Zonal OCR

  • In Field Level OCR scenarios the zones are defined via code/API.
  • In contrast, in the classic “document conversion” scenario automated layout analysis is very important, because the text areas have to be determined first.

ABBYY SDKs – features for a good field level recognition implementation

To get the best text result, the image quality of the snippet has to be as high as possible. The following routines can be applied:

  • Auto-detection of page orientation
  • Image despeckling
  • Line straightening
  • Removal of motion blur and ISO noise from digital photos
  • OCR Voting API, which provides the developer with all details about the OCR result of a text

Since ABBYY FineReader Engine 10, a processing profile, FieldLevelRecognition, is available with a pre-tested set of OCR engine settings optimized for zonal OCR.

Zonal Recognition in ABBYY Products

  • If the position of the needed text is known in advance, you can use ABBYY FineReader Engine or the cloud service ABBYY Cloud OCR SDK – read more in the ABBYY Cloud OCR SDK blog post.
  • If the position of the text is not known in advance, the ABBYY FlexiCapture SDK is recommended. This toolkit for document classification and data extraction provides FlexiLayout technology, which offers a flexible, sophisticated way to find relevant data, even on multi-page documents.

Recognizing Handprinted Arabic Digits

ABBYY FineReader Engine does not currently support Arabic ICR. However, recognizing specifically Arabic digits is possible, and this article describes the necessary steps.

To recognize Arabic digits, you need to create a custom language whose alphabet consists only of the 10 digit symbols and set it as the recognition language for every block that contains digits.

Therefore, to recognize Arabic handprinted digits, do the following:

1. Create a new text language using the CreateTextLanguage method of the LanguageDatabase object.
2. Using the LetterSet property of the BaseLanguage object within the TextLanguage object, set the language alphabet containing the following characters: ٠١٢٣٤٥٦٧٨٩.
3. For each block containing handprinted Arabic digits specify recognition parameters via the ITextBlock::RecognizerParams property:

  • Set the TextLanguage property of the RecognizerParams object to the language you created in the previous step.
  • Set the TextTypes property of the RecognizerParams object to TT_Handprinted.
  • If the digits are enclosed in a frame, box, etc., set up the type of marking around the letters in the FieldMarkingType property of the RecognizerParams object. If each digit is written in a separate cell, use also the CellsCount property to set up the number of character cells in the block.

C# sample code:

// Global ABBYY FineReader Engine object
FREngine.Engine engine;
...
// Open an image file
...

// Create a custom language
FREngine.LanguageDatabase languageDatabase = engine.CreateLanguageDatabase();
FREngine.TextLanguage textLanguage = languageDatabase.CreateTextLanguage();
FREngine.BaseLanguages baseLanguages = textLanguage.BaseLanguages;
FREngine.BaseLanguage baseLanguage = baseLanguages.AddNew();

// Set the alphabet
baseLanguage.set_LetterSet( FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, "٠١٢٣٤٥٦٧٨٩" );

// Create a Layout object
FREngine.Layout layout = engine.CreateLayout();

// Set block region
FREngine.Region region = engine.CreateRegion();
region.AddRect( 491, 314, 2268, 404 );

// Create a new block
FREngine.IBlock newBlock = layout.Blocks.AddNew( FREngine.BlockTypeEnum.BT_Text, region, 0 );
FREngine.TextBlock textBlock = newBlock.GetAsTextBlock();
// Set the custom language
textBlock.RecognizerParams.TextLanguage = textLanguage;
// Specify the text type
textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Handprinted;
// Specify the type of marking around the letters
textBlock.RecognizerParams.FieldMarkingType = FREngine.FieldMarkingTypeEnum.FMT_SimpleText;

// Recognition and export
...

 

Requirements for the NLP module in FlexiCapture 12

The following requirements should be met:

  • The NLP module should be installed on all machines that are directly involved in image processing: the processing server machine and the machines with processing stations, including machines with verification stations and machines used by FlexiCapture developers. It should not be installed on the application server or the database server.
  • The installed NLP module will appear in the list of the installed programs:
  • A new menu item appears at the Project Setup Station in the document definition properties:
  • No new items appear in the FlexiLayout Studio and other developer applications, except the Project Setup Station.