Introduction to Optical Character Recognition Software

During your foray into the world of document scanning, you've probably come across the term "OCR". You may even know that it stands for "Optical Character Recognition". But what is OCR, really, and what do you need to know about it to make the best use of this sophisticated and valuable tool?

We're here to give you a run-down on Optical Character Recognition, answer any questions you might have, and recommend the best OCR software for your scanning project. Let's begin!

What is OCR?

The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned images of machine-printed (typed) text into actual text data that you can search and modify. To a computer, a scanned page is just a collection of pixels, no more meaningful than a landscape photo. The exact mechanics of the process are complicated, but suffice it to say that an OCR engine examines the pixel data, searches for patterns resembling letters, numbers, and other symbols, and creates a digital record of the symbols it finds.
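To make the idea of "searching for patterns in pixel data" concrete, here is a toy sketch in Python. It is not how any commercial engine works; it simply compares a small grid of pixels against a few made-up 3x5 letter templates and picks the closest match. The templates and the function names `match_score`, `recognize`, and `recognize_line` are all assumptions invented for this illustration.

```python
# Toy illustration only: real OCR engines use far more sophisticated methods.
# The 3x5 glyph templates below are made up for this sketch.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def match_score(glyph, template):
    """Count the pixels that agree between a scanned glyph and a template."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character whose pixels best match the glyph."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Turn a sequence of pixel grids into a text string."""
    return "".join(recognize(glyph) for glyph in glyphs)


# A "T" with one pixel lost in scanning is still closest to the T template.
noisy_t = ("###", ".#.", ".#.", "...", ".#.")
print(recognize(noisy_t))
```

Because the match is scored rather than exact, a slightly damaged character can still be recognized, which is also why poor scans lower accuracy rather than stopping recognition outright.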

Types of OCR

There are two major types of Optical Character Recognition:

Full Page OCR - Converts the entire page into one of the following formats:

  • Plain Text - Only the basic text information on the page is retained, in consecutive order.
  • Formatted Text - Text information is retained in consecutive paragraphs, preserving font size and style. This can also preserve tables in a tabular format, such as spreadsheets.
  • Exact Copy - All information on the page is retained, including graphics, and placed on the page in such a way as to most closely recreate the original document.
  • Searchable File - Text information is retained on a hidden layer behind the scanned image, allowing the file to be searched while retaining the appearance of the original.

Zone OCR - Recognizes strings of text in particular areas of the page, usually for the purpose of indexing and document management. The information can be used to name a file, save it to a particular location, or archive particular pieces of data in an organized format, such as a database.
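As a rough sketch of how Zone OCR results might be used to name and file a document, the Python snippet below builds a destination path from text that an OCR engine has already read from three zones. The zone names (`customer`, `invoice_no`, `date`), the `archive` root folder, and both function names are hypothetical and chosen only for illustration.

```python
import re
from pathlib import PurePosixPath


def safe_name(text):
    """Collapse anything that is not a letter, digit, dot or dash into '_'."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", text.strip()).strip("_")


def build_path(zones, root="archive"):
    """Build a destination path from recognized zone text.

    `zones` maps hypothetical zone names to the text read from each zone.
    """
    folder = safe_name(zones["customer"])
    filename = f"{safe_name(zones['invoice_no'])}_{safe_name(zones['date'])}.pdf"
    return PurePosixPath(root) / folder / filename


zones = {"customer": "Acme Corp", "invoice_no": "INV 1042", "date": "2014-03-07"}
print(build_path(zones))
```

Cleaning the recognized text before using it in a path matters in practice, since OCR output can contain stray spaces or punctuation that are not valid in file names.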

Levels of OCR Software

OCR software comes in many different types, which vary in price based on their features, speed, and accuracy. For instance, you can get a freeware program such as SimpleOCR that will serve in a pinch, but it can only convert BMP, JPG, and TIF images of English or French text into plain text documents in TXT or DOC format, one page at a time.

On the other hand, you can invest a few hundred dollars in Batch OCR or even Server OCR software that can watch particular folders for incoming documents in a variety of image formats and languages, then automatically recreate exact copies of all of their pages in a format of your choice.
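The "watched folder" behavior described above can be sketched in a few lines of Python. This is only an outline of the idea under simple assumptions, not the mechanism used by any particular product: the list of extensions is an example, and `convert` stands in for whatever OCR call your software actually provides.

```python
import os

# Example extensions only; a real product defines its own supported formats.
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".tif", ".tiff", ".png", ".pdf"}


def find_new_images(folder, already_seen):
    """Return image files in `folder` that have not been processed yet."""
    new_files = []
    for name in sorted(os.listdir(folder)):
        extension = os.path.splitext(name)[1].lower()
        if extension in IMAGE_EXTENSIONS and name not in already_seen:
            new_files.append(name)
    return new_files


def poll_once(folder, already_seen, convert):
    """Run one polling pass, handing each new file to `convert`."""
    for name in find_new_images(folder, already_seen):
        convert(os.path.join(folder, name))  # hypothetical OCR call
        already_seen.add(name)
```

A scheduler or a simple loop with a pause between passes would call `poll_once` repeatedly; files already recorded in `already_seen` are skipped, so each document is converted only once.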

You can also find Desktop OCR software, which bridges the price gap: it includes many of the features of the Corporate editions but still requires some user input during conversion.

Improving Accuracy

Although some OCR engines are better than others, no software can guarantee 100% accuracy. This is because there are other factors in play, including scan quality. Recognition software will not be able to do its work if the scanner is not properly digitizing the page.

For best results, we recommend scanning at a resolution of 300 dpi. Black & White (Bitonal) mode is preferred over Greyscale or Color, and although most modern scanners are fairly well configured out of the box, you may want to adjust the Brightness and Contrast settings for your particular documents.
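To put those settings in concrete terms, the short Python sketch below works out how many pixels a page produces at a given resolution and shows the basic idea behind a bitonal scan: every greyscale value is forced to either black or white by a threshold. The function names and the default threshold of 128 are assumptions made for this example; scanners and drivers use their own, usually more adaptive, methods.

```python
def pixel_dimensions(width_in, height_in, dpi=300):
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def to_bitonal(gray_rows, threshold=128):
    """Map greyscale values (0-255) to 1 (black ink) or 0 (white paper)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


# A US Letter page (8.5 x 11 inches) at 300 dpi.
print(pixel_dimensions(8.5, 11))

# Dark pixels become ink, light pixels become paper.
print(to_bitonal([[0, 127, 128, 255]]))
```

Moving the threshold up or down has an effect similar to adjusting brightness: a higher value turns more grey pixels into ink, which can help with faint print but may also pick up background noise.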

If you do not have a scanner that has the necessary speed, quality, or other features that you require to scan your documents, you can always find a large selection of scanners at ScanStore!
ScanStore even has a handy scanners guide to help you find the perfect scanner for your specific requirements and price range.

Limitations of OCR

OCR software is also limited in what it can recognize. Most OCR software is designed to recognize only machine-printed text, not handwriting. ICR (Intelligent Character Recognition) software can recognize handwritten information, but it tends to be an enterprise-level solution for forms processing rather than full-page recognition.

Similarly, most OCR software can convert only traditional machine fonts, not cursive scripts or calligraphy. OCR engines depend on common, separated letter shapes to recognize text, so fonts that are unusual or whose letters flow together are often not recognized.

OCR Software Guide


Several OCR (Optical Character Recognition) software solutions are available to convert scanned images to text, Word, Excel, HTML or searchable PDF. The differences between them can be obscure, leaving many to wonder why some OCR packages cost about $100 while others cost $500 or more.


The main features that differentiate OCR software are:

  • Character recognition accuracy
  • Page layout reconstruction accuracy
  • Support for languages
  • User interface design
  • Output file formats (Word, Excel, PDF, eBook, etc.)
  • OCR speed and support for multi-core CPUs
  • Batch processing modes
  • Advanced PDF encryption or compression
  • Special features for niche projects

Because there are countless combinations of document types, OCR engines, project requirements and special features, one engine may perform better than another with your particular documents. Use our handy OCR feature comparison chart to determine which OCR program best meets your requirements. If you prefer to try before you buy, ScanStore provides demo downloads for most OCR software with your ScanStore User Account.

OCR Software Categories

The OCR software guide is divided into the following categories:
Applications | OCR Servers | MAC OCR | PDF Converters | Personal | Hebrew/Arabic/Farsi OCR | Chinese/Japanese/Korean/Thai OCR

And The Winner Is...


The OCR experts at ScanStore have tested the latest versions of FineReader, OmniPage, ReadIRIS, CVision PDF Compressor, and SimpleOCR. We consider ABBYY FineReader the best overall value for business users, while ReadIRIS is the best OCR software for under $150.

The key deciding factors were:

  • User interface design
  • Page layout reconstruction capabilities
  • Extensive language support
  • Engine stability when processing large files
  • Availability and quality of technical support

Though other testing labs have ranked OmniPage's overall accuracy slightly higher, we find the difference nearly negligible. All modern OCR software has very good accuracy, so we recommend choosing the product with the special features you need, such as ReadIRIS Corporate's CardIRIS, FineReader's camera OCR and screenshot reader, or OmniPage Ultimate's form data collection, auto-redaction and barcode filing capabilities.

If you would like to try them out yourself, ScanStore offers free demo downloads for ReadIRIS and FineReader. Nuance does not provide demos for its OCR products.

Businesses with many documents to process should use our SimpleIndex batch document scanning software with the FineReader OCR engine to scan and OCR large batches of documents. Barcode and OCR can also be used to sort and file documents into folders, databases or SharePoint.

OCR Data Capture Solutions


OCR can also be used to automate data entry from forms, surveys, invoices and other documents. Handwriting recognition (ICR) solutions are also available.

Compare OCR Software


Feature | SimpleOCR Freeware | ABBYY FineReader Pro 11 | ABBYY FineReader Corporate 11 | IRIS ReadIRIS Pro 14 | IRIS ReadIRIS Corporate 14 | Nuance OmniPage Ultimate | SimpleIndex Desktop 7
Scanner Drivers Supported | TWAIN | TWAIN | TWAIN | TWAIN | TWAIN | TWAIN / ISIS | TWAIN / ISIS
Languages Supported | 3 | 189 | 189 | 128 | 137 | 137 | 179
Page Limit per Document | 1 | none | none | 50 | none | none | none
Installation | Desktop | Desktop | Desktop / Server | Desktop | Desktop | Desktop | Desktop
License | Freeware | Standalone | Per Seat / Concurrent | Standalone | Standalone | Standalone (3 Licenses) | Standalone

Other features compared: Table/Spreadsheet Recognition; PDF Password Support; Searchable PDF Output; Highly Compressed PDF Output (MRC or iHQC, depending on the product); Vertical Text Recognition; Barcode Recognition; Image Pre-processing; Watched / Hot Folder; Batch processing; Managed server processing; Indexing; Business Card Recognition (box version only for one product); Screenshot reader; Zone Templates; Proofing & Training; Arabic/Farsi/Hebrew (through a Hebrew & Arabic version for two products); Chinese/Japanese/Korean; User Dictionaries; Multi-Core Support.

Feature | ABBYY PDF Transformer+ | CVision PDF Compressor 4 Desktop | CVision PDF Compressor 4 Pro
Scanner Drivers Supported | N/A | N/A | N/A
Highly Compressed PDF Output | MRC | MRC | MRC
Languages Supported | 184 | 60 | 60
Page Limit per Document | none | 100 | none
Installation | Desktop | Desktop | Desktop
License | Standalone | Standalone | Standalone

Other features compared: Table/Spreadsheet Recognition; PDF Password Support; Searchable PDF Output; Vertical Text Recognition; Barcode Recognition; Image Pre-processing; Watched / Hot Folder; Batch processing; Managed server processing; Indexing; Business Card Recognition; Screenshot reader; Zone Templates; Proofing & Training; Arabic/Farsi/Hebrew (Hebrew noted for one product); Chinese/Japanese/Korean; User Dictionaries; Multi-Core Support.

Feature | ABBYY Recognition Server | IRISDocument Server | PaperVision Capture OCR | Digitech OCRFlow | CVision Maestro 5 | SimpleIndex 7
Scanner Drivers Supported | TWAIN / ISIS | N/A | TWAIN / ISIS | ISIS | N/A | TWAIN / ISIS
Languages Supported | 191 | 137 | 121 | 1 | 60 | 179
Page Limit per Document | none | none | none | none | none | none
Installation | Server | Desktop / Server | Desktop / Server | Desktop | Server | Desktop / Server
License | Volume-based | Core-based | Named / Concurrent | Standalone | Standalone | Standalone

Other features compared: Table/Spreadsheet Recognition; PDF Password Support; Searchable PDF Output; Highly Compressed PDF Output (MRC or iHQC where offered); Vertical Text Recognition; Barcode Recognition; Image Pre-processing; Watched / Hot Folder; Batch processing; Managed server processing; Indexing; Business Card Recognition; Screenshot reader; Zone Templates; Proofing & Training; Arabic/Farsi/Hebrew (through a Hebrew & Arabic version or add-ons for some products); Chinese/Japanese/Korean (through an Asian add-on for one product); User Dictionaries; Multi-Core Support (through a Multi-CPU add-on for one product).

Buy OCR Software

Use the buttons to get your OCR software download instantly after you order!

Personal OCR Software

SimpleOCR

Our own freeware OCR application provides acceptable accuracy for those who just need to convert a few pages and can't justify the cost of commercial OCR software. Developers can use the command-line and SDK versions to integrate SimpleOCR with their custom applications.

Desktop OCR Software

ABBYY FineReader 12 Professional

FineReader Professional is highly accurate, easy-to-use OCR software with a host of features, including digital camera OCR, intelligent document layouts, image enhancement, barcode recognition, and command line integration. FineReader is our pick for OCR software because its document layout retention will save you a great deal of time reformatting the documents you convert for editing.

ABBYY FineReader 12 Corporate

FineReader Corporate Edition offers unique concurrent licensing that makes it possible for many users who need occasional use of OCR to share a small pool of active licenses. With accuracy comparable to OmniPage, superior technical support services, and a user interface that many users find preferable, we think that FineReader Corporate is the best choice of OCR software for business.

IRIS ReadIRIS 14 Pro

Affordable OCR software for business and home users. ReadIRIS Pro provides a very accurate recognition rate at a low cost and still has some of the advanced features found in higher-priced professional OCR software. The main limitation is that the Pro version is limited to documents under 50 pages.

IRIS ReadIRIS 14 Corporate

Adds support for files over 50 pages, business card recognition, as well as automatic processing of hot folders.

Nuance OmniPage Ultimate

OmniPage Ultimate has several unique features that make it stand out for a variety of applications. Some of these include auto-redaction, SharePoint integration, automatic filing with barcodes, PDF auto-bookmarking, form data collection and MFP support. Most of these new features are not available in the Standard edition.
Compare OmniPage Versions.

Arabic, Farsi, & Hebrew OCR Software

ABBYY FineReader 12 Professional

FineReader Professional 12 supports Hebrew and Arabic character recognition.

ABBYY FineReader 12 Corporate

FineReader Corporate also supports Hebrew and Arabic character recognition.

ABBYY Recognition Server

Arabic and Hebrew language recognition is included in the newest version of ABBYY Recognition Server.

IRIS ReadIRIS 14 Pro

ReadIRIS 14 Pro now includes Arabic (PC version only), Farsi, and Hebrew character recognition in its base package. No special version or add-on is required.

IRIS ReadIRIS 14 Corporate

Adds the ability to recognize files over 50 pages, recognize business cards, and monitor a hot folder to automatically process images in the background.

IRISDocument Arabic Language Pack

Enables Arabic and Farsi character recognition in the IRISDocument high-volume server OCR solution.

Chinese, Japanese, Korean, & Thai OCR Software

ABBYY FineReader 12 Professional

FineReader Professional 12 includes Chinese, Japanese, and Thai languages in its base package. No special version or add-on is required.

ABBYY FineReader 12 Corporate

FineReader Corporate includes Chinese, Japanese, and Thai character recognition in its base package. No special version or add-on is required.

ABBYY Recognition Server

An add-on license is available for ABBYY Recognition Server to add Chinese, Japanese & Korean (CJK) language support. A Thai character recognition language pack is also available, but it is sold separately from CJK.

IRIS ReadIRIS 14 Pro

ReadIRIS 14 Pro now includes Japanese, Traditional Chinese, Simplified Chinese and Korean character recognition in its base package. No special version or add-on is required.

IRIS ReadIRIS 14 Corporate

The ReadIRIS Corporate version is the same as Pro but adds the ability to recognize files over 50 pages, recognize business cards, and monitor a hot folder to automatically process images in the background.

IRIS ReadIRIS 14 Pro for Mac

ReadIRIS 14 Pro for Mac now includes Japanese, Traditional Chinese, Simplified Chinese and Korean character recognition in its base package. No special version or add-on is required.

IRIS ReadIRIS 14 Corporate for Mac

The ReadIRIS Corporate version for Mac is the same as Pro but adds the ability to recognize files over 50 pages, recognize business cards, and monitor a hot folder to automatically process images in the background.

IRISDocument Server Asian Language Pack

Add-on that enables Chinese, Japanese and Korean character recognition in the IRISDocument server OCR solution.

Apple / Mac OCR Software

ABBYY FineReader 12 Professional for Mac

Creates editable, searchable files and e-books from scans, PDFs and digital photographs. The most accurate OCR available for OS X, its unmatched recognition and conversion eliminate retyping and reformatting. Sophisticated yet remarkably intuitive, FineReader has an easy-to-use interface that makes even the most complex tasks simple.

IRIS ReadIRIS 12 Pro for Mac

Affordable OCR software for Macintosh users using the latest version of the IRIS OCR engine.

IRIS ReadIRIS 12 Corporate for Mac

Adds support for files over 50 pages, business card recognition, as well as automatic processing of hot folders.

PDF Converters

ABBYY PDF Transformer+

ABBYY PDF Transformer's intuitive, versatile, multilingual tool enables you to easily convert any type of PDF into editable formats with the original layout and formatting retained.

Nuance PowerPDF

Nuance’s Power PDF solutions deliver intuitive design and an impressive array of features that allow users to work the way they always have, but with greater productivity.

CVision PDF Compressor Desktop

PdfCompressor Desktop Edition (OCR) is a more economical version of PdfCompressor Professional (OCR), designed for lower-volume users. This version requires files to be processed individually and files must not exceed 100 pages. An excellent choice for someone who needs the power, but not the volume.

CVision PDF Compressor Professional

PdfCompressor produces the most efficient image documents possible for high volume scanning environments by combining highly accurate OCR, advanced file compression, and batch PDF conversion. PdfCompressor can compress scans by a factor of 10-100, enabling documents to be stored, transmitted, accessed, and hosted more efficiently and less expensively.

Server OCR Software

SimpleIndex Batch OCR

SimpleIndex combines ABBYY FineReader OCR technology with powerful pattern matching features to extract useful data from OCR text and use it to file documents automatically. Perfect for small to mid-sized businesses that need to digitize many documents at once. Also supports other labor-saving technologies like barcode recognition, zone OCR, and database lookups.

OCR results can be saved to text, MS Word, or searchable PDF and PDF/A files. Data can be saved to CSV (Excel) or any SQL database, embedded in folder names and filenames, or used as SharePoint 2010 file metadata.

Affordable desktop and server licensing with no pay-per-click fees makes SimpleIndex the most cost-effective software of its kind!
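For readers curious about what "pattern matching" on OCR text can look like, here is a minimal Python sketch that pulls a few fields out of recognized text with regular expressions and formats them as a CSV row. It is an illustration of the general technique only, not SimpleIndex's implementation; the field names, the patterns, and the sample text are all made up for this example.

```python
import csv
import io
import re

# Hypothetical patterns; real documents would need their own.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\w+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match for each pattern, or '' when nothing matches."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else ""
    return fields


def to_csv_row(fields):
    """Format the extracted fields as one CSV line, in PATTERNS order."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([fields[name] for name in PATTERNS])
    return buffer.getvalue().strip()


sample = "ACME Supplies\nInvoice No: 10482\nDate 2014-03-07\nTotal: $1,250.00\n"
print(to_csv_row(extract_fields(sample)))
```

Fields that are not found come back empty rather than raising an error, which makes it easy to flag documents that need manual indexing.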

ABBYY Recognition Server

Innovative server-based OCR software for performing centralized enterprise-wide OCR processing. Processor license allows anyone on the network to submit files for OCR. Complex XML job specifications can be submitted to control output. Support available for Arabic and Asian languages.

IRISDocument Server

IRISDocument Server is a lower-cost solution than ABBYY Recognition Server, but it lacks some of the more advanced features and has slightly lower accuracy.

Several versions are available with varying monthly page processing limitations, letting you scale your solution to meet your budget requirements. Asian, Arabic, and Hebrew language packs are also available.

CVision Maestro

CVISION Maestro Recognition Server is server-based OCR software engineered for industrial-strength, corporate-volume scanning and OCR needs. Maestro provides a flexible OCR solution delivered from a centralized server, which lets organizations integrate it easily into their existing document and imaging workflow. It can be accessed from multiple workflows and allows users to perform many image processing functions beyond OCR.

PaperVision Capture OCR

Designed for service bureaus and large scanning departments, PaperVision Capture OCR by Digitech brings an unprecedented level of efficiency and power to information capture. Work with everything, implement any custom process you want, and track any statistic you need. Provides index value population and document break insertion as an automated process.