Optical Character Recognition

During your foray into the world of document scanning, you’ve likely encountered the term “OCR” and may even know that it stands for “Optical Character Recognition”. But what exactly is OCR, and how can you make the best use of this sophisticated and valuable tool?

We’re here to give you a run-down of what you need to know about Optical Character Recognition, answer any questions you might have, and recommend the right solution for your scanning project.

What is OCR?

The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned or photographed document images into machine-readable text that can be searched for keywords or edited in a word processor.

In general, an OCR engine analyzes the pixel data of scanned images and searches for patterns resembling letters, numbers, and other symbols to create a digitized record of characters.
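
As a toy illustration of the pattern search described above, the sketch below compares a small bitmap against a handful of stored character shapes and picks the closest one. The 3x5 glyph templates and the function names are invented for this example; real OCR engines use far more sophisticated models, so treat this as a simplified picture of the idea rather than how any particular product works.

```python
# Toy template matcher: the 3x5 glyph bitmaps below are invented for
# illustration and are not taken from any real OCR engine.
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}

def match_score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize(glyph):
    """Return the template character whose bitmap best matches the glyph."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))

# A slightly noisy "T": one pixel in the top-right corner is missing.
noisy_t = ["##.",
           ".#.",
           ".#.",
           ".#.",
           ".#."]
print(recognize(noisy_t))
```

Even with one damaged pixel, the noisy glyph still agrees with the "T" template on more pixels than with the others, which is why a small amount of scan noise does not necessarily cause a misread.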

The biggest OCR engines employ huge Artificial Intelligence (AI) and Machine Learning (ML) models that have been trained on billions of documents collected over decades of development.

While the exact mechanics of this process can be complicated, OCR engines are a key automation tool for the digital age. They bridge the gap between knowledge stored on physical documents and digital data that can be edited, searched, or parsed into structured data to automate data entry tasks.
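
To make "parsed into structured data" more concrete, here is a minimal sketch that pulls a few fields out of recognized text with regular expressions. The sample invoice text, the field labels, and the `extract_fields` helper are all assumptions made for illustration; commercial data capture products use their own, more robust methods.

```python
import re

# Hypothetical OCR output from a scanned invoice; the layout and field
# labels are assumptions made for this example.
ocr_text = """ACME Supplies
Invoice No: INV-1042
Date: 2023-04-17
Total Due: $1,250.00
"""

# One regular expression per field to capture.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice No:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total Due:\s*\$([\d,]+\.\d{2})",
}

def extract_fields(text):
    """Return a dict of the fields found in the recognized text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields

record = extract_fields(ocr_text)
print(record)
```

A record like this can then be written to a database or spreadsheet instead of being typed in by hand.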

OCR Output Types

Full Page OCR converts the entire document into one of the following formats:

    • Plain Text – Only the text in the document is retained.
    • Formatted Text – Text information is retained in consecutive paragraphs while saving font size and style.
    • Exact Copy – All information on the page is retained, including graphics, and placed on the page in the manner that most closely recreates the original document.
    • Spreadsheet – Documents with tables can be converted automatically to Excel, CSV and other spreadsheet formats.
    • Searchable File – Text information is retained on a hidden layer behind the scanned image, allowing the file’s contents to be searched while retaining the appearance of the original.
    • E-Book – Convert paper books to popular e-book formats for use in digital readers.

Limitations of OCR

OCR software is limited in what it can recognize. Most OCR software is designed to recognize only machine-printed text, not handwriting. While there is ICR (Intelligent Character Recognition) software that can recognize handwritten information, it tends to be an enterprise-level solution for forms processing work rather than full-page recognition.

Similarly, most OCR software can convert only traditional machine fonts, not cursive scripts or calligraphy. There are many fonts out there, and OCR engines depend on common, separated letter shapes to recognize text, so fonts that are unusual or whose letters flow together will not be recognized reliably.

OCR Solutions for Business

OCR can do a lot more than convert scanned documents to Word files. Businesses can use OCR to automate a wide variety of document workflows and data entry tasks.

Business OCR data capture solutions include OCR servers for high-volume conversions, document scanning and archiving systems, forms processing software with handprint recognition to capture surveys and applications, invoice processing for accounts payable automation, and document management systems that create secure repositories for searching, security, and regulatory compliance.

Robotic Process Automation is becoming one of the most popular applications of OCR by making it possible for IT and knowledge workers to integrate OCR into business workflows without having to write code or interface with APIs.

Integration services are available from our expert staff, each of whom has at least 10 years of experience implementing OCR data capture solutions for businesses.

Levels of OCR Software

OCR Software for full-text conversion comes in many different types, which vary in price range based on their features, speed, and accuracy.

For instance, you can get OCR freeware such as SimpleOCR or Tesseract that will serve in a pinch, but these will not provide acceptable accuracy if the document images are not pristine, and they have other limitations, such as language support and the number of pages that can be processed at once.

One step up from freeware is Desktop OCR software. This is the best option if you need to convert several documents to Word and can spend $50-$200 to ensure that you get quality results with minimal need for corrections and reformatting.

If you need to convert hundreds or thousands of documents, you can invest in Batch OCR software designed for scanning and converting large volumes of documents, or Server OCR software that watches “hot” folders for incoming documents in a variety of formats and languages and converts them to Word, PDF, eBook and other formats automatically.
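
The "hot" folder idea can be sketched in a few lines: a program checks a folder at regular intervals and processes any file it has not seen before. The sketch below assumes simple polling and uses a placeholder `convert_document` function in place of a real OCR engine; the function names and the polling interval are hypothetical, and actual server products may work quite differently.

```python
import tempfile
import time
from pathlib import Path

def convert_document(path):
    """Placeholder for the real OCR step; here it only reports the file."""
    return f"converted {path.name}"

def scan_hot_folder(folder, seen):
    """Process files in the folder that have not been handled yet."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name not in seen:
            results.append(convert_document(path))
            seen.add(path.name)
    return results

def watch(folder, interval_seconds=5.0, max_polls=None):
    """Poll the folder repeatedly; max_polls=None runs until interrupted."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for line in scan_hot_folder(folder, seen):
            print(line)
        polls += 1
        time.sleep(interval_seconds)

# Demo with a temporary folder standing in for the hot folder.
with tempfile.TemporaryDirectory() as hot:
    (Path(hot) / "scan001.tif").write_bytes(b"")
    seen_files = set()
    first_pass = scan_hot_folder(hot, seen_files)
    second_pass = scan_hot_folder(hot, seen_files)
    print(first_pass, second_pass)
```

The second pass returns nothing because the file was already recorded as processed, which is what keeps a watched folder from converting the same document twice.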

Improving OCR Accuracy

Although some OCR engines are better than others, no software can guarantee 100% accuracy. This is because there are other factors in play, including scan quality. Recognition software will not be able to do its work if the scanner is not properly digitizing the page.
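
One common way to put a number on OCR accuracy is to compare the recognized text with a hand-corrected reference and count the character edits needed to turn one into the other. The sketch below uses the Levenshtein edit distance for this; the sample strings and the `character_accuracy` helper are illustrative assumptions, not a measurement method prescribed by any particular OCR vendor.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def character_accuracy(ocr_text, ground_truth):
    """Share of ground-truth characters recovered, from 0.0 to 1.0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))

truth = "Optical Character Recognition"
ocr_output = "0ptical Character Recognltion"  # two misread characters
print(f"{character_accuracy(ocr_output, truth):.3f}")
```

Running the same comparison on a few representative pages before and after changing scanner settings gives a rough, repeatable way to see whether a change actually helped.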

For best results, scan at a resolution of 300 dpi. Black & White (Bitonal) mode is preferred over Greyscale or Color, and although most modern scanners are fairly well configured out of the box, you may want to adjust the Brightness and Contrast settings for your particular documents.
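
To give a feel for what these settings mean in practice, the sketch below works out the pixel size of a page scanned at 300 dpi and shows a bare-bones conversion from greyscale values to bitonal. The fixed threshold of 128 and the helper names are assumptions for this example; scanners and OCR packages generally pick the threshold adaptively.

```python
def pixel_dimensions(width_inches, height_inches, dpi=300):
    """Pixel size of a page scanned at the given resolution."""
    return round(width_inches * dpi), round(height_inches * dpi)

def to_bitonal(gray_rows, threshold=128):
    """Map 0-255 greyscale values to 0 (black) or 1 (white).

    The fixed threshold of 128 is an assumption for this sketch; scanners
    and OCR packages typically choose it adaptively.
    """
    return [[1 if value >= threshold else 0 for value in row]
            for row in gray_rows]

# US Letter page (8.5 x 11 inches) at 300 dpi.
print(pixel_dimensions(8.5, 11))

# Tiny greyscale patch: dark ink values on a light background.
patch = [
    [250, 240,  30, 245],
    [248,  20,  25, 251],
]
print(to_bitonal(patch))
```

Brightness and contrast adjustments effectively shift which greyscale values end up on the black side of that threshold, which is why they can make faint or dark originals easier to recognize.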

If you do not have a scanner that has the necessary speed, quality, or other features that you require to scan your documents, you can always find a large selection of document scanners at ScanStore! ScanStore even has a handy scanners guide to help you find the perfect scanner for your specific requirements and price range.


OCR Software Guide

There are several OCR (Optical Character Recognition) software solutions available to convert scanned images to text, Word, Excel, HTML or searchable PDF. The differences between them can often be obscure, leaving many to wonder why some OCR packages cost under $100 while others cost $500 or more.

The main features that differentiate OCR software are:

  • Character recognition accuracy
  • Page layout reconstruction accuracy
  • Support for languages
  • User interface design
  • Output file formats (Word, Excel, PDF, eBook, etc.)
  • OCR speed and support for multi-core CPUs
  • Batch processing modes
  • Advanced encryption or compression
  • Special features for niche projects

Because of the many possible combinations of document types, OCR engines, project requirements and special features, one engine may perform better with your particular documents than another. Use our handy OCR feature comparison chart to determine which OCR program best meets your requirements. You can also ask an expert for a recommendation at any time!

And The Winner Is…

ABBYY FineReader 15

Our OCR experts have tested the latest versions of FineReader, Kofax OmniPage, and ReadIRIS, and we consider ABBYY FineReader the best overall value for business users, while ReadIRIS is the best option for under $100.

The key deciding factors were:

  • User interface design
  • Page layout reconstruction capabilities
  • Extensive language support
  • Engine stability when processing large files
  • Availability and quality of technical support

Though other testing labs have ranked OmniPage’s overall accuracy slightly higher, we find the difference nearly negligible. All modern OCR software has very good accuracy, so we recommend choosing based on the special features you need, such as ReadIRIS Corporate’s CardIRIS, FineReader’s camera OCR and screenshot reader, or OmniPage Ultimate’s form data collection, auto-redaction and barcode filing capabilities.

If you would like to try them out yourself, ScanStore offers free demo downloads for ReadIRIS and FineReader. Kofax does not provide demos for its OCR products.

Businesses with many documents to process should use our SimpleIndex batch document scanning software with the FineReader OCR engine to scan and OCR large batches of documents. Barcode and OCR can also be used to sort and file documents into folders, databases or SharePoint.

OCR Data Capture

OCR can also be used to automate data entry from forms, surveys, invoices and other documents. Handwriting recognition (ICR) solutions are also available.

The SimpleOCR freeware does NOT have any handprint OCR capabilities, so it will not be able to recognize handwritten text. ICR (Intelligent Character Recognition) software is needed to read handwriting. There is a demo of SoftWriting included in the download, but it is not part of the freeware and is no longer supported by the vendor.

Product                         Scanner Drivers   Languages   Page Limit   Installation     License
SimpleOCR Freeware              TWAIN             3           1            Desktop          Freeware
ABBYY FineReader 15             TWAIN             193         none         Desktop          Standalone
ABBYY FineReader Corporate 15   TWAIN             193         none         Desktop/Server   Per Seat / Concurrent
IRIS ReadIRIS Pro 17            TWAIN             128         50           Desktop          Standalone
IRIS ReadIRIS Corporate 17      TWAIN             137         none         Desktop          Standalone
Kofax OmniPage Ultimate         TWAIN / ISIS      137         none         Desktop          Standalone (3 Licenses)
SimpleIndex Desktop 9           TWAIN / ISIS      179         none         Desktop          Standalone

The chart also compares the following features for these products:

  • Table/Spreadsheet Recognition
  • Password Support
  • Searchable PDF Output
  • Highly Compressed PDF Output (MRC or iHQC, depending on the product)
  • Vertical Text Recognition
  • Barcode Recognition
  • Image Pre-processing
  • Watched / Hot Folder
  • Batch processing
  • Managed server processing
  • Indexing
  • Business Card Recognition (box version only for one product)
  • Screenshot reader
  • Zone Templates
  • Proofing & Training
  • Arabic/Farsi/Hebrew (separate Hebrew & Arabic version for two products)
  • Chinese/Japanese/Korean
  • User Dictionaries
  • Multi-Core Support

Product                          Scanner Drivers   Languages   Page Limit   Installation       License
ABBYY FineReader Server          TWAIN / ISIS      191         none         Server             Volume-based
IRIS Powerscan Server            N/A               137         none         Desktop / Server   Core-based
PaperVision Capture OCR Server   TWAIN / ISIS      121         none         Desktop / Server   Named / Concurrent
SimpleIndex 9                    TWAIN / ISIS      179         none         Desktop / Server   Standalone

The chart also compares the following features for these products:

  • Table/Spreadsheet Recognition
  • PDF Password Support
  • Searchable PDF Output
  • Highly Compressed PDF Output (MRC or iHQC, depending on the product)
  • Vertical Text Recognition
  • Barcode Recognition
  • Image Pre-processing
  • Watched / Hot Folder
  • Batch processing
  • Managed server processing
  • Indexing
  • Business Card Recognition
  • Screenshot reader
  • Zone Templates
  • Proofing & Training
  • Arabic/Farsi/Hebrew (Hebrew & Arabic version or 2 add-ons, depending on the product)
  • Chinese/Japanese/Korean (Asian add-on for one product)
  • User Dictionaries
  • Multi-Core Support (Multi-CPU add-on for one product)
