What is OCR?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Handprint recognition (ICR or Intelligent Character Recognition) for forms processing.
Advanced rules-based templates for locating common data elements on pages with different layouts and formatting.
Artificial intelligence that is able to use point and click user feedback to train recognition templates automatically.
Natural language processing that is able to interpret paragraphs of text and extract meaningful data from them.
Robotic process automation that puts back office integration into the hands of power users instead of programmers.
Preconfigured form templates and business rules for common applications like invoice processing and healthcare claim forms.
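
Zone OCR, as described above, can be illustrated with a minimal sketch. It assumes the OCR engine has already returned each word with a pixel bounding box; the `Word` class, the sample page and the zone coordinates are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and its bounding box in pixels (hypothetical format)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def words_in_zone(words, zone):
    """Return the text of all words whose centre lies inside zone.

    zone is a (left, top, right, bottom) tuple in pixels.
    """
    left, top, right, bottom = zone
    hits = []
    for word in words:
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        if left <= cx <= right and top <= cy <= bottom:
            hits.append(word)
    # Simple reading order: top to bottom, then left to right.
    hits.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in hits)


# Made-up words as they might come back from a full-page OCR pass.
page = [
    Word("Invoice", 100, 50, 120, 30),
    Word("No.", 230, 50, 50, 30),
    Word("10472", 1000, 50, 90, 30),
    Word("Total", 100, 900, 80, 30),
    Word("$1,250.00", 1000, 900, 150, 30),
]

INVOICE_NUMBER_ZONE = (950, 30, 1200, 100)  # assumed position of the field
print(words_in_zone(page, INVOICE_NUMBER_ZONE))
```

A fixed zone like this only works when every page shares the same layout, which is why the list above mentions rules-based and trained templates for documents whose layouts vary.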

Smart OCR solutions for document processing needs

80%

Using OCR software enables enterprises to reduce document processing time by as much as 80%.

Benefits of using OCR

Save paper.

If not for the trees, then do it for the savings on paper, toner, copiers and their service contracts.

Find files quickly.

How much time is wasted searching for paper files? Digital documents can be searched and viewed instantly from anywhere.

Disaster recovery.

Paper is much harder to back up and restore than digital data.

Storage costs.

Office square footage and off-site records storage add to the cost of keeping paper documents.

It's the Law.

Government mandates for records retention and digital document submission are common requirements. Digital signatures are everywhere.

Shared access.

Share documents on your local network, e-mail, intranet, SharePoint or in the cloud. Sharing paper documents requires a copier or a fax machine.

Guide to Everything OCR

What is OCR?

The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned or photographed document images into machine-readable text that can be searched for keywords or edited in a word processor.

In general, an OCR engine analyzes the pixel data of scanned images and searches for patterns resembling letters, numbers, and other symbols to create a digitized record of characters.
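
To make the idea of matching pixel patterns concrete, here is a toy sketch that compares a tiny 3x5 bitmap against a handful of hand-drawn templates and picks the closest one. It is only an illustration: the templates and the scoring rule are invented for this example, and real engines rely on far more sophisticated models.

```python
# Hypothetical 3x5 templates; "1" is an ink pixel, "0" is background.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def score(glyph, template):
    """Count the pixels on which two equally sized bitmaps agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda ch: score(glyph, TEMPLATES[ch]))


# A "T" with one stray ink pixel, as scanner noise might produce.
noisy_t = ("111", "010", "011", "010", "010")
print(recognize(noisy_t))  # prints "T"
```

Even this toy version shows why scan quality matters: every flipped pixel lowers the score of the correct template and brings it closer to the wrong ones.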

The biggest OCR engines employ huge Artificial Intelligence (AI) and Machine Learning (ML) models that have been trained on billions of documents collected over decades of development.

While the exact mechanics of this process can be complicated, OCR engines are a key automation tool for the digital age. They bridge the gap between knowledge stored on physical documents and digital data that can be edited, searched or parsed into structured data to automate data entry tasks.

OCR Output Types

Full Page OCR converts the entire document into one of the following formats:

Plain Text – Only the text in the document is retained.
Formatted Text – Text information is retained in consecutive paragraphs while saving font size and style.
Exact Copy – All information on the page is retained, including graphics, and placed on the page in the manner that most closely recreates the original document.
Spreadsheet – Documents with tables can be converted automatically to Excel, CSV and other spreadsheet formats.
Searchable PDF File – Text information is retained on a hidden layer behind the scanned image, allowing the file’s contents to be searched while retaining the appearance of the original.
E-Book – Convert paper books to popular e-book formats for use in digital readers.

Limitations of OCR

OCR software is limited in what it is able to recognize. Most OCR software is designed to recognize only machine-printed text, as opposed to handwriting. For handwriting there is ICR software (“Intelligent Character Recognition”). Desktop OCR applications include some limited ICR capabilities and can get acceptable accuracy with handprint. Cloud OCR solutions tend to get the best results for handwriting.

Similarly, most OCR software is only able to convert traditional machine fonts, not cursive scripts or calligraphy. There are many fonts out there, and OCR engines depend on common, separated letter shapes to recognize the text, so fonts that are unusual or flow together will not be recognized.

OCR Solutions for Business

OCR can do a lot more than convert scanned documents to Word and PDF files. Businesses can use OCR to automate a wide variety of document workflows and data entry tasks.

Business OCR data capture solutions include OCR servers for high-volume conversions, document scanning and archiving systems, forms processing software with handprint recognition to capture surveys and applications, invoice processing for accounts payable automation, and document management systems that create secure repositories for searching, security and regulatory compliance.

Robotic Process Automation is becoming one of the most popular applications of OCR by making it possible for IT and knowledge workers to integrate OCR data capture into business workflows without having to write code or interface with APIs.

Integration services are available from our expert staff, each of whom has at least 10 years of experience implementing OCR data capture solutions for businesses.

Types of OCR Software

OCR Software for full-text conversion comes in many different types, which vary in price range based on their features, speed, and accuracy. OCR software for data capture is covered in another section.

For instance, you can get OCR freeware such as SimpleOCR or Tesseract that will serve in a pinch, but it will not provide acceptable accuracy unless the document images are pristine, and it has other limitations, such as restricted language support and a cap on the number of pages that can be processed at once.

One step up from freeware is Desktop OCR software. These are the best option if you need to convert several documents to Word or PDF and can spend $50-$100 to ensure that you get quality results with minimal need for corrections and reformatting.

If you need to convert hundreds or thousands of documents, you can invest in Batch OCR software designed for scanning and converting large volumes of documents, or Server OCR software that watches “hot” folders for incoming documents in a variety of formats and languages and converts them to Word, PDF, eBook and other formats automatically.
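
The “hot” folder idea can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not how any particular server product works: the `convert` function is a hypothetical stand-in for a call into a real OCR engine, and the folder names and list of supported extensions are assumptions.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed set of input formats; a real server would support its own list.
SUPPORTED = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def convert(path: Path) -> str:
    """Hypothetical placeholder for handing the file to an OCR engine."""
    return f"(recognized text of {path.name})"


def process_hot_folder(inbox: Path, outbox: Path, done: Path) -> list:
    """Convert every supported file waiting in inbox and return the names handled."""
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue  # leave unsupported files where they are
        (outbox / (path.stem + ".txt")).write_text(convert(path), encoding="utf-8")
        # Move the original out of the inbox so it is not converted twice.
        shutil.move(str(path), str(done / path.name))
        processed.append(path.name)
    return processed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inbox = root / "inbox"
    inbox.mkdir()
    (inbox / "scan001.pdf").write_bytes(b"%PDF-")
    (inbox / "notes.docx").write_bytes(b"")
    print(process_hot_folder(inbox, root / "out", root / "done"))
```

A long-running service would call `process_hot_folder` repeatedly, for example every few seconds, and would also need to make sure a file has finished copying before picking it up.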

For more information check out:

Why are the prices of OCR applications so different?

OCR Data Capture

OCR can also be used to automate data entry from forms, surveys, invoices and other documents. Handwriting recognition (ICR) solutions are also available.

Improving OCR Accuracy

Although some OCR engines are better than others, no software can guarantee 100% accuracy. This is because there are other factors in play, including scan quality. Recognition software will not be able to do its work if the scanner is not properly digitizing the page.

Scanning at a resolution of 300 dpi is recommended for best results. Black & White (Bitonal) is preferred over Greyscale or Color modes, and although most modern scanners are fairly well configured out of the box, you may want to adjust your Brightness and Contrast settings for your particular documents.
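
As a rough worked example of what these settings mean in practice, the sketch below computes the pixel dimensions of a page at a given resolution and the approximate uncompressed image size in each color mode. The bits-per-pixel figures are the usual values for these modes; the function name is made up for this example, and real files will be smaller once compressed.

```python
# Uncompressed bits per pixel for the three common scan modes.
BITS_PER_PIXEL = {"bitonal": 1, "greyscale": 8, "color": 24}


def scan_size(width_in, height_in, dpi, mode):
    """Return (width_px, height_px, approx_raw_bytes) for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * BITS_PER_PIXEL[mode] // 8
    return width_px, height_px, raw_bytes


# US Letter (8.5 x 11 inches) at the recommended 300 dpi.
for mode in BITS_PER_PIXEL:
    w, h, size = scan_size(8.5, 11, 300, mode)
    print(f"{mode:>9}: {w} x {h} px, about {size / 1_000_000:.1f} MB uncompressed")
```

At 300 dpi a Letter page is 2550 x 3300 pixels. Before compression, a bitonal scan of that page holds one twenty-fourth of the data of a 24-bit color scan.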

If you do not have a scanner that has the necessary speed, quality, or other features that you require to scan your documents, you can always find a large selection of document scanners at ScanStore! ScanStore even has a handy scanners guide to help you find the perfect scanner for your specific requirements and price range.

Choosing an OCR Solution

There are several OCR (Optical Character Recognition) software solutions available to convert scanned images to text, Word, Excel, HTML or searchable PDF. The differences between them can often be obscure, leaving many to wonder why some OCR software costs under $100 while other packages cost $500 or more.

The main features that differentiate OCR software are:

Character recognition accuracy
Page layout reconstruction accuracy
Support for languages
User interface design
Output file formats (Word, Excel, PDF, eBook, etc.)
OCR speed and support for multi-core CPUs
Batch processing modes
Advanced PDF encryption or compression
Special features for niche projects

Because of the countless combinations of document types, OCR engines, project requirements and special features, one engine may perform better with your particular documents than another. Use our handy OCR feature comparison chart to determine which OCR program best meets your requirements. And you can always ask an expert for a recommendation anytime!

What is the Best OCR Software?

Our OCR experts have tested the latest versions of FineReader, Kofax OmniPage, and ReadIRIS, and we consider ABBYY FineReader PDF the best overall value for business users, while ReadIRIS is the best OCR software for under $100.

The key deciding factors were:

User interface design
Page layout reconstruction capabilities
Extensive language support
Engine stability when processing large files
Availability and quality of technical support

Though other testing labs have ranked OmniPage’s overall accuracy slightly higher, we find the difference is nearly negligible. All modern OCR software has very good accuracy, so we recommend going with the one that has the particular special features you need, like ReadIRIS Corporate’s CardIRIS, FineReader’s camera OCR and screenshot reader, or OmniPage Ultimate’s form data collection, auto-redaction and barcode filing capabilities.

If you would like to try them out yourself, you can download trial versions of ReadIRIS and FineReader from our store. Kofax does not provide demos for its OCR products.

Businesses with many documents to process should use our SimpleIndex batch document scanning software with the FineReader OCR engine to scan and OCR large batches of documents. Barcode and OCR can also be used to sort and file documents into folders, databases, SharePoint, and other cloud storage providers.

Desktop & Freeware OCR

OCR Freeware, Desktop OCR, Mac OCR Software, Receipt Scanning

Batch OCR & Servers

OCR to Excel, OCR Servers, SimpleOCR SDK, SimpleIndex Batch OCR

Enterprise OCR Solutions

OCR Data Capture, Forms Processing, Invoice Processing, Document Management

OCR Processing Steps

All OCR projects, regardless of size or complexity, follow these processing steps.

1. Scan or Upload Documents

Fortunately, many documents these days already exist as PDF files or JPEG images and don’t need to be scanned. These files can be loaded into your OCR program directly.

If you’re starting with paper, then first you have to get the documents ready. That means pulling any staples and paperclips, taping down loose edges, post-its, small documents and anything else that might get stuck in the document feeder.

In batch scanning scenarios, you may need to insert barcode separator sheets to indicate the start of each new document and automate filing.
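
Separator sheets work because the scanning software can cut the page stream wherever it sees the separator barcode. A minimal sketch of that splitting step follows; the `SEPARATOR` value and the `(page_id, barcode)` input format are assumptions made for illustration, and barcode decoding itself is left to whatever tool is in use.

```python
SEPARATOR = "SEPARATOR"  # hypothetical value printed on each separator sheet


def split_batch(pages):
    """Group scanned pages into documents, starting a new one at each separator.

    pages is a list of (page_id, barcode_value_or_None) tuples in scan order.
    Separator sheets themselves are left out of the result.
    """
    documents = []
    current = []
    for page_id, barcode in pages:
        if barcode == SEPARATOR:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page_id)
    if current:
        documents.append(current)
    return documents


batch = [
    ("sep1", SEPARATOR),
    ("p1", None),
    ("p2", None),
    ("sep2", SEPARATOR),
    ("p3", None),
]
print(split_batch(batch))  # prints [['p1', 'p2'], ['p3']]
```

In a filing workflow the separator barcode can also carry an index value, such as a customer or case number, that names the resulting document.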

You take these very neatly stacked piles of paper and feed them into the scanner. The neater the stacks, the less you have to open the scanner up and pull out little bits of paper and staples, which generally makes for a more pleasant and swear-free work environment.

Most OCR applications and modern scanners produce very good image quality that requires minimal configuration for good results, especially for color images. Black and white document images can be significantly smaller in file size, but it can be challenging to get consistently high-quality scans on variable paper types. A resolution of 300 dots per inch is recommended.

Free and desktop OCR applications will typically use a “Save As” style dialog to scan and save files one at a time. This is OK for a few documents, but if you have hundreds or thousands you’ll want something more streamlined for batch scanning.

While capturing documents with a digital camera is extremely convenient, it is not recommended for OCR unless you are using an application specifically designed to capture documents on your phone. OCR technology has improved to compensate for the common distortions that cameras produce, but a camera image will generally not be as accurate as a scanned image.

2. Preprocessing and Classification

The first thing the OCR application does is perform a fast analysis of the image to identify document layout elements like pictures, paragraphs, tables, or barcodes.

For simple workflows there may only be a single document type. More complex batch document scanning or data capture applications that handle multiple types of documents must first identify the type of document based on the simple layout analysis so that specific OCR regions, languages, and other processing parameters can be correctly applied.

AI-based document classification incorporates user feedback to train and improve the layout analysis and more accurately identify document types as the system is used.
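
As a simplified illustration of classification driving the rest of the workflow, the sketch below guesses a document type from keywords in a quick text pass and then looks up the processing settings for that type. The keyword sets, type names and `PROFILES` settings are all hypothetical, and commercial systems use layout features and trained models rather than a fixed keyword list.

```python
import re

# Hypothetical keyword sets for two document types.
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "remit"},
    "claim_form": {"claim", "patient", "insured", "diagnosis"},
}

# Hypothetical per-type processing parameters applied after classification.
PROFILES = {
    "invoice": {"language": "eng", "zones": ["invoice_number", "total"]},
    "claim_form": {"language": "eng", "zones": ["patient_name", "diagnosis_code"]},
    "unknown": {"language": "eng", "zones": []},
}


def classify(text, min_hits=2):
    """Return the type whose keywords appear most often, or "unknown"."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_type, best_hits = "unknown", 0
    for doc_type, keywords in KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type if best_hits >= min_hits else "unknown"


doc_type = classify("INVOICE #1042: amount due in 30 days")
print(doc_type, PROFILES[doc_type])
```

Requiring at least two keyword hits is a crude guard against misfiling a page on the strength of a single word; documents that fall through as "unknown" would go to a person for review.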

3. Perform the OCR Document Analysis

This is the step that reads the text on the document and performs the Optical Character Recognition.

This is an automated step that can take some time depending on what type of document is being read. Documents with handwriting, lots of small print text, tables, or background noise can significantly impact the processing speed.

For simple text conversion this can take less than a second per page, while more complex data extraction can take several seconds or longer.

4. Verify the OCR Results

Human verification of the text is required whenever errors are not acceptable. When converting large document archives to text primarily used for search, manual verification may not be required. For data capture applications where mistakes can lead to costly transaction errors, verification is essential.

OCR tools provide streamlined user interfaces that highlight words with possible errors, display the image and recognized text together, and let you quickly make corrections.
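
One common way to decide which words to highlight is a confidence threshold. The sketch below assumes the engine reports a confidence between 0 and 1 for each word; the sample values and the 0.90 cut-off are invented for illustration.

```python
def words_to_review(results, threshold=0.90):
    """Return (position, word) pairs whose confidence falls below threshold.

    results is a list of (word, confidence) tuples in reading order.
    """
    return [
        (index, word)
        for index, (word, confidence) in enumerate(results)
        if confidence < threshold
    ]


# Made-up engine output: the letter O has been confused with the digit 0.
ocr_results = [
    ("Total", 0.99),
    ("am0unt", 0.62),
    ("due", 0.97),
    ("$1,25O.00", 0.71),
]
print(words_to_review(ocr_results))  # prints [(1, 'am0unt'), (3, '$1,25O.00')]
```

Raising the threshold sends more words to the operator and catches more errors, while lowering it speeds up verification at the cost of letting some mistakes through.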

Verification is also how AI-based systems train their models to improve classification, field extraction, and text recognition accuracy.

In the past, OCR applications relied on user training of live samples to improve recognition rates. Modern OCR leverages huge datasets with billions of samples, so interactive training of the recognition patterns is no longer effective. When training is employed by modern data capture systems, it is focused on document type classification and field location training rather than the recognition of fonts and characters.

5. Export the Results to Your Apps

Now that you have your original document image and your converted OCR text, the final step is to put that data to use in whatever application it was that made you go down this road in the first place.

OCR applications can export the documents and data to a variety of formats, the most common being:

Searchable PDF files with OCR text hidden behind the image
Word processing documents like DOCX or Google Docs
Excel spreadsheets for documents formatted as tables
Ebook reader formats like ePub and Mobi
Structured text data files like CSV, XML, JSON, etc.
SQL database servers
Upload to cloud services via connector apps or APIs
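
For the structured text formats in this list, the export step can be as simple as the following sketch, which writes the same captured fields as CSV and as JSON using only the Python standard library. The field names and values are made up for illustration.

```python
import csv
import io
import json

# Hypothetical fields captured from two scanned invoices.
records = [
    {"file": "scan001.pdf", "invoice_number": "10472", "total": "1250.00"},
    {"file": "scan002.pdf", "invoice_number": "10473", "total": "89.95"},
]


def to_csv(rows):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows as indented JSON text."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```

Keeping the source file name in every record makes it possible to trace each exported value back to the image it came from.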

Desktop OCR Software

	SimpleOCR Freeware	ABBYY FineReader PDF 15	ABBYY FineReader PDF Corporate 15	IRIS ReadIRIS PDF Standard 23	IRIS ReadIRIS PDF Business 23	Tungsten OmniPage Ultimate	SimpleIndex Desktop 11
Scanner Drivers Supported	TWAIN	TWAIN	TWAIN	TWAIN	TWAIN	TWAIN / ISIS	TWAIN / ISIS
Table/Spreadsheet Recognition		✓	✓	✓	✓	✓
PDF Password Support		✓	✓		✓	✓
Searchable PDF Output		✓	✓	✓	✓	✓
Highly Compressed PDF Output		MRC	MRC	iHQC	iHQC	MRC
Vertical Text Recognition		✓	✓			✓
Barcode Recognition		✓	✓	✓	✓	✓	✓
Image Pre-processing		✓	✓	✓	✓	✓	✓
Watched / Hot Folder			✓		✓	✓	✓
Batch processing			✓	✓	✓	✓	✓
Managed server processing
Indexing					✓		✓
Business Card Recognition					Box version only
Screenshot reader		✓	✓
Zone Templates		✓	✓	✓	✓	✓	✓
Proofing & Training		✓	✓	✓			✓
Languages Supported	3	193	193	138	138	137	179
Arabic/Farsi/Hebrew	Hebrew & Arabic Version	Hebrew & Arabic Version	✓	✓			✓
Chinese/Japanese/Korean		✓	✓	✓	✓	✓	✓
User Dictionaries		✓	✓			✓	✓
Page Limit per Document	1	none	none	50	none	none	none
Multi-Core Support		✓	✓	✓	✓	✓
Installation	Desktop	Desktop	Desktop/Server	Desktop	Desktop	Desktop	Desktop
License	Freeware	Standalone	Per Seat / Concurrent	Standalone	Standalone	Standalone 3 Licenses	Standalone

Enterprise OCR Software

	ABBYY FineReader Server	IRIS Powerscan Server	PaperVision Capture OCR Server	SimpleIndex 11
Scanner Drivers Supported	TWAIN / ISIS	N/A	TWAIN / ISIS	TWAIN / ISIS
Table/Spreadsheet Recognition	✓	✓		✓
PDF Password Support	✓	✓		✓
Searchable PDF Output	✓	✓	✓	✓
Highly Compressed PDF Output	MRC	iHQC
Vertical Text Recognition	✓		✓
Barcode Recognition	✓	✓	✓	✓
Image Pre-processing	✓	✓	✓	✓
Watched / Hot Folder	✓	✓		✓
Batch processing	✓	✓	✓	✓
Managed server processing	✓	✓	✓	✓
Indexing	✓	✓	✓	✓
Business Card Recognition
Screenshot reader
Zone Templates			✓	✓
Proofing & Training	✓			✓
Languages Supported	191	137	121	179
Arabic/Farsi/Hebrew	Hebrew & Arabic Version	2 Add-ons		✓
Chinese/Japanese/Korean	✓	Asian Add-on		✓
User Dictionaries	✓	✓		✓
Page Limit per Document	none	none	none	none
Multi-Core Support	✓	Multi-CPU Add-on	✓	✓
Installation	Server	Desktop / Server	Desktop / Server	Desktop / Server
License	Volume-based	Core-based	Named / Concurrent	Standalone

SimpleIndex

Simple Software’s SimpleIndex has everything you need for document scanning, zone OCR, data validation and output to searchable PDF files, CSV or XML data, document management systems or cloud storage like SharePoint, AWS, Box and Google Drive.

The SimpleIndex document management suite includes:

SimpleIndex comes in Standard, Barcode, OCR and Professional versions, with Server licensing available for unattended processing. FineReader OCR is included with OCR and Pro. Tesseract OCR is included with all versions. Upgraded bar code recognition and scanner drivers are included with Barcode and Pro.
SimpleView offers folder and file based document management, OCR and editing.
SimpleSearch uses database indexes for fast, precise document searches.
SimpleSend automates sending of document files via secure FTP or email.
SimpleExport converts CSV files into XML or any other text file format using XSLT.
SimpleCoversheet creates bar code separator sheets to automate scanning and indexing.

SimpleOCR

Our own freeware OCR application provides acceptable accuracy for those who just need to convert a few pages and can’t justify the cost of commercial OCR software. Developers can use the command-line and SDK versions to integrate SimpleOCR with their custom applications.

SimpleView

SimpleView lets you quickly scan, organize, search and view documents stored on your hard drive or file servers. Most document management systems use a database to organize and search for files. This forces you to laboriously import files into the system, then you must rely on that system anytime you access your files. SimpleView lets you use your existing folder and filing system to find, view and annotate documents.

ABBYY FineReader 15

ABBYY FineReader 15 is a highly accurate and easy-to-use OCR application with a host of features, including digital camera OCR, intelligent document layouts, image enhancement, barcode recognition, and command line integration. FineReader is our pick for OCR software because its document layout retention will save you much time in reformatting documents you convert for editing.

ABBYY FineReader 15 Corporate

FineReader Corporate Edition offers unique concurrent licensing that makes it possible for many users who need occasional use of OCR to share a small pool of active licenses. With accuracy comparable to OmniPage, superior technical support services, and a user interface that many users find preferable, we think that FineReader Corporate is the best choice of OCR software for business.

ABBYY FineReader Pro for Mac

Creates editable, searchable files and e-books from scans, PDFs and digital photographs. The most accurate OCR available for OSX, its unmatched recognition and conversion eliminate retyping and reformatting. Sophisticated yet remarkably intuitive, FineReader has an easy-to-use interface that makes even the most complex tasks simple.

ABBYY FineReader Server

Innovative server-based OCR software for performing centralized enterprise-wide OCR processing. Processor license allows anyone on the network to submit files for OCR. Complex XML job specifications can be submitted to control output. Support available for Arabic and Asian languages.

ABBYY FlexiCapture

ABBYY FlexiCapture is a powerful data capture and forms processing solution from a world-leading technology vendor. It transforms streams of documents of any structure and complexity into business-ready data. And its award-winning recognition technologies, automatic document classification, plus a highly scalable and customizable architecture, mean that it can help companies and organizations of any size to streamline their business processes, increase efficiency and reduce costs. We would recommend it as the best choice of OCR software for enterprise scale business.

ABBYY Vantage

ABBYY Vantage applies the RPA model to data capture software.

Vantage has a marketplace of reusable document “skills” that you can drag-and-drop into OCR projects and RPA workflows to capture data from documents with minimal configuration and specialized knowledge. Select from a huge library of pre-configured templates, or easily train new documents with machine learning.

IRIS ReadIRIS PDF 23 Standard

Readiris PDF 23 Standard is a powerful PDF-centric management application. It accepts input files in many different formats and lets you compose, edit, annotate, split, sort, amend, compress, e-sign and share your own secured PDFs as single or multiple files.

IRIS ReadIRIS PDF 23 Business

Readiris PDF 23 Business is a powerful PDF-centric management application. It accepts input files in many different formats and lets you compose, edit, annotate, split, sort, amend, compress, e-sign and share your own secured PDFs as single or multiple files. It is similar to the Standard version but offers additional features such as scanning documents from any scanner, deep PDF features like eSignatures, and saving files in multiple formats.

IRIS Powerscan Server

IRISPowerscan OCR Server & Central Management distributes document processing activities among multiple users, who share a common organization scheme for exporting digitized documents. It has more powerful zone OCR and automated indexing capabilities than other OCR servers, and is priced based on processing speed rather than pages, with unlimited licenses available.

IRIS ReadIRIS PDF Standard for Mac

Readiris PDF for Mac is a powerful PDF-centric management application. It accepts input files in many different formats and lets you compose, edit, annotate, split, sort, amend, compress, e-sign and share your own secured PDFs as single or multiple files.

IRIS ReadIRIS PDF Business for Mac

Readiris™ PDF Business for Mac is a PDF-centric management application that brings a complete set of OCR, scanning and document composition tools together in one place. Readiris™ PDF centralizes all document manipulation in a single platform to efficiently manage PDFs, images and scans, making it an all-in-one PDF creator and converter for the paperless office.
Readiris Business also includes automatic document separation based on page count, barcode, blank (white) page or zonal OCR. In addition, it saves time with automatic document naming based on smart zone, barcode or zonal text OCR.

Tungsten Kofax Power PDF Advanced

Tungsten Kofax Power PDF Advanced makes it easy to gain control over PDF files and workflows with the ability to create, convert, edit, assemble, sign and securely share PDF files anywhere. Power PDF is a solution that delivers performance, ease, compatibility and value more than ever before, freeing you from the compromises of traditional PDF applications.

Kofax OmniPage Ultimate

OmniPage Ultimate has several unique features that make it stand out for a variety of applications. Some of these include auto-redaction, SharePoint integration, automatic filing with barcodes, PDF auto-bookmarking, form data collection and MFP support. Most of these new features are not available in the Standard edition.

Tungsten OmniPage Server

Tungsten Kofax OmniPage Server is a robust and versatile OCR solution for server-based, large volume document conversion needs. It is a reliable high‑volume, server‑based PDF and image converter that will be useful for a large variety of your automation needs.

PaperVision Capture OCR Server

PaperVision Capture’s fully customizable OCR server uses a machine-based Open Text OCR license to give you incredibly fast full-text OCR capable of handling millions of pages per day without expensive click charges. It can be expanded to add powerful zone OCR and forms processing capabilities. PaperVision Capture was designed for the biggest service bureau scanning operations in the world and can tackle any scanning, OCR and data capture job. Its modular licensing based on the number of capture stations gives it the best price/performance ratio for many scenarios.
