Optical Character Recognition
During your foray into the world of document scanning, you’ve likely encountered the term “OCR” and may even know that it stands for “Optical Character Recognition”. But what exactly is OCR, and how can you make the best use of this sophisticated and valuable tool?
We’re here to give you a run-down of what you need to know about Optical Character Recognition, answer any questions you might have, and recommend the best OCR software solution for your scanning project. Let’s begin!
What is OCR?
The primary purpose of Optical Character Recognition is to quickly and automatically recognize and convert images of machine-printed or typed text into actual electronic data that users can organize, search, and modify. In general, an OCR engine analyzes the pixel data of scanned images and searches for patterns resembling letters, numbers, and other symbols to create a digitized record of characters. While the exact mechanics of this process can be complicated, OCR engines ultimately enable users to easily and effectively perform a wide array of functions such as information entry, processing, categorization, retrieval, and analysis.
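To make the idea of "searching for patterns resembling letters" concrete, here is a deliberately tiny sketch. It assumes each character has already been isolated as a 3x5 grid of ink (`#`) and paper (`.`) pixels, and it simply picks the stored template that agrees with the most pixels. The templates and function names are invented for illustration; real OCR engines use far more sophisticated techniques than this.

```python
# Toy glyph templates: '#' is ink, '.' is paper. Purely illustrative.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def similarity(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def recognize(glyph):
    """Return the best-matching character and its similarity score."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best, similarity(glyph, TEMPLATES[best])


# A scanned "T" with one pixel of ink missing from the stem.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
character, score = recognize(noisy_t)
print(character, round(score, 2))
```

Even with a damaged pixel, the closest template still wins, which hints at why clean scans lead to more reliable recognition.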
Applications of OCR
Optical Character Recognition converts scanned paper and other machine-readable documents into data that can be recognized and managed promptly and accurately. Such reliable OCR capabilities power vital systems, facilitate essential services, improve routine operations, and promote overall efficiency. The two main methods of OCR are:
Full Page OCR – Converts the entire page into one of the following formats:
- Plain Text – Basic text information on the page is retained in consecutive order.
- Formatted Text – Text information is retained in consecutive paragraphs while saving font size and style. This can also preserve tables in a tabular format, such as spreadsheets.
- Exact Copy – All information on the page is retained, including graphics, and placed on the page in the manner that most closely recreates the original document.
- Searchable File – Text information is retained on a hidden layer behind the scanned image, allowing the file’s contents to be searched while retaining the appearance of the original.
Zone OCR – Recognizes document structure and identifies fields of text located in defined zones of the page. This zonal method is often applied for indexing and document management. The extracted details can be used to perform numerous functions, such as saving specific metadata to particular locations, archiving strings of text into organized formats like databases, automating the population of information and processes, and more.
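As a rough illustration of the zonal idea, the sketch below assumes an OCR engine has already returned each recognized word with its bounding box, and that you have defined named rectangular zones for the fields you care about. The `Box`, `Word`, and `extract_zones` names, the coordinates, and the invoice values are all hypothetical; commercial zone OCR tools handle this through their own template editors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def extract_zones(words, zones):
    """Map each zone name to the text of the words whose centre lies inside it."""
    fields = {}
    for name, zone in zones.items():
        hits = [
            w for w in words
            if zone.contains((w.box.left + w.box.right) / 2,
                             (w.box.top + w.box.bottom) / 2)
        ]
        # Simple reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (w.box.top, w.box.left))
        fields[name] = " ".join(w.text for w in hits)
    return fields


# Made-up OCR output for an invoice header (pixel coordinates).
words = [
    Word("Invoice", Box(100, 100, 260, 140)),
    Word("No.", Box(270, 100, 330, 140)),
    Word("10482", Box(1800, 100, 1950, 140)),
    Word("2021-03-15", Box(1800, 160, 2050, 200)),
]
zones = {
    "invoice_number": Box(1700, 90, 2200, 150),
    "invoice_date": Box(1700, 150, 2200, 210),
}
fields = extract_zones(words, zones)
print(fields)
```

The resulting dictionary is the kind of index data that could then be written to a database or used to name and file the document.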
Levels of OCR Software
OCR software comes in many different types, which vary in price based on their features, speed, and accuracy. For instance, freeware such as SimpleOCR will serve in a pinch, but it can only convert BMP, JPG, and TIF images of English or French text into plain text documents in TXT or DOC format, one page at a time.
On the other hand, you can invest a few hundred dollars in Batch OCR or even Server OCR software that can watch particular folders for incoming documents in a variety of image formats and languages, then automatically recreate exact copies of all of their pages in the format of your choice.
You can also find Desktop OCR software, which will bridge the price gap and include many of the features of the Corporate editions but still require some user input during conversion.
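If you are curious what "watching a folder" means in practice, the following sketch shows a single polling pass: it looks for image files it has not seen before, hands each one to a conversion function, and writes the result to an output folder. This is only an illustration of the concept, not how any particular product works. The `process_new_files` helper and the `fake_ocr` stand-in are invented here; a real setup would call an actual OCR engine instead.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Set

IMAGE_SUFFIXES = {".bmp", ".jpg", ".jpeg", ".tif", ".tiff"}


def process_new_files(inbox: Path, outbox: Path, seen: Set[Path],
                      convert: Callable[[Path], str]) -> List[Path]:
    """Run one polling pass: convert every unseen image found in inbox."""
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(inbox.iterdir()):
        if path in seen or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = convert(path)  # placeholder for a real OCR engine call
        target = outbox / (path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        seen.add(path)
        written.append(target)
    return written


def fake_ocr(image: Path) -> str:
    """Stand-in for an OCR engine; returns dummy text."""
    return f"recognized text of {image.name}"


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    inbox, outbox = Path(tmp) / "inbox", Path(tmp) / "outbox"
    inbox.mkdir()
    (inbox / "page1.tif").write_bytes(b"")    # picked up
    (inbox / "readme.docx").write_bytes(b"")  # ignored: not an image
    seen: Set[Path] = set()
    first_pass = [p.name for p in process_new_files(inbox, outbox, seen, fake_ocr)]
    second_pass = [p.name for p in process_new_files(inbox, outbox, seen, fake_ocr)]
    output_text = (outbox / "page1.txt").read_text(encoding="utf-8")

print(first_pass, second_pass)
```

A long-running watcher would simply repeat this pass on a timer, which is the part Batch and Server products automate for you.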
Improving Accuracy
Although some OCR engines are better than others, no software can guarantee 100% accuracy. This is because there are other factors in play, including scan quality. Recognition software will not be able to do its work if the scanner is not properly digitizing the page.
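One common way to put a number on accuracy is to compare the OCR output against a hand-checked reference and count how many single-character insertions, deletions, or substitutions separate them. The sketch below uses that approach; the function names are our own, and vendors and testing labs may measure accuracy differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered correctly, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


reference = "Optical Character Recognition"
ocr_output = "0ptical Character Recogniti0n"  # two letter O's misread as zeros
accuracy = character_accuracy(reference, ocr_output)
print(f"{accuracy:.1%}")
```

Two wrong characters out of 29 already bring this short sample down to roughly 93%, which shows how quickly small misreads add up on a full page.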
Scanning at a resolution of 300 dpi is recommended for best results. Black & White (Bitonal) is preferred over Greyscale or Color modes, and although most modern scanners are fairly well configured out of the box, you may want to adjust your Brightness and Contrast settings for your particular documents.
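Two quick calculations help make these settings tangible: how many pixels a page becomes at 300 dpi, and what converting a greyscale image to bitonal actually does. The sketch below is illustrative only. The cut-off of 128 is an assumed midpoint on a 0 to 255 brightness scale, not a recommended setting, and scanner drivers typically apply their own thresholding.

```python
def pixel_size(width_in: float, height_in: float, dpi: int = 300):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def to_bitonal(gray_rows, threshold: int = 128):
    """Convert greyscale pixels (0 = black, 255 = white) to 1 = ink, 0 = paper."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


# A US Letter page (8.5 x 11 inches) at 300 dpi.
letter_pixels = pixel_size(8.5, 11)

# A tiny greyscale patch: a dark vertical stroke on light paper.
patch = [
    [250, 30, 240],
    [245, 120, 200],
]
bitonal = to_bitonal(patch)
print(letter_pixels, bitonal)
```

Raising the threshold turns more mid-grey pixels into ink, which is roughly the effect you are tuning when you adjust brightness and contrast for faint or dirty originals.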
If you do not have a scanner that has the necessary speed, quality, or other features that you require to scan your documents, you can always find a large selection of scanners at ScanStore!
ScanStore even has a handy scanners guide to help you find the perfect scanner for your specific requirements and price range.
Limitations of OCR
OCR software is also limited in what it is able to recognize. Most OCR software is designed to recognize only machine-printed text, as opposed to handwriting. While there is ICR software that can recognize handwritten information, it tends to be an enterprise-level solution for forms processing work rather than full page recognition.
Similarly, most OCR software can convert only traditional machine fonts, not cursive scripts or calligraphy. There are many fonts out there, and OCR engines depend on common, separated letter shapes to recognize the text, so fonts that are unusual or whose letters flow together may not be recognized.
OCR Software Guide
There are several OCR (Optical Character Recognition) software solutions available to convert scanned images to text, Word, Excel, HTML, or searchable PDF. The differences between them can often be obscure, leaving many to wonder why some OCR packages cost about $100 while others cost $500 or more.
The main features that differentiate OCR software are:
- Character recognition accuracy
- Page layout reconstruction accuracy
- Support for languages
- User interface design
- Output file formats (Word, Excel, PDF, eBook, etc.)
- OCR speed and support for multi-core CPUs
- Batch processing modes
- Advanced PDF encryption or compression
- Special features for niche projects
Because of the countless combinations of document types, OCR engines, project requirements, and special features, one engine may perform better with your particular documents than another. Use our handy OCR feature comparison chart to determine which OCR program best meets your requirements. If you prefer to try before you buy, ScanStore provides demo downloads for most OCR software with your ScanStore User Account.
OCR Software Categories
The OCR software guide is divided into the following categories:
Applications | OCR Servers | MAC OCR | PDF Converters | Personal | Hebrew/Arabic/Farsi OCR | Chinese/Japanese/Korean/Thai OCR
And The Winner Is…
The OCR experts at ScanStore have tested the latest versions of FineReader, Kofax OmniPage, ReadIRIS, CVision PDF Compressor, and SimpleOCR, and we consider ABBYY FineReader the best overall value for business users, while ReadIRIS is the best OCR software for under $150.
The key deciding factors were:
- User interface design
- Page layout reconstruction capabilities
- Extensive language support
- Engine stability when processing large files
- Availability and quality of technical support
Though other testing labs have ranked OmniPage’s overall accuracy slightly higher, we find the difference nearly negligible. All modern OCR software has very good accuracy, so we recommend choosing based on the special features you need, such as ReadIRIS Corporate’s CardIRIS, FineReader’s camera OCR and screenshot reader, or OmniPage Ultimate’s form data collection, auto-redaction, and barcode filing capabilities.
If you would like to try them out yourself, ScanStore offers free demo downloads for ReadIRIS and FineReader. Kofax does not provide demos for its OCR products.
Businesses with many documents to process should use our SimpleIndex batch document scanning software with the FineReader OCR engine to scan and OCR large batches of documents. Barcode and OCR can also be used to sort and file documents into folders, databases or SharePoint.
OCR can also be used to automate data entry from forms, surveys, invoices and other documents. Handwriting recognition (ICR) solutions are also available. For more information, check out these links:
- SimpleIndex batch scanning, zone OCR & barcode recognition software
- Forms processing & data capture software guide
- Invoice OCR software guide
- Compare batch scanning & forms processing solutions
Attention! SimpleOCR does NOT have any handprint recognition capabilities and will not be able to recognize handwritten text. ICR (Intelligent Character Recognition) is rather complicated software and is usually on the more expensive side.
| Feature | SimpleOCR Freeware | ABBYY FineReader 15 | ABBYY FineReader Corporate 15 | IRIS ReadIRIS Pro 17 | IRIS ReadIRIS Corporate 17 | Kofax OmniPage Ultimate | SimpleIndex Desktop 9 |
|---|---|---|---|---|---|---|---|
| Scanner Drivers Supported | TWAIN | TWAIN | TWAIN | TWAIN | TWAIN | TWAIN / ISIS | TWAIN / ISIS |
| Table/Spreadsheet Recognition | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| PDF Password Support | ✓ | ✓ | ✓ | ✓ | | | |
| Searchable PDF Output | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Highly Compressed PDF Output | MRC | MRC | iHQC | iHQC | MRC | | |
| Vertical Text Recognition | ✓ | ✓ | ✓ | | | | |
| Barcode Recognition | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Image Pre-processing | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Watched / Hot Folder | ✓ | ✓ | ✓ | ✓ | | | |
| Batch processing | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Managed server processing | | | | | | | |
| Indexing | ✓ | ✓ | | | | | |
| Business Card Recognition | Box version only | | | | | | |
| Screenshot reader | ✓ | ✓ | | | | | |
| Zone Templates | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Proofing & Training | ✓ | ✓ | ✓ | ✓ | | | |
| Languages Supported | 3 | 193 | 193 | 128 | 137 | 137 | 179 |
| Arabic/Farsi/Hebrew | Hebrew & Arabic Version | Hebrew & Arabic Version | ✓ | ✓ | | | |
| Chinese/Japanese/Korean | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| User Dictionaries | ✓ | ✓ | ✓ | ✓ | | | |
| Page Limit per Document | 1 | none | none | 50 | none | none | none |
| Multi-Core Support | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Installation | Desktop | Desktop | Desktop/Server | Desktop | Desktop | Desktop | Desktop |
| License | Freeware | Standalone | Per Seat / Concurrent | Standalone | Standalone | Standalone 3 Licenses | Standalone |
| Feature | ABBYY FineReader Server | IRISDocument Server | PaperVision Capture OCR | SimpleIndex 9 |
|---|---|---|---|---|
| Scanner Drivers Supported | TWAIN / ISIS | N/A | TWAIN / ISIS | TWAIN / ISIS |
| Table/Spreadsheet Recognition | ✓ | ✓ | ✓ | |
| PDF Password Support | ✓ | ✓ | ✓ | |
| Searchable PDF Output | ✓ | ✓ | ✓ | ✓ |
| Highly Compressed PDF Output | MRC | iHQC | | |
| Vertical Text Recognition | ✓ | ✓ | | |
| Barcode Recognition | ✓ | ✓ | ✓ | ✓ |
| Image Pre-processing | ✓ | ✓ | ✓ | ✓ |
| Watched / Hot Folder | ✓ | ✓ | ✓ | |
| Batch processing | ✓ | ✓ | ✓ | ✓ |
| Managed server processing | ✓ | ✓ | ✓ | ✓ |
| Indexing | ✓ | ✓ | ✓ | ✓ |
| Business Card Recognition | | | | |
| Screenshot reader | | | | |
| Zone Templates | ✓ | ✓ | | |
| Proofing & Training | ✓ | ✓ | | |
| Languages Supported | 191 | 137 | 121 | 179 |
| Arabic/Farsi/Hebrew | Hebrew & Arabic Version | 2 Add-ons | | |
| Chinese/Japanese/Korean | ✓ | Asian Add-on | | |
| User Dictionaries | ✓ | ✓ | ✓ | |
| Page Limit per Document | none | none | none | none |
| Multi-Core Support | ✓ | Multi-CPU Add-on | ✓ | |
| Installation | Server | Desktop / Server | Desktop / Server | Desktop / Server |
| License | Volume-based | Core-based | Named / Concurrent | Standalone |