Can OCR be trained for specific fonts?

OCR training was once a critical part of the conversion process. After a document was read, the operator would review the results and correct mistaken characters. These corrections were then used to train the engine, so that results improved the next time a similar document was read.

Modern OCR applications no longer rely on user training for accuracy unless you have very non-standard fonts. These engines have had decades of development, and billions of samples have been used to train their algorithms. In most cases, introducing user training will only diminish the results for any documents that differ from the ones used for training.

The training functions still exist for these edge cases, but they are no longer an integral part of the OCR process.

Training in modern OCR is more likely to refer to enterprise data capture applications that use AI-based learning algorithms to find the locations of data points on documents with various different formats, such as invoices.

Does ReadIRIS, FineReader or OmniPage support Zone OCR?

The “Pro” versions of most Desktop OCR applications support the creation of zone templates that can be used to OCR specific regions on batches of documents.

Most OCR applications have “Lite” versions that don’t have the ability to manually create zones so it’s important to get the correct version.

With these applications it is often not possible to output this data as “fields” in a structured data file like CSV, Excel or XML. What you typically get is a text file for each document, with a line of text for each zone. The zones are designed more for excluding regions you don’t want or manually overriding the detection of text, tables and images in the document.
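If you only have that per-zone text file, a small post-processing step can still turn it into a structured row. The following is a minimal sketch, not a feature of any of the products named above. It assumes each zone produces exactly one line, in template order, and the function names `csvQuote` and `zoneTextToCsvRow` are hypothetical.

```cpp
#include <sstream>
#include <string>

// Quote one CSV field using the common convention: wrap the field in
// double quotes when it contains a comma, quote or line break, and
// double any embedded quotes.
std::string csvQuote(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) return field;
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';
        out += c;
    }
    out += '"';
    return out;
}

// Turn the zone text of one document (assumed: one line per zone) into a
// single CSV row with one column per zone.
std::string zoneTextToCsvRow(const std::string& zoneText) {
    std::istringstream in(zoneText);
    std::string line, row;
    bool first = true;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (!first) row += ',';
        row += csvQuote(line);
        first = false;
    }
    return row;
}
```

A zone that spans several lines would break the one-line-per-zone assumption, which is one reason dedicated batch or forms processing tools are a better fit for field capture.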

If you need to capture specific data in multiple documents and output them to structured data files or a SQL database, Batch OCR Applications are the best option for this.

If you need to capture data formatted in tables and output to CSV or Excel, desktop OCR applications do this quite well as long as the tables have a regular format with well-defined columns.

To capture handprint, irregular tables, large numbers of data points, or data that doesn’t always appear in the same place on every page, Forms Processing software is what you need.

OCR on a Multipage TIFF Image File (C++)

This function processes several images stored in a TIFF file. The OCR results are stored in a text file.

#include <stdio.h>
// Also include the SimpleOCR SDK header that declares SETOFIMG, IMG and the OCR functions.

void myOutputHandler(int infotype, int param);
FILE *file; // file handle for the result file

// Returns 0 on success, -1 if the TIFF or the result file cannot be opened.
int ProcessFile(const char *tifffile, const char *resultfile)
{
    SETOFIMG *set;
    IMG *img;
    int i;

    // load TIFF file
    set = LoadMultipleImg(tifffile);
    // check for error
    if (!set)
        return -1;

    // open result file
    file = fopen(resultfile, "w");
    if (!file)
    {
        FreeMultipleImg(set);
        return -1;
    }

    // initialize OCR engine
    SetLanguage(ENGLISH, ".");
    SetOutputMode(OM_TEXT);
    OCRSetOutputHandler(myOutputHandler);

    for (i = 0; i < GetNbImages(set); i++)
    {
        // get an image
        img = GetImage(set, i);
        // process it
        OCR(img, 0);
    }

    // free the set
    FreeMultipleImg(set);
    // close result file
    fclose(file);
    return 0;
}

This function uses an OCR output handler, which is defined as follows:

void myOutputHandler(int infotype, int param)
{
    switch (infotype)
    {
    case OT_TEXT: // a recognized character
        fprintf(file, "%c", (char) param);
        break;
    case OT_ENDL: // end of a text line
        fprintf(file, "\n");
        break;
    case OT_ENDZ: // end of a zone: leave a blank line
        fprintf(file, "\n\n");
        break;
    }
}
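
The same callback pattern can collect the text in memory instead of writing it to a file. The sketch below is only an illustration: it uses stand-in event codes (`DEMO_TEXT`, `DEMO_ENDL`, `DEMO_ENDZ`) so that it compiles without the SDK, and the names `g_ocrText` and `collectingHandler` are hypothetical. With the real SDK you would switch on `OT_TEXT`, `OT_ENDL` and `OT_ENDZ` instead.

```cpp
#include <string>

// Stand-in event codes so the sketch compiles without the SDK. The real
// OT_TEXT, OT_ENDL and OT_ENDZ constants come from the SimpleOCR headers,
// and their values are not assumed here.
enum DemoInfoType { DEMO_TEXT = 1, DEMO_ENDL = 2, DEMO_ENDZ = 3 };

std::string g_ocrText;  // accumulates the recognized text

// Same signature as the handler above, but appends to a string.
void collectingHandler(int infotype, int param) {
    switch (infotype) {
    case DEMO_TEXT:  // one recognized character
        g_ocrText += static_cast<char>(param);
        break;
    case DEMO_ENDL:  // end of a text line
        g_ocrText += '\n';
        break;
    case DEMO_ENDZ:  // end of a zone: leave a blank line
        g_ocrText += "\n\n";
        break;
    default:         // ignore any other event type
        break;
    }
}
```

Keeping the result in a string makes it easy to post-process the text before saving it, at the cost of holding a whole document in memory.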
2020-11-09T15:40:18-05:00

Using the SimpleOCR ActiveX Control (VB)

This is an example of how to perform OCR on a multi-page image file using SimpleOCR X and VB.

Public strOCRResult As String 'Module-level string that the output handler appends the OCR result to

Sub DoOCR (filename As String)
    'Basic function to perform OCR on a multi-page TIFF image file.
    Dim objOCR As SimpleOCR 'SimpleOCR object
    Dim ret, img, imgSet, i As Long 'Function return value, single image pointer, multiple image pointer, page counter

    Set objOCR = New SimpleOCR
    strOCRResult = "" 'Clear any previous result

    ret = objOCR.OCRSetOutputHandlerX(AddressOf myOutputHandler)
    If IsNull(ret) Then
        'Error occurred
    End If

    imgSet = objOCR.LoadMultipleImgX(filename)
    If IsNull(imgSet) Then
        'Error occurred
    End If

    objOCR.SetLanguageX ENGLISH, "." 'Set language to English
    objOCR.SetOutputModeX OM_TEXT 'Set output mode to Text

    For i = 1 To objOCR.GetNBImagesX(imgSet)
        img = objOCR.GetImage(imgSet, i) 'Get the current page
        ret = objOCR.OCRX(img, 0) 'Perform OCR on page
        If ret > 0 Then
            'Error occurred
        End If
    Next

    MsgBox strOCRResult

    objOCR.FreeMultipleImgX imgSet 'Free the whole image set, not just the last page
End Sub

Sub myOutputHandler(ByVal infotype As Integer, ByVal param As Integer)
    'This simple output handler appends the OCR result to the strOCRResult string.
    'Return Value
    'None

    'Parameters
    'infotype - Constant indicating type of information contained in param.
    'param - data from OCR engine. See OCRSetOutputHandler declaration for details.

    'Comments
    'Output handler must be declared in BAS modules and not form code
    'since the AddressOf method requires it for passing as a pointer

    On Error Resume Next 'required to avoid propagating DLL errors to VB

    Select Case infotype
        Case OT_TEXT
            strOCRResult = strOCRResult & Chr(param)
        Case OT_ENDL
            strOCRResult = strOCRResult & vbCrLf
    End Select
End Sub
2020-11-09T15:33:04-05:00

Document Scanning

One Source, Many Solutions

There are many document scanning solutions to choose from. ScanStore offers many of the top document imaging solutions under one virtual roof. ScanStore’s CDIA+ consultants can work with you to explain the strengths and weaknesses of each option and even provide a demo of the products using samples that you provide.

You’ll find flexibility with each of these products allowing a one-person shop to jump right in, or scale up to enterprise or service bureau proportions. If you need to throw some data capture into the document imaging mix, ScanStore also carries OCR, forms processing and document management tools.

Information and Advice

Take a look at the Scanning Solutions Comparison page for in-depth information on the features of the available offerings and more insight into finding the best fit.

And be sure not to miss the detailed comparison of the favorite Batch Scanning solutions in the exclusive Document Scanning Software Review.

What’s Right for You

You want a paperless office and document scanning is part of the path to get you there. Simply buying a scanner and feeding paper into it isn’t going to save you money. Automation of the scanning process is what holds costs down and drives up your Return on Investment.

For example, if an OCR automation costs $3,000 to implement, but by doing so you save a $15/hr employee 10 hours per week of data entry, the feature has paid for itself in 20 weeks.
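
The arithmetic behind that figure is a simple payback calculation: $15/hr × 10 hours is $150 saved per week, and $3,000 ÷ $150 is 20 weeks. A minimal sketch of the same calculation, with a hypothetical function name and ignoring costs such as maintenance or training:

```cpp
#include <stdexcept>

// Weeks until a one-time automation cost is recovered by labor savings.
// Assumes constant weekly savings and no ongoing costs.
double paybackWeeks(double implementationCost, double hourlyWage,
                    double hoursSavedPerWeek) {
    const double weeklySavings = hourlyWage * hoursSavedPerWeek;
    if (weeklySavings <= 0.0) {
        throw std::invalid_argument("weekly savings must be positive");
    }
    return implementationCost / weeklySavings;
}
```

Calling `paybackWeeks(3000.0, 15.0, 10.0)` reproduces the 20 weeks from the example above.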

So how do we automate the data capture? Here are a few possibilities:

  • Full-Page OCR turns a scan into a full-text document you can search

  • Barcodes on each document contain key data like a customer name or invoice number

  • A single field […]

OCR Servers

Enterprise OCR servers let you perform Optical Character Recognition on thousands of documents at a time, scaling to meet the demands of the largest document conversions.

Traditional Desktop OCR applications require a person to load the scanned document, run the OCR process and save the output files. This makes sense when you are converting individual documents, but large organizations with thousands or millions of documents need something much more automated and scalable.

OCR Server processing workflow

Typical Enterprise OCR Applications

As the cost of OCR software and hardware goes down each year and the quality goes up, full-text search is included in more and more records management applications. Typical applications include:

  • Data mining
  • Litigation support
  • Full-text searching
  • Document management

Features of Enterprise OCR Servers

  • OCR is performed in the background without a user interface
  • Files are imported automatically from hotfolders
  • Ability to use multiple CPUs and servers for processing
  • Management tools for remote administration
  • Web service & API integration to submit OCR jobs

What is the Best OCR Server?

The ABBYY FineReader Server offers the best combination of features, performance and pricing. It has flexible licensing, including an unlimited CPU-based license that does not limit the number of pages processed.

Foxit PDF Compressor has the lowest entry-level pricing and pairs OmniPage OCR with unique PDF compression technology that can dramatically reduce the size of searchable PDF documents, leading to faster viewing and lower cloud storage and bandwidth costs.

The SimpleIndex Server offers affordable unattended OCR services coupled with advanced data extraction and indexing capabilities that organize documents automatically or save metadata to Excel or a SQL database. It doesn’t have the scalability, API interfaces or compression technology that other OCR servers have, but you can bundle the Standard Server version with them to add indexing, […]

Compare OCR Software

OCR Guide

Optical Character Recognition

During your foray into the world of document scanning, you’ve likely encountered the term “OCR” and may even know that it stands for “Optical Character Recognition”. But what exactly is OCR and how can you make the best use of this sophisticated and valuable tool?

We’re here to give you a run-down of what you need to know about Optical Character Recognition, answer any questions you might have, and recommend the best OCR software solution for your scanning project. Let’s begin!

What is OCR?

The primary purpose of Optical Character Recognition is to quickly and automatically recognize and convert images of machine-printed or typed text into actual electronic data that users can organize, search, and modify. In general, an OCR engine analyzes the pixel data of scanned images and searches for patterns resembling letters, numbers, and other symbols to create a digitized record of characters. While the exact mechanics of this process can be complicated, OCR engines ultimately enable users to easily and effectively perform a wide array of functions such as information entry, processing, categorization, retrieval, and analysis.
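
To make the idea of "searching for patterns resembling letters" concrete, here is a deliberately tiny sketch of template matching: each character is a 3×5 grid of pixels, and an unknown shape is assigned the label of the closest stored template. This is a toy for illustration only. The glyph shapes and the names `Glyph`, `kTemplates` and `classify` are invented for this example, and real OCR engines use far more sophisticated methods.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// A 3x5 monochrome glyph written row by row, where 1 means ink.
using Glyph = std::bitset<15>;

struct Template {
    char label;
    Glyph pixels;
};

// Hypothetical reference shapes for three characters.
const std::array<Template, 3> kTemplates = {{
    {'I', Glyph("111010010010111")},
    {'L', Glyph("100100100100111")},
    {'O', Glyph("111101101101111")},
}};

// Return the label of the template that differs from the input in the
// fewest pixels (the smallest Hamming distance).
char classify(const Glyph& input) {
    char best = '?';
    std::size_t bestDistance = 16;  // larger than any possible distance
    for (const auto& t : kTemplates) {
        const std::size_t distance = (input ^ t.pixels).count();
        if (distance < bestDistance) {
            bestDistance = distance;
            best = t.label;
        }
    }
    return best;
}
```

Because the closest match wins, an "L" with one missing pixel, such as `Glyph("100100100100110")`, is still classified as `'L'`, which hints at how recognition can tolerate some scanning noise.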

Applications of OCR

Optical Character Recognition employs robust technology to digitally convert, recognize, and manage scanned paper and machine-readable documents promptly and accurately. Such reliable OCR capabilities power vital systems, facilitate essential services, improve routine operations, and promote overall efficiency. Two significant methods of such Optical Character Recognition are:

Full Page OCR – Converts the entire page into one of the following formats:

  • Plain Text – Basic text information on the page is retained in a consecutive order.
  • Formatted Text – Text information is retained in consecutive paragraphs while saving font size and style. This can also preserve tables in a tabular format, such as spreadsheets.
  • Exact Copy – All information on the page is retained, including graphics, and placed on the page in the […]

OCR Freeware


About SimpleOCR Freeware

Do you dread having to retype that document you are holding in your hand? If only you had the electronic file, your life would be so much easier. With SimpleOCR, you could easily and accurately convert that paper document into editable electronic text for use in any application including Word and WordPerfect.

Not only is SimpleOCR up to 99% accurate, it is 100% free.

Download SimpleOCR now or learn more about its features and functions.

Accuracy

With optical character recognition up to 99% accurate, there is no better OCR application for the price. This accuracy greatly reduces the need for post-recognition proofreading and correction. And after all, isn’t that why you want to OCR the document in the first place? Of course it is!

System Requirements

SimpleOCR works on any version of Windows, from Windows 95 through Windows 10 and beyond! Your scanner needs only a TWAIN driver, the type of driver that comes with the majority of scanners sold. In short, SimpleOCR will most likely work with the PC and scanner you already have.

Pricing

SimpleOCR is free for all commercial and non-commercial purposes. It may be re-distributed freely, but only in its original, unaltered form.

Download SimpleOCR Now

  • Huge Dictionary – With more than 120,000 words, it is unlikely that SimpleOCR will run into a word it does not know. In the rare event that it does, our improved text editor allows you to easily add the new word to the dictionary. By adding new words to the dictionary, SimpleOCR becomes better with every use.

  • Attention! SimpleOCR does NOT have any handprint OCR capabilities and will not be able to recognize handwritten text. ICR (Intelligent Character Recognition) software is rather complicated and is usually on the more expensive side.

  • Despeckle – For those documents which are not […]
