Posts and articles addressing OCR accuracy and how to improve OCR results with optimal scanner settings, recognition parameters, and OCR-friendly document design.
Most OCR applications are optimized for images scanned at a resolution of 300 dots per inch (dpi).
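As a rough illustration of what the 300 dpi figure means in practice, the sketch below estimates the resolution of an existing scan from its pixel width and the physical page width, then computes the resize factor needed to reach 300 dpi. The function names and the sample page dimensions are assumptions made for this example and do not come from any particular OCR product.

```python
TARGET_DPI = 300  # the resolution most OCR applications are tuned for


def effective_dpi(pixel_width, physical_width_in):
    """Estimate scan resolution from image width in pixels and page width in inches."""
    return pixel_width / physical_width_in


def scale_to_target(dpi, target_dpi=TARGET_DPI):
    """Return the resize factor that would bring an image to the target resolution."""
    return target_dpi / dpi


if __name__ == "__main__":
    # Hypothetical example: a letter-size page (8.5 in wide) scanned to 1700 pixels across.
    dpi = effective_dpi(1700, 8.5)
    print(f"scan resolution: {dpi:.0f} dpi, resize factor: {scale_to_target(dpi):.2f}")
```

Resizing in software cannot restore detail that the scanner never captured, so when the original paper is available, rescanning at 300 dpi is likely the safer choice.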
Color input is supported and most often produces better results than black & white input, even though OCR algorithms generally convert color to B&W automatically (a step known as binarization) as part of the OCR process. With color input, this dynamic conversion usually produces the best result, but not always.
OCR results can sometimes be improved by paying careful attention to image processing settings and supplying a pristine black & white image for OCR instead of a color scan. This is especially true when an image contains stray markings, stamps, notes, colored paper, or other elements that can throw off the binarization process.
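To make the binarization step concrete, here is a minimal sketch that picks a single global threshold with Otsu's method and converts grayscale pixels to pure black and white. Otsu's method is used here only as a well-known example; commercial OCR engines may use different, often adaptive, techniques. The function names and the tiny sample image are assumptions for illustration.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    weight_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0         # sum of their gray values
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue               # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray_rows, threshold):
    """Map each gray value to 0 (black) if at or below the threshold, else 255 (white)."""
    return [[0 if p <= threshold else 255 for p in row] for row in gray_rows]


if __name__ == "__main__":
    # Hypothetical 2x3 grayscale image: dark text (30) on light paper (220).
    image = [[220, 30, 220],
             [220, 220, 30]]
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    print(t, binarize(image, t))
```

A global threshold like this is exactly what stamps, notes, and colored paper tend to confuse, since they add a third cluster of gray values between text and background. That is one reason a clean, deliberately prepared black & white image can outperform automatic conversion.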
In forms processing and handprint recognition applications, guide marks on the form can often be removed during the scanning process. This improves OCR results because the software no longer has to distinguish between the form background and the words being recognized.
Using drop-out forms, traditionally printed in red or green and then scanned with a corresponding red or green light, automatically removes the form background during scanning and leaves only the text to be recognized. This can dramatically improve recognition results, especially for handprinted data.
Older black & white scanners required you to swap the lamps in order to perform color drop-out. All but the least expensive modern color scanners can enable drop-out colors in the scanner driver.
Advanced forms processing applications can perform color drop-out on the fly with scanned color images. Though this is generally not quite as accurate as scanning with a drop-out lamp enabled, it has the advantage of retaining a full-color original copy of the image with the form elements and labels visible.
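One simple way to approximate drop-out in software is to look only at the color channel that matches the form ink. Red ink is nearly as bright as white paper in the red channel, so thresholding that channel makes the form background vanish while dark text remains. The sketch below assumes this single-channel approach; real forms processing products may use more sophisticated color analysis, and the function name, threshold, and sample pixels are hypothetical.

```python
def drop_out(rgb_rows, channel=0, threshold=128):
    """Return a black & white copy of an RGB image with one ink color dropped out.

    channel: 0 = red, 1 = green, 2 = blue (the color of the form ink to remove).
    The input image is not modified, so the full-color original is retained.
    """
    return [[0 if px[channel] < threshold else 255 for px in row]
            for row in rgb_rows]


if __name__ == "__main__":
    WHITE = (250, 250, 250)   # paper
    RED = (200, 40, 40)       # form guide line printed in red
    BLACK = (20, 20, 20)      # handprinted text

    scan = [[RED, RED, RED],
            [WHITE, BLACK, WHITE]]

    # With red drop-out the guide line disappears and only the text stays black.
    print(drop_out(scan, channel=0))
    # With the wrong channel (green) the red guide line survives as black pixels.
    print(drop_out(scan, channel=1))
```

Because the function returns a new image, the color scan remains available for archiving or review, which is the trade-off described above.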