Posts and articles addressing OCR accuracy and how to improve OCR results with optimal scanner settings, recognition parameters, and OCR-friendly document design.
What is OCR?
OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes it’s referred to as OCR Data Capture.
Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:
- Handprint recognition (ICR or Intelligent Character Recognition) for forms processing.
- Advanced rules-based templates for locating common data elements on pages with different layouts and formatting.
- Artificial intelligence that is able to use point and click user feedback to train recognition templates automatically.
- Natural language processing is able to interpret paragraphs of text and extract meaningful data from them.
- Robotic process automation puts back office integration into the hands of power users instead of programmers.
- Preconfigured form templates and business rules for common applications like invoice processing and healthcare claim forms.
Using the OCR software enables enterprises to reduce the document processing time by as much as 80%