Software that can convert large batches of PDF documents to text with Optical Character Recognition (OCR), extract data and export it to Excel or SQL databases. Native PDF files can be processed without OCR with dramatic speed and 100% accuracy.

Foxit PDF Compressor

Foxit PDF Compressor is an OCR server equipped with enhanced compression that can dramatically reduce the size of PDF files. This can lead to big cost savings in cloud storage and bandwidth fees, and improved efficiency for knowledge workers who save time on every file they open.

Pricing starts at $375 for 12,000 pages per year. One-time and CPU-based licensing is also available. Contact us for pricing and bundling options for any page volume.

ABBYY FlexiCapture SDK On-Premise

ABBYY FlexiCapture SDK enables software developers to quickly create applications that extract meaning from documents. FlexiCapture SDK is ideal for system integrators, developers, and service providers who want to integrate powerful data capture capabilities into their solutions. Through the use of ABBYY’s machine learning and AI, end customers are able to process more transactions, faster, and with fewer errors, improving customer service, reducing costs, and making smarter process decisions.

FlexiCapture SDK, as a delivery option of the FlexiCapture platform, provides developers with a powerful and flexible toolkit to smoothly integrate ABBYY’s industry-leading data capture technologies to empower their own products and services according to vertical market needs.

Licensing is based on number of developers for the base SDK, then annual page counts for the perpetual license. All of the functionality supported by ABBYY FlexiCapture is included in the license, with a few exceptions.

ABBYY Cloud OCR SDK

ABBYY® Cloud OCR SDK is a web-based document processing service that will enhance your enterprise software systems, SaaS platforms, or your mobile apps with the ability to convert documents and utilize textual information from scans, PDFs, document images, smartphone photos, or screenshots.

Combining ABBYY’s latest AI-based technologies for information extraction with the highly scalable processing power of the Microsoft® Azure® computing infrastructure, this secure and reliable ABBYY cloud service can be easily integrated into your application via a REST API—empowering it to precisely convert virtually any number of pages within the shortest amount of time.

How to scan documents to searchable PDF files

If you don’t already have a scanner, and scanning to searchable PDF files is the only thing you need to do, you will find many document scanners that can perform this function. Most desktop and high-speed document scanners come with software that has this basic capability. However, these often have limited functionality and you may prefer a more robust application.

To create searchable PDFs with any scanner, use Desktop OCR software applications like FineReader, ReadIRIS, or OmniPage. These programs can also be used to convert images to MS Word, Excel, and other editable formats.

There are also more affordable PDF converters that have fewer OCR features and limit output to PDF files.

You can find a complete guide to OCR software here.

For high-volume applications, use OCR servers to give everyone on your network the ability to create searchable PDFs on a dedicated server.

Enterprise site licensing, concurrent user licensing and cloud-based solutions are also available. Please contact us for more information or a quote for desktop OCR and PDF converter site licensing options.

You may use SimpleIndex to automatically extract data from searchable PDFs for indexing, automatic file naming, and integration with custom database or document management applications. This is a very fast and accurate way to set keyword metadata for searching. It has both Tesseract and FineReader OCR options for creating searchable PDFs, and is available in desktop or server versions.

OCR Consulting Services

OCR Experts for Any Project

Our unique team of OCR experts is equipped to help with OCR projects of any size or complexity. We have support specialists who can remotely configure desktop solutions in a matter of minutes, and expert systems integrators with years of programming, database design, and robotic process automation experience.

Desktop OCR

Use our online store to order desktop OCR applications and our staff will be happy to answer your setup questions via email or web chat.

Remote configuration and training services using GotoMeeting are available for a low hourly rate.

Let Us OCR That For You

Got a one-time conversion and don’t want to hassle with software? Upload your scanned document to us and we’ll send back the converted files. An optional verification service corrects recognition errors and layout issues for a low hourly rate.

Data processing for forms, reports, directories, and other documents is also available with output to CSV, Excel, XML, JSON, SQL, etc.

Contact us and, if possible, provide a sample, the total page count, the desired output, and whether you want us to correct the results after OCR. We’ll reply with a quote right away. Prices start at $50 for up to 1,000 pages.

Batch Scanning & OCR Servers

Automate document scanning and digital document archival processes using zone OCR, barcode recognition, database integration and other technologies.

Small business systems and single-document workflows can be set up remotely via GotoMeeting, usually in just a few hours. Chat now if we’re online or leave a message to schedule a consultation.

Data Capture and Forms Processing

Advanced data extraction solutions that can turn the most complex documents into structured data ready […]

OCR Data Capture

What is OCR Data Capture?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes, it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Enterprise data capture systems provide interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Who can benefit from OCR data capture software?

Any organization that collects data from paper documents, or electronic files like PDF and Office documents, can get a very high return on investment by automating the data entry with OCR data capture software.

You do need to have a significant number of documents to […]

Why are the prices of OCR applications so different?

OCR software ranges in price from freeware all the way up to tens of thousands of dollars. What explains the difference between these applications? Here’s the breakdown:

  • OCR Freeware uses the SimpleOCR or Tesseract engines and provides limited scanning and output format capabilities. Recognition quality is generally poor except for the highest quality document images.
  • PDF OCR Converters provide good quality OCR engines like ABBYY, IRIS and OmniPage, but limit the output to searchable PDF files. These cost less than $100.
  • Standard OCR applications range from $100-$200 and provide full OCR capabilities including converting scans to Word, Excel, HTML and other editable formats.
  • Corporate OCR applications add advanced features like automated hotfolder processing, concurrent licensing and other features useful for business applications. Pricing for these is $200-$500.
  • OCR Servers provide scalable, enterprise OCR services for processing very high volumes of documents or providing OCR capabilities to users throughout the organization. Prices start around $1,500 and go up based on processing volume.
  • Enterprise Data Capture and Forms Processing applications are used to capture structured data from complex documents like healthcare claim forms and invoices that include things like tables, handwriting, checkboxes, and movable zones. These solutions can cost anywhere from around $1,000 to hundreds of thousands of dollars depending on the document volume and complexity of the project.

Using Artificial Intelligence to train OCR templates

Modern Forms Processing applications have AI-based training algorithms that let users point and click on the location of data in their documents and create OCR templates automatically.

This bypasses the technical requirements of creating complex OCR templates, especially for varied documents like Invoices where the data doesn’t always appear in the same place.

But how good are these AI-based training systems?

In our experience they work well when you have:

  • Good quality scanned images
  • Clearly labeled data
  • Tables with regular columns

Point and click style training doesn’t work quite as well with:

  • Poor quality images
  • Data that appears within paragraphs
  • Tables with overlapping columns, subtotal rows, etc.

These types of documents can still be captured with OCR but they will usually require an experienced technician to manually configure the template.

For natural language data like legal documents, a new artificial intelligence technology called NLP (Natural Language Processing) is available. NLP-based tools work by attempting to “understand” the language used in documents and interpret the location of data points based on meaning. ABBYY FlexiCapture also supports NLP-based training for these types of documents.

How to use Zone OCR when the data can be in different locations?

Modern Forms Processing software can use rules-based templates for locating data on documents based on label keywords, data types, regular expression pattern matching and other methods.

The most common example in business is an Invoice. Businesses receive invoices from thousands of different vendors, each with important information like the Invoice Number, Due Date and Total needed to process the document, but each vendor invoice is formatted a little differently than the others.

Software like ABBYY FlexiCapture will look for keywords like “Invoice Number” or variations like “Inv #” and “Invoice No.” to locate the invoice number value on each invoice.
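The keyword-based lookup described above can be illustrated with a short regular expression sketch. This is not how FlexiCapture or any other product is implemented; the label variants, the pattern and the function name `find_invoice_number` are assumptions chosen for illustration, and the input is assumed to be plain OCR text.

```python
import re

# Hypothetical label variants: "Invoice Number", "Invoice No.", "Inv #", "Inv. No", ...
LABEL = r"(?:invoice|inv)\.?\s*(?:number\b|no\b\.?|#)"

# Label, optional colon, then the first token made of letters, digits or dashes.
INVOICE_NUMBER = re.compile(LABEL + r"\s*:?\s*([A-Z0-9][A-Z0-9-]*)", re.IGNORECASE)


def find_invoice_number(ocr_text):
    """Return the value that follows an invoice-number label, or None."""
    match = INVOICE_NUMBER.search(ocr_text)
    return match.group(1) if match else None


print(find_invoice_number("Invoice Number: 10452"))
print(find_invoice_number("Inv # A-2291"))
print(find_invoice_number("Total due 50.00"))
```

A production template would typically also check the data type of the captured value and its position relative to the label, which this sketch leaves out.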

These applications are also able to capture complex table data, even when it doesn’t line up into regular columns, and output it to formats like Excel or a SQL database.

In recent years, artificial intelligence based training has made it possible to simply point and click on the location of data on documents as you process them and generate these templates automatically, dramatically reducing the need for the ongoing expert help these systems otherwise require.

Knowledge Base

The SimpleOCR Knowledge Base contains frequently asked questions and answers, technical guides and general information on a broad range of optical character recognition, handprint recognition, data capture, PDF OCR, AP invoice scanning and zone OCR applications.

Contact Us for FREE Consultation on Your OCR Project

ABBYY FlexiCapture Cloud

ABBYY FlexiCapture Cloud delivers ABBYY’s advanced data capture platform capabilities via REST API and web interfaces. ABBYY FlexiCapture Cloud customers can rapidly configure and deliver their Content IQ solution, taking advantage of our cloud services to automate and accelerate their document-driven processes. The advanced machine learning and AI in the platform improve classification and data extraction results, enabling core processes to support better, smarter, faster decisions.

FlexiCapture Cloud enables organizations to accelerate digital transformation by complementing their automation systems with new and advanced cognitive capabilities that liberate the intelligence locked in their documents.

ABBYY FlexiCapture On-Premise

ABBYY FlexiCapture On-Premise – Distributed – Perpetual License PPY 50K Pages

ABBYY FlexiCapture is a powerful data capture and document processing solution from a world-leading technology vendor. It is designed to transform streams of documents of any structure and complexity into business-ready data. And its award-winning recognition technologies, automatic document classification, plus a highly scalable and customizable architecture, mean that it can help companies and organizations of any size to streamline their business processes, increase efficiency and reduce costs.

SimpleIndex Barcode Suite

Simple Software SimpleIndex Product Suites offer you a better deal on bundles of essential products.

SimpleIndex Barcode Suite combines the best Simple Software products to create a complete Barcode OCR solution. It includes:

  • SimpleIndex Barcode Server license with built-in Accusoft barcode engine and server functionality.
  • SimpleSend solution enables automated sending of document files via secure FTP or email. SimpleSend enhances the functionality of SimpleIndex in several ways as well as functioning as a standalone application.
  • SimpleExport license is designed to convert any delimited text file into any XML or formatted text file format using XSLT. It automates the process of applying XSLTs, especially for document imaging applications where the data has matching files that must be moved or renamed along with the data.
  • 5 licenses of SimpleCoversheet which is designed to work with data sources like SQL databases, spreadsheets and text files to dynamically build lists of barcodes to print. This is especially useful in document scanning applications where barcodes are used to identify and file documents automatically.

ABBYY FineReader Server On-Premise

Innovative server-based OCR software for performing centralized enterprise-wide OCR processing. Allows anyone on the network to submit files for OCR. Complex XML job specifications can be submitted to control output. Support available for Arabic and Asian languages.

Available in CPU, Total Page Count and Pages Per Year licensing models.

SimpleIndex OCR Server 1M PPY

SimpleIndex OCR Server 1 million pages per year – ABBYY FineReader OCR Server

Document capture solution with a one-click interface that automates your scanning and document filing by creating easy-to-find electronic content, saving you time and money.  It’s highly customizable to meet even the most detailed needs, with top quality technicians to support your requirements.

SimpleIndex Professional

Document capture solution with a one-click interface that automates your scanning and document filing by creating easy-to-find electronic content, saving you time and money.  It’s highly customizable to meet even the most detailed needs, with top quality technicians to support your requirements.

SimpleIndex Pro version includes:

  • SimpleIndex Standard
  • ISIS scanning
  • FineReader OCR
  • Accusoft Barcode Upgrades

SimpleIndex Standard

Document capture solution with a one-click interface that automates your scanning and document filing by creating easy-to-find electronic content, saving you time and money.  It’s highly customizable to meet even the most detailed needs, with top quality technicians to support your requirements.

SimpleIndex Standard version includes:

  • Basic text and barcode recognition
  • TWAIN scanning

Forms Processing

What is ICR, Survey & Forms Processing?

ICR stands for Intelligent Character Recognition and is the technology that allows software to interpret hand printed text on scanned images.

Forms Processing Software uses ICR technology to automate data entry tasks involving hand-filled surveys, applications and forms. It provides interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Forms Processing also includes OCR (Optical Character Recognition) technology to recognize machine printed text, and OMR (Optical Mark Recognition) for check boxes and multiple choice bubbles.

Who can benefit from handprint recognition software?

Any organization that collects data on paper-based forms, surveys or applications on a regular basis can get a very high return on investment by automating the data entry with forms processing software.

You do need to have a significant number of forms to justify the expense: at least a hundred forms per month, or more depending on how much data is being captured. If the data entry task can be done in under 100 man-hours, then it is not a good candidate for automation with ICR software.

Organizations that have many separate departments that collect data on forms can share the budget for forms processing software by re-using it for other projects. Your current project may not be big enough to justify the expense, but when combined with one or two others it would be.

How much do Survey & Forms Processing systems cost?

The total cost of a forms processing solution includes several items:

  • Cost of the software
  • Time to install and configure the software
  • Forms may need to be redesigned for optimal recognition
  • Recognition templates must be created for each data field on every type of form
  • Data exports must […]

Document Scanning

One Source, Many Solutions

There are many document scanning solutions to choose from. ScanStore offers many of the top document imaging solutions under one virtual roof. ScanStore’s CDIA+ consultants can work with you to explain the strengths and weaknesses of each option and even provide a demo of the products using samples that you provide.

You’ll find flexibility with each of these products allowing a one-person shop to jump right in, or scale up to enterprise or service bureau proportions. If you need to throw some data capture into the document imaging mix, ScanStore also carries OCR, forms processing and document management tools.

Information and Advice

Take a look at the Scanning Solutions Comparison page to find in-depth information on the features of the available offerings and for more insight in finding the best fit.

And be sure not to miss the detailed comparison of the favorite Batch Scanning solutions in the exclusive Document Scanning Software Review.

What’s Right for You

You want a paperless office and document scanning is part of the path to get you there. Simply buying a scanner and feeding paper into it isn’t going to save you money. Automation of the scanning process is what holds costs down and drives up your Return on Investment.

For example, if an OCR automation costs $3,000 to implement, but by doing so you save a $15/hr employee 10 hours per week of data entry, the feature has paid for itself in 20 weeks.
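The payback arithmetic in this example can be written out as a small calculation. The function name `payback_weeks` is made up for illustration, and the sketch counts only direct labor savings, ignoring ongoing costs such as maintenance or licensing renewals.

```python
def payback_weeks(implementation_cost, hourly_wage, hours_saved_per_week):
    """Weeks until the labor saved equals the up-front implementation cost."""
    weekly_savings = hourly_wage * hours_saved_per_week
    if weekly_savings <= 0:
        raise ValueError("weekly savings must be positive")
    return implementation_cost / weekly_savings


# $3,000 to implement, $15/hr employee, 10 hours saved per week:
# $150 saved per week, so the cost is recovered after 20 weeks.
print(payback_weeks(3000, 15, 10))  # 20.0
```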

So how do we automate the data capture? Here are a few possibilities:

  • Full-Page OCR turns a scan into a full-text document you can search

  • Barcodes on each document contain key data like a customer name or invoice number

  • A single field […]

OCR Servers

Enterprise OCR servers let you perform Optical Character Recognition on thousands of documents at a time, scaling to meet the demands of the largest document conversions.

Traditional Desktop OCR applications require a person to load the scanned document, run the OCR process and save the output files. This makes sense when you are converting individual documents, but large organizations with thousands or millions of documents need something much more automated and scalable.

OCR Server processing workflow

Typical Enterprise OCR Applications

As the cost of OCR software and hardware goes down each year and the quality goes up, full-text search is included in more and more records management applications. Typical applications include:

  • Data mining
  • Litigation support
  • Full-text searching
  • Document management

Features of Enterprise OCR Servers

  • OCR is performed in the background without a user interface
  • Files are imported automatically from hotfolders
  • Ability to use multiple CPUs and servers for processing
  • Management tools for remote administration
  • Web service & API integration to submit OCR jobs

What is the Best OCR Server?

The ABBYY FineReader Server offers the best combination of features, performance and pricing. It has flexible licensing, including an unlimited CPU-based license that does not limit the number of pages processed.

Foxit PDF Compressor has the lowest entry level pricing, OmniPage OCR and unique PDF compression technology that can dramatically reduce the size of searchable PDF documents, leading to faster viewing and lowered cloud storage and bandwidth costs.

The SimpleIndex Server offers affordable unattended OCR services coupled with advanced data extraction and indexing capabilities that organizes documents automatically or saves metadata to Excel or a SQL database. It doesn’t have the scalability, API interfaces or compression technology that other OCR servers have, but you can bundle the Standard Server version with them to add indexing, […]

Batch OCR Software

Batch OCR for Full-Text Conversion & Searchable PDF

The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned images of machine-printed (typed) text into actual text data that you can search through and modify.

Batch OCR software allows for the conversion of multiple files at once, usually through a hot folder or watched email inbox method that converts any files added to a particular folder.
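As a rough sketch of the hot folder idea, the function below makes one pass over an inbox folder, hands each matching file to a conversion callback, and moves it to a "done" folder so it is not picked up again. The folder layout, the file patterns and the `convert` callback are assumptions for illustration; the callback stands in for whatever OCR engine or command-line tool is actually used.

```python
import shutil
from pathlib import Path


def process_hotfolder(inbox, done, convert, patterns=("*.pdf", "*.tif", "*.tiff")):
    """Run convert() on each matching file in inbox, then move it to done.

    Returns the names of the files handled in this pass.
    """
    inbox, done = Path(inbox), Path(done)
    done.mkdir(parents=True, exist_ok=True)
    handled = []
    for pattern in patterns:
        # sorted() lists the matches up front, before any file is moved.
        for path in sorted(inbox.glob(pattern)):
            convert(path)  # placeholder for a real OCR call
            shutil.move(str(path), str(done / path.name))
            handled.append(path.name)
    return handled
```

A watcher would call `process_hotfolder` repeatedly, for example in a loop with `time.sleep(5)` between passes. Files that are still being copied into the inbox would need extra handling that this sketch does not attempt.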

The ability to watch a hotfolder and automatically convert documents is included in the complete versions of desktop OCR products, like FineReader Corporate, OmniPage Ultimate or ReadIRIS Corporate.

While automatic processing is available in these applications, they are not designed for true server-based processing since the application has to be running on the user’s desktop. OCR servers are designed for unattended batch OCR processing and high-volume applications that require multiple CPUs and processing workflows.

Those applications are all designed for traditional, full-page OCR conversions to text, Word, Excel, or searchable PDF documents.

Batch OCR for Data Capture

OCR Data Capture systems are designed to read specific data points from documents and output structured data like CSV, XML, JSON or SQL databases. SimpleIndex, FlexiCapture and PaperVision Capture all offer batch zone OCR as well as advanced features like AI-based training, invoice processing and line items.

OCR Experts At Your Service

Our OCR experts can help you find the batch OCR software that is right for your project, as well as providing remote installation, setup, training and support that’s not available for most desktop OCR applications. We can also help with enterprise implementations, custom API integrations, RPA or […]

PDF OCR

Searchable PDF OCR

Creating searchable PDF files using optical character recognition is one of the most common PDF OCR applications.

The PDF format works great with scanned documents because it allows the OCR text to be hidden in an invisible layer behind the original document image. So you see a perfect replica of the original instead of OCR text that lacks formatting and may contain artifacts and errors.

OCR PDF to Other Formats

PDF OCR can also mean converting scanned PDF files to Word, Excel, text and other formats. This can be done with any desktop OCR or OCR server application. However, there are several OCR applications called PDF Converters that are only designed to convert documents to searchable PDF files rather than converting PDF files to other formats. This is an important distinction to make when searching for PDF OCR software.

PDF Converters often cost less than their full-featured desktop OCR counterparts since they only offer document scanning and conversion of images to searchable PDF files. They can also include the ability to convert other file formats like Word, Excel, PowerPoint, HTML, etc. to PDF automatically. Enterprise site licensing options let you enable this capability for any user in the organization. Contact us for a quote on site licenses for any PDF OCR application.

PDF OCR Compression

PDF also offers advanced compression options like MRC, JPEG2000 and JBIG that can produce much smaller files than traditional TIFF images. Foxit PDF Compressor is even able to parse the document and apply different compression to images, text and backgrounds to reduce the size even further. This can produce huge savings in cloud storage and access […]

OCR Guide

Optical Character Recognition

During your foray into the world of document scanning, you’ve likely encountered the term “OCR” and may even know that it stands for “Optical Character Recognition”. But what exactly is OCR and how can you make the best use of this sophisticated and valuable tool?

We’re here to give you a run-down of what you need to know about Optical Character Recognition, answer any questions you might have, and recommend the best OCR software solution for your scanning project. Let’s begin!

What is OCR?

The primary purpose of Optical Character Recognition is to quickly and automatically recognize and convert images of machine-printed or typed text into actual electronic data that users can organize, search, and modify. In general, an OCR engine analyzes the pixel data of scanned images and searches for patterns resembling letters, numbers, and other symbols to create a digitized record of characters. While the exact mechanics of this process can be complicated, OCR engines ultimately enable users to easily and effectively perform a wide array of functions such as information entry, processing, categorization, retrieval, and analysis.

Applications of OCR

Optical Character Recognition employs robust technology to digitally convert, recognize, and manage scanned paper and machine-readable documents promptly and accurately. Such reliable OCR capabilities power vital systems, facilitate essential services, improve routine operations, and promote overall efficiency. Two significant methods of such Optical Character Recognition are:

Full Page OCR – Converts the entire page into one of the following formats:

  • Plain Text – Basic text information on the page is retained in a consecutive order.
  • Formatted Text – Text information is retained in consecutive paragraphs while saving font size and style. This can also preserve tables in a tabular format, such as spreadsheets.
  • Exact Copy – All information on the page is retained, including graphics, and placed on the page in the […]