Extract text from image files with Optical Character Recognition (OCR) and convert it to any file format or SQL database. OCR software for any size project or budget.

How to use Zone OCR when the data can be in different locations?

Modern Forms Processing software can use rules-based templates for locating data on documents based on label keywords, data types, regular expression pattern matching and other methods.

The most common example in business is an Invoice. Businesses receive invoices from 1000s of different vendors, each with important information like the Invoice Number, Due Date and Total needed to process the document, but each vendor invoice is formatted a little differently than the others.

Software like ABBYY FlexiCapture will look for keywords like “Invoice Number” or variations like “Inv #” and “Invoice No.” to locate the invoice number value on each invoice.

These applications are also able to capture complex table data and output to formats like Excel or a SQL Database, especially when it doesn’t line up into regular columns.

In recent years, artificial intelligence based training has made it possible to simply point and click on the location of data on documents as you process them and generate these templates automatically, dramatically reducing the need for ongoing expert help these systems require.

Using Artificial Intelligence to train OCR templates

Modern Forms Processing applications have AI-based training algorithms that let users point and click on the location of data in their documents and create OCR templates automatically.

This bypasses the technical requirements of creating complex OCR templates, especially for varied documents like Invoices where the data doesn’t always appear in the same place.

But how good are these AI-based training systems?

In our experience they work well when you have:

  • Good quality scanned images
  • Clearly labeled data
  • Tables with regular columns

Point and click style training doesn’t work quite as well with:

  • Poor quality images
  • Data that appears within paragraphs
  • Tables with overlapping columns, subtotal rows, etc.

These types of documents can still be captured with OCR but they will usually require an experienced technician to manually configure the template.

For natural language data like legal documents, a new artificial intelligence technology called NLP (Natural Language Processing) is available. These work by attempting to “understand” the language used in documents to interpret the location of data points based on meaning. ABBYY FlexiCapture also supports NLP-based training for these types of documents.

Using OCR to capture data from tables and reports

Data that repeats over and over again in a document can be OCR’d to Microsoft Excel, Google Sheets and other spreadsheet formats, or a SQL Database like Access, SQL Server, MySQL and Oracle.

Inexpensive Desktop OCR products like FineReader, ReadIRIS and OmniPage can automatically convert data from tables to Excel and other spreadsheets, as long as the columns are standard and don’t “overlap” such that different field values appear in the same column area, like when one row of each record represents one set of columns and a second row has additional column data.

Converted data will require some clean-up before it is usable in any database or software application, and it is difficult to convert large numbers of documents in batches this way. But it’s a good way to produce structured data from large single reports or small batches of similar report data.

For more complex tables, tables with similar data but different formats on different documents (like Invoices), tables with nested structure like header and detail rows, Enterprise Forms Processing software is required to turn these documents into structured data like XML, JSON or SQL database tables.

Why are the prices of OCR applications so different?

OCR software ranges in price from freeware all the way up to tens of thousands of dollars. What explains the difference between these applications? Here’s the breakdown:

  • OCR Freeware uses the SimpleOCR or Tesseract engines and provide limited scanning and output format capabilities. Recognition quality is generally poor except for the highest quality document images.
  • PDF OCR Converters provide good quality OCR engines like ABBYY, IRIS and OmniPage, but limit the output to searchable PDF files. These cost less than $100.
  • Standard OCR applications range from $100-$200 and provide full OCR capabilities including converting scans to Word, Excel, HTML and other editable formats.
  • Corporate OCR applications add advanced features like automated hotfolder processing, concurrent licensing and other features useful for business applications. Pricing for these is $200-$500.
  • OCR Servers provide scalable, enterprise OCR services for processing very high volumes of documents or providing OCR capabilities to users throughout the organization. Prices start around $1,500 and go up based on processing volume.
  • Enterprise Data Capture and Forms Processing applications are used to capture structured data from complex documents like healthcare claim forms and invoices that include things like tables, handwriting, checkboxes, and movable zones. These solutions can cost anywhere from around $1,000 to hundreds of thousands of dollars depending on the document volume and complexity of the project.

FlexiCapture and Vantage Natural Language Processing (NLP)

How to train NLP machine learning model

Today different industries face similar challenges as they seek to extract information from business documents, such as policies, e-mails and legal agreements – and most agree that is costly, time consuming and prone to errors with manual data entry.

In this video you will learn how to train NLP machine learning model in FlexiCapture to extract entities and text passages from Lease agreements.

Converting unstructured documents into structured data automatically makes this information available to your business applications while saving you time, money, and labor in the process.

 

Adding a field which is captured by flexilayout to a NLP-trained Document Definition

You can add the new flexible layout as additional layout to the existing one.
To do that, please open the Document Definition Editor, go to the Section’s properties and load the new layout as additional FlexiLayout.

Using FlexiLayout Studio to Design Data Capture Templates

FlexiLayout: How to capture a table using Repeating Group if table header is on each page

In some cases, we might have a table that we are not able to capture correctly using a traditional method – Table element. In such cases, we usually use Repeating Group element.

But what if we come across a multi-page document that has a table header on each page?

mceclip0.png

We can use two following methods to capture such a table using the Repeating Groups.

Using Absolute search area constraints

To limit the search area to the table area so that it doesn’t capture unnecessary text outside of the table, we can use Absolute search area constraints in the Search Constraints tab.

You can measure the area with the Measure Rectangle tool.

mceclip0.png

Using nested Repeating groups

Sometimes it might be not suitable to use the Absolute search area constraints method because other tables using this layout might have different positions and lengths of elements, thus making it not convenient to use the method, because you will have to re-measure the area every single time.

In such a case, you can use the nested Repeating group method.

  1. Create the first, “main” Repeating group that will include the Table header and footer. mceclip1.png
  2. Next, create the nested RG in the first RG. The relations are as follows: mceclip2.png
  3. These are the main steps, other elements in the RG don’t need any specific settings and should be designed according to the needed results.

Additional information

FlexiLayout: Capturing a table using Repeating Group

 

How to reliably capture elements in FlexiLayout Studio if the image resolution can vary

When the image resolution varies, then the search area of elements based on absolute offsets can miss […]

PDF Processing with FineReader and FineReader Server

How to create a PDF from Microsoft® Word, Excel, or PowerPoint

 

How to convert emails to PDF

 

How to Split a PDF

Create new PDF documents or separate PDF documents combined in one easily with FineReader PDF 15.

Learn how to split PDFs and extract pages easily.

 

 

How to create and edit interactive PDF forms

Watch this video and see how to edit and create interactive PDF forms quickly and easily.

Form Editor tool in FineReader PDF 15 allows creating and editing fillable PDF forms with text and date fields, dropdown lists, list boxes, checkmarks, radio buttons, signature fileds and action buttons. Collect information and create effective document templates with ease!

 

How to extract text from scanned PDFs

 

 

How to extract tables

 

 

How can I verify if the digital signature is valid?

If you open a document with a valid digital signature in FineReader, you will see a green notification Valid on the left panel of ABBYY FineReader PDF 15:
 mceclip0.png

Recognizing a document with existing text layer in FineReader PDF 15

  1. Open FineReader PDF 15;
  2. Go to Tools > Options > OCR;
  3. In the PDF recognition mode select Use OCR option:
  4.  Click OK;
  5.  Recognize your document again.

 

 

How to convert a document into an accessible PDF/UA

Make your mixed documents—PDF, scanned, photographed, or papers— digital and accessible.

In this […]

Compare OCR Software

Convert Scanned Image to Text Document

The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned images of machine-printed (typed) text – which to a computer are no more meaningful a collection of pixels than any other image, such as a landscape photo – into actual text data that you can search through and modify.

OCR Software comes in many different types, which vary in price range based on their features, speed, and accuracy. One of the main qualities that OCR producers are using to differentiate their products is volume of the documents OCR will allow you to process. That may be a bit counter intuitive but features that are needed to process hundreds, thousands or millions pages a year are rather different ones.

In case of several hundreds of pages (receipts, checks, medical, tax or legal forms, personal memorabilia)  you need to scan for personal use you would need light, highly versatile, easy to use, not expensive software that will convert images just to text. It may not have automation features, and processing data further will be done manually by you. Thou it is not too hard since volume of documents is not very large and you can treat each of them individually.

Small business users usually process thousands of pages a year and require some automation features. Images need to be converted not just to text, but also to spreadsheets to be processed further. Once the system is set up it is assumed that it will run without much of the interference, and people in charge of document processing would be able to do that with certain ease.

Larger companies processing millions of documents require much larger levels of automation when each small, fine tuned feature would save thousands of work hours in a long run. Multiple machines will be processing documents […]

OCR to Database

Data is Everything. It does not matter in what field your company works, after all everything will be distilled into digits of data and accumulated in Database to be processed, stored, repurposed and reassembled again, again and again.  All organizations have database, that acts as a repository for all of their information. And you may survive with manual data entry, or using spreadsheets or just folders with documents for some time, but eventually just mare amount of Data will become overwhelming.

Luckily there are plenty of solutions for your Database. You can choose between SQL (MySQL, Access, Postgres, …) or NoSQL (Mongo, AWS, …) solutions for storing and processing Data, but there will be always an issue of how raw unprocessed digits get from images or texts into more structured form of your Database. Identifying and transferring all of this data can be a bit of a task. Misreading data or mismatching of data to fields could easily ruin your data processing system. Thus, precision of data character recognition becomes essential.

One of the solutions is to keep these processes of scanning and data transferring separate. You can use one software for character recognition and transferring data from image to PDF or text document. And then to use PDF (or text) to database converters to extract that data into your database format. The very obvious disadvantage of this approach is that it adds the whole extra step into your data processing. You will start accumulate additional errors, will add time for setting up additional conversion, will add time to data processing and will add time for inevitable error identifying and bug fixing. It may work for smaller companies, but on larger enterprise level it becomes cost prohibitive.

Another solution is OCR to Database direct approach. […]

OCR Consulting Services

OCR Experts for Any Project

Our unique team of OCR experts are equipped to help out with OCR projects of any size or complexity. We have support specialists that can remotely configure desktop solutions in a matter of minutes and expert systems integrators with years of programming, database design, and robotic process automation experience.

Desktop OCR

Batch Document Scanning and OCRUse our online store to order desktop OCR applications and our staff will be happy to answer your setup questions via email or web chat.

Remote configuration and training services using GotoMeeting are available for a low hourly rate.

Let Us OCR That For You

Got a one-time conversion and don’t want to hassle with software? Upload your scanned document to us and we’ll send back the converted files. Optional verification service corrects recognition errors and layout issue for a low hourly rate.

Data processing for forms, reports, directories, and other documents is also available with output to CSV, Excel, XML, JSON, SQL, etc.

Contact us and if possible provide a sample, total pages, desired output and whether you want us to correct the results after OCR and we’ll reply back with a quote right away. Prices start at $50 for up to 1,000 pages.

Batch Scanning & OCR Servers

Data Capture Forms OCRAutomate document scanning and digital document archival processes using zone OCR, barcode recognition, database integration and other technologies.

Small business systems and single document workflows can be setup remotely via GotoMeeting, usually in just a few hours. Chat now if we’re online or leave a message to schedule a consultation.

Data Capture and Forms Processing

Advanced data extraction solutions that can turn the most complex documents into structured data ready […]

ABBYY FineReader PDF 15 Standard

FineReader PDF 15 Standard is a PDF software application for working with PDF documents and scans. Powered by ABBYY’s AI-based OCR technology it allows you to convert and edit not only digital PDF documents, but also scanned paper documents with the same ease-of-use. With FineReader PDF you can view, edit, search, comment, sign, protect, extract text from PDFs and convert documents into Word, Excel® for further editing.

 

Click Here to Download a Demo

ABBYY FineReader PDF 15 Corporate

ABBYY FineReader PDF 15 Corporate is an all-in-one business toolset for working with PDFs and document digitization. With FineReader PDF employees can work with both digitally created and scanned paper documents to fulfill various document-related tasks in the digital workplace effortlessly. ABBYY FineReader PDF 15 Corporate allows you to view, edit, search, comment and collaborate, sign and protect PDFs or compare document versions in different file formats to identify differences efficiently. Thanks to the seamlessly integrated AI-based OCR technology with FineReader you can also extract information from a PDF or convert the entire document to Word, Excel® for further editing. Document conversion can also be automated to prepare multiple documents for further processing.

Kofax OmniPage – Standard

Kofax OmniPage Standard converts paper, picture, and PDF files into editable documents to save you considerable time and money by eliminating retyping. Your documents look just like the original – complete with text, tables, and graphics. OmniPage uses superior character accuracy to precisely format your documents so you can easily make changes.

Kofax OmniPage – Ultimate

Kofax OmniPage Ultimate has several unique features that make it stand out for a variety of applications. Some of these include auto-redaction, SharePoint integration, automatic filing with barcodes, PDF auto-bookmarking, form data collection and MFP support. Most of these new features are not available in the Standard edition.

ABBYY Cloud OCR SDK

ABBYY® Cloud OCR SDK is a web-based document processing service that will enhance your enterprise software systems, SaaS platforms, or your mobile apps with the ability to convert documents and utilize textual information from scans, PDFs, document images, smartphone photos, or screenshots.

Combining ABBYY’s latest AI-based technologies for information extraction with the highly scalable processing power of the Microsoft® Azure® computing infrastructure, this secure and reliable ABBYY cloud service can be easily integrated into your application via a REST API—empowering it to precisely convert virtually any number of pages within the shortest amount of time.

Title

Go to Top