The primary purpose of Optical Character Recognition is to quickly and automatically convert scanned images of machine-printed (typed) text – which to a computer are no more meaningful a collection of pixels than any other image, such as a landscape photo – into actual text data that you can search through and modify.

Batch OCR software is a form of Optical Character Recognition software that allows for the conversion of multiple files at once, usually through a hot folder or watched folder method that converts any files added to a particular folder on your computer on a preset schedule. It also includes OCR servers, which can split the workload among more than one machine.
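The watched-folder idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process_hot_folder`, the folder layout, the extension list and the `convert` callback (a stand-in for whatever OCR engine is actually used) are hypothetical names, not part of any product mentioned here.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_hot_folder(
    watch_dir: Path,
    done_dir: Path,
    convert: Callable[[Path], str],
    extensions: tuple = (".tif", ".tiff", ".png", ".jpg", ".pdf"),
) -> List[Path]:
    """Run one polling pass: convert every matching file in watch_dir.

    `convert` is a stand-in for a real OCR engine call; it receives the
    image path and returns the recognized text.
    """
    done_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(watch_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        out = done_dir / (src.stem + ".txt")
        out.write_text(convert(src), encoding="utf-8")
        # Move the source out of the watched folder so it is not processed twice.
        shutil.move(str(src), str(done_dir / src.name))
        outputs.append(out)
    return outputs
```

To run it on a preset schedule, a caller could invoke `process_hot_folder` in a loop with `time.sleep(interval)` between passes, or trigger it from the operating system's task scheduler.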

SimpleIndex Cloud OCR

SimpleIndex Cloud OCR adds Amazon AWS Textract OCR to any SimpleIndex workstation or server license.

Textract capabilities include the most accurate OCR and handprint recognition available, automatic form field detection, and accounts payable invoice and receipt processing.

Amazon Textract is only available as an API that requires custom programming to make it work. SimpleIndex turns it into a complete document and data capture application designed for easy batch processing on a workstation or server.

Requires an AWS account. Standard Textract transaction fees will apply.

PDF Processing with FineReader and FineReader Server

How to create a PDF from Microsoft® Word, Excel, or PowerPoint

 

How to convert emails to PDF

 

How to Split a PDF

Create new PDF documents, or separate documents that were combined into a single PDF, with FineReader PDF 15.

Learn how to split PDFs and extract pages easily.

 

 

How to create and edit interactive PDF forms

Watch this video and see how to edit and create interactive PDF forms quickly and easily.

The Form Editor tool in FineReader PDF 15 lets you create and edit fillable PDF forms with text and date fields, dropdown lists, list boxes, checkmarks, radio buttons, signature fields and action buttons. Collect information and create effective document templates with ease!

 

How to extract text from scanned PDFs

 

 

How to extract tables

 

 

How can I verify if the digital signature is valid?

If you open a document with a valid digital signature in FineReader, you will see a green “Valid” notification in the left panel of ABBYY FineReader PDF 15.

Recognizing a document with existing text layer in FineReader PDF 15

  1. Open FineReader PDF 15.
  2. Go to Tools > Options > OCR.
  3. Under PDF recognition mode, select the Use OCR option.
  4. Click OK.
  5. Recognize your document again.

 

 

How to convert a document into an accessible PDF/UA

Make your mixed documents (PDFs, scans, photographs, or paper) digital and accessible.

In this […]

Using FlexiLayout Studio to Design Data Capture Templates

FlexiLayout: How to capture a table using Repeating Group if table header is on each page

In some cases, we might have a table that we are not able to capture correctly using the traditional method, the Table element. In such cases, we usually use the Repeating Group element.

But what if we come across a multi-page document that has a table header on each page?

We can use either of the following two methods to capture such a table using Repeating Groups.

Using Absolute search area constraints

To limit the search area to the table area so that it doesn’t capture unnecessary text outside of the table, we can use Absolute search area constraints in the Search Constraints tab.

You can measure the area with the Measure Rectangle tool.

Using nested Repeating groups

The Absolute search area constraints method is not always suitable. Other tables that use this layout may have elements at different positions and of different lengths, so you would have to re-measure the area every single time.

In such a case, you can use the nested Repeating group method.

  1. Create the first, “main” Repeating group that will include the Table header and footer.
  2. Next, create the nested RG inside the first RG. The relations between the two are set as shown in the screenshot [mceclip2.png].
  3. These are the main steps. Other elements in the RG don’t need any specific settings and should be designed according to the results you need.

Additional information

FlexiLayout: Capturing a table using Repeating Group

 

How to reliably capture elements in FlexiLayout Studio if the image resolution can vary

When the image resolution varies, then the search area of elements based on absolute offsets can miss […]

Barcode Recognition in ABBYY FineReader & FlexiCapture

Recognition of Barcodes in ABBYY technologies

ABBYY technology and products can read different barcode types.
The Document Analysis algorithms can locate and identify different barcodes on a document page, and it is also possible to “draw” a barcode block via the API.
Once the barcode region is defined or detected, it can be recognised. The API provides access to:
  • the coordinates
  • the characters
  • character confidence information
  • start/stop symbols of different barcode types (for Code 39 barcodes, the start/stop symbol is the asterisk “*”)
The barcode value can then be used for file naming.
A very common scenario is document separation based on barcodes:
    • This feature is implemented in FlexiCapture projects
    • With FineReader Engine, the developers can “cut” the page stream with custom code
    • Separation in FineReader Server
    • Separation in the ABBYY Scan Station (FineReader Server & FlexiCapture)
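As a rough illustration of “cutting” a page stream on barcodes and naming the resulting files from the barcode value, here is a minimal Python sketch. It does not use the FineReader Engine API: the page stream is modelled as plain `(page name, raw barcode value)` tuples, and `strip_code39`, `split_by_barcode` and the `"unsorted"` fallback name are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

Page = Tuple[str, Optional[str]]  # (page name, raw barcode value or None)


def strip_code39(raw: str) -> str:
    """Drop the Code 39 start/stop asterisks if the reader returned them."""
    if len(raw) >= 2 and raw.startswith("*") and raw.endswith("*"):
        return raw[1:-1]
    return raw


def split_by_barcode(pages: List[Page]) -> List[Tuple[str, List[str]]]:
    """Start a new document at every page that carries a barcode.

    Returns (file name stem, page names) pairs. Pages that appear before
    the first barcode are grouped under the stem "unsorted".
    """
    documents: List[Tuple[str, List[str]]] = []
    for name, raw in pages:
        if raw is not None:
            documents.append((strip_code39(raw), [name]))
        elif documents:
            documents[-1][1].append(name)
        else:
            documents.append(("unsorted", [name]))
    return documents
```

In this sketch the barcode page is kept as the first page of its document; a workflow that uses dedicated separator sheets would drop that page instead.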

Tips for working with barcodes

Barcode recognition quality depends on:

  • the barcode print quality
  • settings used in the document scanning process
  • placement of the barcode when it is manually added

In order for the barcodes to be recognized well, follow these recommendations:

  • A barcode must be separated from other text by a fairly wide white gap.
  • Barcode size and the width of its separate bars or dots must meet the following requirements:
    • The optimal barcode height is more than 10 millimetres. The size of a barcode should be less than A4 size.
    • Barcode height must be more than twice the height of a text line
    • For non-square barcodes, their […]

ABBYY FlexiCapture SDK On-Premise

ABBYY FlexiCapture SDK enables software developers to quickly create applications that extract meaning from documents. FlexiCapture SDK is ideal for system integrators, developers, and service providers who want to integrate powerful data capture capabilities into their solutions. Through the use of ABBYY’s machine learning and AI, end customers are able to process more transactions, faster, and with fewer errors, improving customer service, reducing costs, and making smarter process decisions.

FlexiCapture SDK, as a delivery option of the FlexiCapture platform, provides developers with a powerful and flexible toolkit to smoothly integrate ABBYY’s industry-leading data capture technologies to empower their own products and services according to vertical market needs.

Licensing is based on the number of developers for the base SDK, then annual page counts for the perpetual license. All of the functionality supported by ABBYY FlexiCapture is included in the license, with a few exceptions.

ABBYY Cloud OCR SDK

ABBYY® Cloud OCR SDK is a web-based document processing service that will enhance your enterprise software systems, SaaS platforms, or your mobile apps with the ability to convert documents and utilize textual information from scans, PDFs, document images, smartphone photos, or screenshots.

Combining ABBYY’s latest AI-based technologies for information extraction with the highly scalable processing power of the Microsoft® Azure® computing infrastructure, this secure and reliable ABBYY cloud service can be easily integrated into your application via a REST API—empowering it to precisely convert virtually any number of pages within the shortest amount of time.

Robotic Process Automation

Introducing Robotic Process Automation

RPA stands for Robotic Process Automation and it represents a new approach to business automation that helps minimize the technical hurdles required for implementing new workflows.

Robotic Process Automation of Data Entry

Traditional business process automations rely on application programming interfaces (APIs) to allow systems to exchange data. This approach has two main drawbacks:

  1. The application vendor must make those APIs available
  2. A programmer needs to write custom code to interface with them

If your software vendor does not provide an interface for consuming the data you need to automate, then you’re out of luck. And even if they do, the development costs can eliminate the ROI if the transaction volume isn’t large enough.

RPA tools avoid the API problem by interfacing directly with the application user interface, just as a human would. They use artificial intelligence and machine learning to “watch” the operator perform a task within the application, then create their own program (called a “bot”) to mimic it. This means that:

  1. Bots can do anything a human can do within the application
  2. Users can create a bot without writing code

Practically speaking, an experienced robotic process automation consultant with programming experience is required to roll out an RPA solution enterprise-wide, and most users will only be able to automate small, routine tasks without assistance. Business-critical, high-volume automations will still involve coding. But RPA dramatically reduces the implementation time and avoids the need to retrofit APIs for software applications that were not designed to support them.

Using RPA with OCR Data Capture

OCR Data Capture is one of the most common business processes to automate with RPA. Taking data stored in paper or electronic documents and […]

OCR Consulting Services

OCR Experts for Any Project

Our unique team of OCR experts is equipped to help with OCR projects of any size or complexity. We have support specialists who can remotely configure desktop solutions in a matter of minutes, and expert systems integrators with years of programming, database design, and robotic process automation experience.

Desktop OCR

Use our online store to order desktop OCR applications, and our staff will be happy to answer your setup questions via email or web chat.

Remote configuration and training services using GotoMeeting are available for a low hourly rate.

Let Us OCR That For You

Got a one-time conversion and don’t want to hassle with software? Upload your scanned document to us and we’ll send back the converted files. An optional verification service corrects recognition errors and layout issues for a low hourly rate.

Data processing for forms, reports, directories, and other documents is also available with output to CSV, Excel, XML, JSON, SQL, etc.

Contact us and, if possible, provide a sample, the total page count, the desired output, and whether you want us to correct the results after OCR. We’ll reply with a quote right away. Prices start at $50 for up to 1,000 pages.

Batch Scanning & OCR Servers

Automate document scanning and digital document archival processes using zone OCR, barcode recognition, database integration and other technologies.

Small business systems and single document workflows can be set up remotely via GotoMeeting, usually in just a few hours. Chat now if we’re online or leave a message to schedule a consultation.

Data Capture and Forms Processing

Advanced data extraction solutions that can turn the most complex documents into structured data ready […]

OCR Data Capture

What is OCR Data Capture?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Enterprise data capture systems provide interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Who can benefit from OCR data capture software?

Any organization that collects data from paper documents, or electronic files like PDF and Office documents, can get a very high return on investment by automating the data entry with OCR data capture software.

You do need to have a significant number of documents to […]

Why are the prices of OCR applications so different?

OCR software ranges in price from freeware all the way up to tens of thousands of dollars. What explains the difference between these applications? Here’s the breakdown:

  • OCR Freeware uses the SimpleOCR or Tesseract engines and provides limited scanning and output format capabilities. Recognition quality is generally poor except for the highest quality document images.
  • PDF OCR Converters provide good quality OCR engines like ABBYY, IRIS and OmniPage, but limit the output to searchable PDF files. These cost less than $100.
  • Standard OCR applications range from $100-$200 and provide full OCR capabilities including converting scans to Word, Excel, HTML and other editable formats.
  • Corporate OCR applications add advanced features like automated hotfolder processing, concurrent licensing and other features useful for business applications. Pricing for these is $200-$500.
  • OCR Servers provide scalable, enterprise OCR services for processing very high volumes of documents or providing OCR capabilities to users throughout the organization. Prices start around $1,500 and go up based on processing volume.
  • Enterprise Data Capture and Forms Processing applications are used to capture structured data from complex documents like healthcare claim forms and invoices that include things like tables, handwriting, checkboxes, and movable zones. These solutions can cost anywhere from around $1,000 to hundreds of thousands of dollars depending on the document volume and complexity of the project.

Using Artificial Intelligence to train OCR templates

Modern Forms Processing applications have AI-based training algorithms that let users point and click on the location of data in their documents and create OCR templates automatically.

This bypasses the technical requirements of creating complex OCR templates, especially for varied documents like Invoices where the data doesn’t always appear in the same place.

But how good are these AI-based training systems?

In our experience they work well when you have:

  • Good quality scanned images
  • Clearly labeled data
  • Tables with regular columns

Point and click style training doesn’t work quite as well with:

  • Poor quality images
  • Data that appears within paragraphs
  • Tables with overlapping columns, subtotal rows, etc.

These types of documents can still be captured with OCR but they will usually require an experienced technician to manually configure the template.

For natural language data like legal documents, a newer artificial intelligence technology called NLP (Natural Language Processing) is available. NLP systems work by attempting to “understand” the language used in documents and interpreting the location of data points based on meaning. ABBYY FlexiCapture also supports NLP-based training for these types of documents.

How to use Zone OCR when the data can be in different locations?

Modern Forms Processing software can use rules-based templates for locating data on documents based on label keywords, data types, regular expression pattern matching and other methods.

The most common example in business is an Invoice. Businesses receive invoices from 1000s of different vendors, each with important information like the Invoice Number, Due Date and Total needed to process the document, but each vendor invoice is formatted a little differently than the others.

Software like ABBYY FlexiCapture will look for keywords like “Invoice Number” or variations like “Inv #” and “Invoice No.” to locate the invoice number value on each invoice.
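The keyword approach can be illustrated with a regular expression over the recognized text. This is a simplified sketch, not how ABBYY FlexiCapture is implemented: the label variants come from the examples above, while the value pattern (letters, digits and dashes) and the function name `find_invoice_number` are assumptions made for illustration.

```python
import re
from typing import Optional

# Label variants taken from the examples in the text; the value pattern
# (letters, digits, dashes) is an assumption for illustration.
INVOICE_NUMBER = re.compile(
    r"(?:Invoice\s+Number\b|Invoice\s+No\b\.?|Inv\s*#)\s*:?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)


def find_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the value that follows an invoice-number label, if any."""
    match = INVOICE_NUMBER.search(ocr_text)
    return match.group(1) if match else None
```

Because the search is anchored on the label rather than on a page position, the same rule works regardless of where each vendor prints the number.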

These applications can also capture complex table data, including tables that don’t line up into regular columns, and output it to formats like Excel or a SQL database.

In recent years, artificial intelligence based training has made it possible to simply point and click on the location of data on documents as you process them and generate these templates automatically, dramatically reducing the ongoing expert help that these systems would otherwise require.

Does ReadIRIS, FineReader or OmniPage support Zone OCR?

The “Pro” versions of most Desktop OCR applications support the creation of zone templates that can be used to OCR specific regions on batches of documents.

Most OCR applications have “Lite” versions that don’t have the ability to manually create zones so it’s important to get the correct version.

With these applications it is often not possible to output this data as “fields” in a structured data file like CSV, Excel or XML. What you typically get is a text file for each document with a line of text for each zone. The zones are designed more for excluding regions you don’t want or manually overriding the detection of text, tables and images in the document.
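To make the difference concrete, the sketch below turns such per-document text output into one CSV row per document. It assumes that each text file holds exactly one line per zone, in a known zone order; that layout, the zone names and the function `zones_to_csv` are assumptions for illustration, not the documented output of any specific desktop OCR product.

```python
import csv
import io
from typing import Dict, List


def zones_to_csv(zone_names: List[str], documents: Dict[str, str]) -> str:
    """Build CSV text with one row per document and one column per zone.

    `documents` maps a document name to the text the OCR application
    produced for it, assumed to hold one line per zone, in zone order.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["document"] + zone_names)
    for doc_name, content in documents.items():
        lines = content.splitlines()
        # Pad or trim so a missing or extra line cannot shift the columns.
        row = (lines + [""] * len(zone_names))[: len(zone_names)]
        writer.writerow([doc_name] + [value.strip() for value in row])
    return buffer.getvalue()
```

The weakness of this approach is visible in the padding step: if a zone produces no text or wraps onto two lines, the values can no longer be matched to fields reliably.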

If you need to capture specific data in multiple documents and output them to structured data files or a SQL database, Batch OCR Applications are the best option for this.

If you need to capture data formatted in tables and output to CSV or Excel, desktop OCR applications do this quite well as long as the tables have a regular format with well-defined columns.

To capture handprint, irregular tables, large numbers of data points, or data that doesn’t always appear in the same place on every page, Forms Processing software is what you need.

Knowledge Base

The SimpleOCR Knowledge Base contains frequently asked questions and answers, technical guides and general information on a broad range of optical character recognition, handprint recognition, data capture, PDF OCR, AP invoice scanning and zone OCR applications.

Contact Us for FREE Consultation on Your OCR Project

ABBYY FlexiCapture Cloud

ABBYY FlexiCapture Cloud delivers ABBYY’s advanced data capture platform capabilities via REST API and web interfaces. ABBYY FlexiCapture Cloud customers can rapidly configure and deliver their Content IQ solution, taking advantage of our cloud services to automate and accelerate their document-driven processes. The advanced machine learning and AI in the platform improve classification and data extraction results, enabling core processes to support better, smarter, faster decisions.

FlexiCapture Cloud enables organizations to accelerate digital transformation by complementing their automation systems with new and advanced cognitive capabilities that liberate the intelligence locked in their documents.

ABBYY FlexiCapture On-Premise

ABBYY FlexiCapture On-Premise – Distributed – Perpetual License PPY 50K Pages

ABBYY FlexiCapture is a powerful data capture and document processing solution from a world-leading technology vendor. It is designed to transform streams of documents of any structure and complexity into business-ready data. And its award-winning recognition technologies, automatic document classification, plus a highly scalable and customizable architecture, mean that it can help companies and organizations of any size to streamline their business processes, increase efficiency and reduce costs.

OCR to Database

Data is everything. No matter what field your company works in, everything is eventually distilled into data and accumulated in a database to be processed, stored, repurposed and reassembled again and again. Every organization has a database that acts as a repository for all of its information. You may get by for some time with manual data entry, spreadsheets, or just folders of documents, but eventually the sheer amount of data will become overwhelming.

Luckily, there are plenty of solutions for your database. You can choose between SQL (MySQL, Access, Postgres, …) and NoSQL (Mongo, AWS, …) solutions for storing and processing data, but there will always be the issue of how raw, unprocessed data gets from images or text into the more structured form of your database. Identifying and transferring all of this data can be a bit of a task. Misread data, or data matched to the wrong fields, could easily ruin your data processing system. Thus, the precision of character recognition becomes essential.

One solution is to keep the processes of scanning and data transfer separate. You can use one application for character recognition, transferring data from the image to a PDF or text document, and then use a PDF (or text) to database converter to extract that data into your database format. The obvious disadvantage of this approach is that it adds a whole extra step to your data processing. You will start to accumulate additional errors, and you will add time for setting up the additional conversion, for the data processing itself, and for the inevitable error identification and bug fixing. It may work for smaller companies, but at the enterprise level it becomes cost prohibitive.
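The second step of that two-step approach, loading recognized text into a database, could look roughly like the following Python sketch using the standard `sqlite3` module. The `invoices` table, its columns, the comma-delimited input layout and the validation rules are all hypothetical, chosen only to show how misread or mismatched fields can be rejected before they reach the database.

```python
import csv
import io
import re
import sqlite3
from typing import List, Tuple

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def load_invoices(conn: sqlite3.Connection, delimited_text: str) -> Tuple[int, List[str]]:
    """Insert valid rows into the invoices table; return (count, rejected lines)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(number TEXT PRIMARY KEY, due_date TEXT, total REAL)"
    )
    inserted, rejected = 0, []
    for row in csv.reader(io.StringIO(delimited_text)):
        if not row:
            continue
        try:
            # A wrong field count raises ValueError here.
            number, due_date, total = (field.strip() for field in row)
            if not number or not DATE.match(due_date):
                raise ValueError("field failed validation")
            # float() raises ValueError on OCR misreads such as "l2.00".
            conn.execute(
                "INSERT INTO invoices VALUES (?, ?, ?)",
                (number, due_date, float(total)),
            )
            inserted += 1
        except (ValueError, sqlite3.IntegrityError):
            rejected.append(",".join(row))
    conn.commit()
    return inserted, rejected
```

Rejected lines are returned rather than silently dropped so that an operator can review and correct them, which is the extra verification effort the paragraph above warns about.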

Another solution is the direct OCR to Database approach. […]

SimpleIndex Barcode Suite

Simple Software SimpleIndex Product Suites offer you a better deal on bundles of essential products.

SimpleIndex Barcode Suite combines the best Simple Software products to create a complete Barcode OCR solution. It includes:

  • SimpleIndex Barcode Server license with built-in Accusoft barcode engine and server functionality.
  • SimpleSend solution enables automated sending of document files via secure FTP or email. SimpleSend enhances the functionality of SimpleIndex in several ways as well as functioning as a standalone application.
  • SimpleExport license is designed to convert any delimited text file into any XML or formatted text file format using XSLT. It automates the process of applying XSLTs, especially for document imaging applications where the data has matching files that must be moved or renamed along with the data.
  • 5 licenses of SimpleCoversheet which is designed to work with data sources like SQL databases, spreadsheets and text files to dynamically build lists of barcodes to print. This is especially useful in document scanning applications where barcodes are used to identify and file documents automatically.

ABBYY FineReader Server On-Premise

Innovative server-based OCR software for performing centralized enterprise-wide OCR processing. Allows anyone on the network to submit files for OCR. Complex XML job specifications can be submitted to control output. Support available for Arabic and Asian languages.

 

Available in CPU, Total Page Count and Pages Per Year licensing models.

 

SimpleIndex OCR Server 1M PPY

SimpleIndex OCR Server 1 million pages per year – ABBYY FineReader OCR Server

Document capture solution with a one-click interface that automates your scanning and document filing by creating easy-to-find electronic content, saving you time and money.  It’s highly customizable to meet even the most detailed needs, with top quality technicians to support your requirements.

SimpleIndex Professional

SimpleIndex Pro version includes:

  • SimpleIndex Standard
  • ISIS scanning
  • FineReader OCR
  • Accusoft Barcode Upgrades

SimpleIndex Standard

SimpleIndex Standard version includes:

  • basic text and barcode recognition
  • TWAIN scanning

SimpleSoftware

SimpleIndex can bring speed and efficiency to your scanning or doc filing no matter the process. Even if all you are doing is hand keying a few basic details about a document, breaking those details into individual indexes and adding tools like drop down choice lists, automatic orientation, and blank page deletion ensure a smoother, more consistent process.

Automation

Here’s where things start to get interesting. From basic tasks like splitting individual documents within a stack of pages by spotting a blank page, a specific mark, or a barcode separator, to capturing index data directly from the page or looking up additional details about a document in a database, SimpleIndex has a host of powerful tools to tame your piles of paper or drives full of digital files. Let’s look at a few.

OCR

Optical Character Recognition is the ability to take a scan, which is merely a picture of a page, and turn it into words that the computer can understand and use to index your files. SimpleIndex leverages the power of ABBYY FineReader, recognized as one of the best OCR engines on the market, to accurately capture names, dates, important numbers, document types, and other details about your file. Some products have you set a box and capture whatever information happens to fall in that zone. SimpleIndex takes it further with Dynamic Zone OCR, which lets you set an oversized zone that allows for shifting of the pages between scans but still captures just the data you need by matching against templates, lists, or even Regular Expressions (RegEx). You can also skip the zones entirely and use the full text of a page to find matches for your index data.
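The idea of an oversized zone combined with pattern matching can be sketched as follows. This is not SimpleIndex’s implementation: the OCR result is modelled as a list of words with pixel positions, and the `Word`, `Zone` and `capture_from_zone` names are hypothetical. A word is accepted only if it lies inside the zone and fully matches the regular expression, so a generous zone tolerates page shift without picking up unrelated text.

```python
import re
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


class Zone(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def capture_from_zone(words: List[Word], zone: Zone, pattern: str) -> Optional[str]:
    """Return the first word inside the zone that fully matches the pattern."""
    regex = re.compile(pattern)
    for word in words:
        inside = zone.left <= word.x <= zone.right and zone.top <= word.y <= zone.bottom
        if inside and regex.fullmatch(word.text):
            return word.text
    return None
```

For example, a zone drawn generously around the expected date position still returns only the date, because the neighbouring “Date:” label does not match a pattern such as `\d{2}/\d{2}/\d{4}`.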

Barcodes

Forms Processing

What is ICR, Survey & Forms Processing?

ICR stands for Intelligent Character Recognition and is the technology that allows software to interpret hand printed text on scanned images.

Forms Processing Software uses ICR technology to automate data entry tasks involving hand-filled surveys, applications and forms. It provides interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Forms Processing also includes OCR (Optical Character Recognition) technology to recognize machine printed text, and OMR (Optical Mark Recognition) for check boxes and multiple choice bubbles.

It is also possible to use these applications to automate data collection from PDF forms, Word documents, Excel spreadsheets, and other formats used to fill out forms electronically. Many include the ability to publish forms as paper, fillable PDF and web pages simultaneously to distribute and collect data from multiple sources into one dataset.

Who can benefit from forms processing software?

Any organization that collects data on paper-based forms, surveys or applications on a regular basis can get a very high return on investment by automating the data entry with forms processing software.

You do need to have a significant number of forms to justify the expense: at least a hundred forms per month, or more depending on how much data is being captured. If the data entry task can be done in under 100 man-hours, then it is not a good candidate for automation with ICR software.

Organizations that have many separate departments that collect data on forms can share the budget for forms processing software by re-using it for other projects. Your current project may not be big enough to justify the expense, but when combined with one or two others it would be.

How much do […]

Document Scanning

One Source, Many Solutions

There are many document scanning solutions to choose from. ScanStore offers many of the top document imaging solutions under one virtual roof. ScanStore’s CDIA+ consultants can work with you to explain the strengths and weaknesses of each option and even provide a demo of the products using samples that you provide.

You’ll find flexibility with each of these products allowing a one-person shop to jump right in, or scale up to enterprise or service bureau proportions. If you need to throw some data capture into the document imaging mix, ScanStore also carries OCR, forms processing and document management tools.

Information and Advice

Take a look at the Scanning Solutions Comparison page to find in-depth information on the features of the available offerings and for more insight in finding the best fit.

And be sure not to miss the detailed comparison of the favorite Batch Scanning solutions in the exclusive Document Scanning Software Review.

What’s Right for You

You want a paperless office and document scanning is part of the path to get you there. Simply buying a scanner and feeding paper into it isn’t going to save you money. Automation of the scanning process is what holds costs down and drives up your Return on Investment.

For example, if an OCR automation costs $3,000 to implement, but by doing so you save a $15/hr employee 10 hours per week of data entry, the feature has paid for itself in 20 weeks.
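The arithmetic behind that example is simply the implementation cost divided by the weekly labor saving. A small sketch, with the hypothetical function name `payback_weeks`, makes it easy to try other numbers:

```python
def payback_weeks(
    implementation_cost: float,
    hourly_wage: float,
    hours_saved_per_week: float,
) -> float:
    """Weeks until the weekly labor saving equals the implementation cost."""
    weekly_saving = hourly_wage * hours_saved_per_week
    if weekly_saving <= 0:
        raise ValueError("the automation must save a positive amount per week")
    return implementation_cost / weekly_saving


# The example from the text: $3,000 cost, $15/hr employee, 10 hours saved per week.
print(payback_weeks(3000, 15, 10))  # prints 20.0
```

This ignores ongoing costs such as licenses, support and verification time, so it gives a lower bound on the payback period rather than a full ROI calculation.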

So how do we automate the data capture? Here are a few possibilities:

  • Full-Page OCR turns a scan into a full-text document you can search

  • Barcodes on each document contain key data like a customer name or invoice number

  • A single field […]
