Data capture from scanned images, PDF documents, and Word, Excel, and other Office documents. Applications that support projects of any size, from desktop OCR to enterprise forms processing. Data capture to CSV, Excel, SQL databases, XML, JSON, and any other format. Universal application integration with Robotic Process Automation (RPA).

Cloud OCR vs Sunshine OCR


With the development of Cloud Computing, more and more OCR solutions have moved processing to the cloud. Major Cloud OCR solutions include AWS Textract (from Amazon), Azure AI Vision (from Microsoft), and Google Cloud Vision AI, as well as more specialized solutions like ABBYY Vantage.

These days it is much harder to find an OCR data capture solution that is not fully or partly cloud-based.

Sunshine Software, or Sunshine OCR, refers to on-premise Optical Character Recognition software that requires no internet connection to operate. Since no Cloud is involved, we call it Sunshine Software to shine a light on the advantages of avoiding the Cloud.

We call it “Sunshine” because it’s catchy and makes us smile, but also because there is no good marketing term for this type of software. Before the “Cloud” became popular, it was simply known as “Software.” Since the development of Cloud Computing, it has been referred to as On-Prem, On-Premise, On-Site, Offline, Local, Native, Self-Hosted, In-House, and other terms, but none of them highlights its benefits over the Cloud the way “Sunshine Software” does.

Cloud OCR or Sunshine OCR?

In the ever-evolving landscape of technology, businesses face critical decisions about how to deploy software. Two prominent models, cloud-based and sunshine-based (on-premise) software, offer distinct approaches to meeting organizational needs. Understanding the differences between these models is essential for making informed decisions aligned with business goals and requirements.

Key Characteristics

Cloud-Based OCR Software:

Cloud OCR software operates on remote servers accessible over the internet. This model offers unparalleled scalability, allowing businesses to […]

Aviation and Marine Fuel Invoices

Aviation and Marine fuel invoice processing has a number of data capture and normalization requirements that make these invoices very hard to automate without specialized knowledge and customized business rules.

Our team has implemented aviation, marine, and land fuel transportation invoice automation projects with some of the top global energy services companies. We have extensive knowledge of these types of invoices and how they need to be processed so that the myriad taxes and fees are captured and applied to the correct fueling.

What Makes Aviation and Marine Invoice Processing So Complex?

  • Multiple taxes from multiple jurisdictions apply to each transaction. Excise tax, Environmental tax, LUST tax, VAT, etc.
  • Additional fees like Airport, Storage, Intoplane, Flowage, Fuel Surcharge, etc.
  • Different vendors use different descriptions for each tax and fee.
  • Invoices with multiple itemized fuel transactions but a single summary of taxes and fees that must be applied to each fueling proportionally.
  • Sometimes the same data appears in the header, and other times in the line items, sub-tables, or footers.
  • Aviation and Marine are international businesses, so solutions must handle multiple languages and currencies, calculate exchange rates, and perform currency conversions.
  • Since the price of fuel changes constantly, slight price differences must be accepted as long as they fall within a defined tolerance.
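
The proportional allocation described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not our production business rules: it assumes each fueling's share of a summary tax or fee is weighted by a per-line value (such as the line amount or uplifted volume), that shares are rounded to cents, and that any rounding remainder goes to the line with the largest weight so the shares still add up to the invoice total. The function name `allocate_proportionally` is hypothetical.

```python
from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_proportionally(total_fee: Decimal, weights: list[Decimal]) -> list[Decimal]:
    """Split a summary tax or fee across fuelings in proportion to each weight.

    Hypothetical helper: weights could be line amounts or uplifted volumes.
    Each share is rounded to cents, and the rounding remainder is added to the
    line with the largest weight so the shares sum exactly to total_fee.
    """
    base = sum(weights, Decimal("0"))
    if base == 0:
        raise ValueError("cannot allocate a fee against a zero total weight")
    shares = [(total_fee * w / base).quantize(CENT, rounding=ROUND_HALF_UP) for w in weights]
    largest = max(range(len(weights)), key=lambda i: weights[i])
    shares[largest] += total_fee - sum(shares, Decimal("0"))
    return shares


# Example: a $100.00 flowage fee spread over three fuelings of different sizes.
print(allocate_proportionally(Decimal("100.00"), [Decimal("1000"), Decimal("2000"), Decimal("3000")]))
```

Putting the remainder on one line is only one possible convention; a real project would follow whatever rounding rule the customer's accounting system expects.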

Achieving Total Automation

Other invoice solutions capture data only as it appears on the invoice. While some normalization or mapping to known item IDs can be done, it is rare that the data can be imported and matched to a fuel order without additional manual steps. Unless the data being imported matches the expected data line by line, someone must manually reconcile the differences.
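
The line-by-line matching described above, combined with the price tolerance mentioned in the list, can be sketched as follows. This is an illustration under stated assumptions, not how any particular invoice product works: each invoice line is assumed to carry a hypothetical `order_id` linking it to a fuel order, quantities must match exactly, and unit prices may differ by a relative tolerance (2% here, an arbitrary example value). Anything that fails these checks lands in an exception list for manual reconciliation.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FuelLine:
    order_id: str        # hypothetical key linking an invoice line to a fuel order
    quantity: Decimal    # uplifted volume, e.g. gallons or liters
    unit_price: Decimal  # price per unit of volume


def reconcile(invoice_lines: list[FuelLine], orders: dict[str, FuelLine],
              price_tolerance: Decimal = Decimal("0.02")) -> tuple[list[str], list[str]]:
    """Return (matched order IDs, exception messages) for one invoice.

    price_tolerance is relative: 0.02 accepts prices within 2% of the order price.
    """
    matched: list[str] = []
    exceptions: list[str] = []
    for line in invoice_lines:
        order = orders.get(line.order_id)
        if order is None:
            exceptions.append(f"{line.order_id}: no matching fuel order")
        elif line.quantity != order.quantity:
            exceptions.append(f"{line.order_id}: quantity {line.quantity} != ordered {order.quantity}")
        elif abs(line.unit_price - order.unit_price) > order.unit_price * price_tolerance:
            exceptions.append(f"{line.order_id}: unit price outside tolerance")
        else:
            matched.append(line.order_id)
    return matched, exceptions
```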

When orders are placed for a single uplift, but the vendor sends a daily or weekly […]

Digitech Systems OCR

Digitech Systems creates award-winning digitization and content management software and cloud services that deliver Any Document, Anywhere, Anytime®. With them, organizations of all sizes securely and effectively extract, manage, and automate their business information.

The PaperVision and ImageSilo brands are used by thousands of businesses worldwide from global conglomerates to Main Street to help teams pave the path to more meaningful work by transforming how they handle digital files, processes, documents, and more.

Starting with PaperFlow digitization software, subsequent years brought on-premise information management (PaperVision Enterprise, 1998), the world’s first cloud Enterprise Content Management system (ImageSilo, 1999), business process automation (PaperVision Enterprise WorkFlow, 2002), intelligent data capture (PaperVision Capture, 2009), and AI-enabled forms processing (PaperVision Forms Magic™, 2016). The PaperVision®.com cloud-based information management system was introduced in 2020. These products provide unprecedented ease-of-use and architectural flexibility, while balancing fully featured products with a sensible price/performance ratio and legendary customer service.

PaperVision Enterprise performs dependable electronic document management to automate office environments, conserve paper, save time and money, and provide peace of mind. Retrieval solutions provide enterprise scalability and functionality with advanced features that enhance your efficiency and protect corporate data. Electronic information is retrieved instantly with our user-friendly graphical interface that displays a complete overview of all your available projects. View, manipulate, print, fax, export, and e-mail documents directly from your PC. On-premise installation or cloud hosted services available.

PaperVision Direct lets you scan, import, index, and organize paper documents using your existing scanners and multi-function devices (MFD) to create convenient digital files and securely upload them to the cloud. Start scanning documents right at your desk! Turn any vulnerable paper document into a useful digital file that can be securely managed in your PaperVision.com cloud service.

PaperVision Capture Forms Magic adds handwriting […]

How to enable email notification of PaperVision Capture errors

A useful feature of PaperVision Capture is the ability to receive an email notification from a PaperVision Capture service when the service encounters an error. By default, these errors are written only to the Windows event log, but with email notifications enabled, users can also be directly notified of problems in their PaperVision Capture environment.

Enabling email notifications requires a few modifications to the logging settings for each of the PaperVision Capture services: Process Initiator, Process Worker, Gateway Server, and Data Transfer Agent. Also, an SMTP server is required for email notifications.

Follow these steps to enable email notifications for PaperVision Capture services:

  1. Open the configuration file for a PaperVision Capture service in Notepad. These configuration files are usually located in C:\Program Files (x86)\Digitech Systems\PaperVision Capture. The following is a list of applicable configuration files:
    • DSI.PVECommon.PVProcInit.exe.config
    • DSI.PVECommon.PVProcWork.exe.config
    • DSI.Gateway.GatewayServer.exe.config
    • DSI.DataTransferAgent.Service.exe.Config
  2. In the configuration file, find each of the “listeners” sections (each configuration file has one listener section for each category of error). Here is an example:
    <add name="Application" switchValue="Error">
      <listeners>
        <!--Commented listeners will NOT log and uncommented listeners will log.-->
        <add name="Event Log Destination"/>
        <!--<add name="Flat File Destination" />-->
        <!--<add name="Email TraceListener" />-->
      </listeners>
    </add>
  3. To enable email notifications, change the following line in all of the listener sections:
    <!--<add name="Email TraceListener" />-->

    To:

    <add name="Email TraceListener" />
  4. Once email notifications are enabled, the settings for these notifications must be modified. Find the following line:
    <add name="Email TraceListener" smtpServer="127.0.0.1" smtpPort="25" toAddress="[email protected]" fromAddress="[email protected]" subjectLineStarter="PaperVision Process Worker Event:" subjectLineEnder="" formatter="Text Formatter" listenerDataType="Microsoft.Practices.EnterpriseLibrary.Logging.Configuration.EmailTraceListenerData, Microsoft.Practices.EnterpriseLibrary.Logging, Version=2.0.0.0, Culture=neutral, PublicKeyToken=469958bca2e00646" traceOutputOptions="None" type="Microsoft.Practices.EnterpriseLibrary.Logging.TraceListeners.EmailTraceListener, Microsoft.Practices.EnterpriseLibrary.Logging, Version=2.0.0.0, Culture=neutral, PublicKeyToken=469958bca2e00646"/>
  5. Within this section, find smtpServer="127.0.0.1" and change the IP address to the IP address of your SMTP server.
  6. Find smtpPort="25" and change the port to the SMTP port used by your SMTP server.
  7. Find toAddress="[email protected]" and change the email address to any email address to which these notifications must be sent; multiple email addresses can be specified by separating them […]
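
If you have several services to update, the edits in steps 3, 5, and 6 can also be scripted. The following Python sketch is not a Digitech Systems tool; it assumes the configuration files look like the examples above (a self-closing `<add name="Email TraceListener" />` reference wrapped in an XML comment, plus one listener definition carrying the `smtpServer` and `smtpPort` attributes) and uses plain text substitution so the file's other comments and formatting are left alone. The function names are hypothetical. Back up each file before running it.

```python
from __future__ import annotations

import re
from pathlib import Path

# The commented-out listener reference from step 3: <!--<add name="Email TraceListener" />-->
COMMENTED_REF = re.compile(r'<!--\s*(<add\s+name="Email TraceListener"\s*/>)\s*-->')


def enable_email_listener(config_text: str) -> str:
    """Uncomment every Email TraceListener reference (step 3)."""
    return COMMENTED_REF.sub(r"\1", config_text)


def set_listener_attr(config_text: str, attr: str, value: str) -> str:
    """Set an attribute such as smtpServer on the Email TraceListener definition.

    Only an <add name="Email TraceListener" ...> element that already has the
    attribute is changed; the short listener references are left alone.
    """
    pattern = re.compile(
        r'(<add\s+name="Email TraceListener"[^>]*?\s' + re.escape(attr) + r'=")[^"]*(")'
    )
    # A function replacement avoids backslash/group escaping issues in `value`.
    return pattern.sub(lambda m: m.group(1) + value + m.group(2), config_text)


def update_config_file(path: Path, smtp_server: str, smtp_port: int) -> None:
    """Apply steps 3, 5, and 6 to one configuration file in place."""
    text = path.read_text(encoding="utf-8")
    text = enable_email_listener(text)
    text = set_listener_attr(text, "smtpServer", smtp_server)
    text = set_listener_attr(text, "smtpPort", str(smtp_port))
    path.write_text(text, encoding="utf-8")
```

For example, call `update_config_file` once for each of the four files listed in step 1, passing your SMTP server's address and port; the `toAddress` and `fromAddress` attributes can be set the same way with `set_listener_attr`.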

Grooper Document Processing

Grooper was built from the ground up by BIS, a company with 35 years of continuous experience developing and delivering new technology. Grooper is an intelligent document processing and digital data integration solution that empowers organizations to extract meaningful information from paper/electronic documents and other forms of unstructured data.

The platform combines patented and sophisticated image processing, capture technology, machine learning, natural language processing, and optical character recognition to enrich and embed human comprehension into data. By tackling tough challenges that other systems cannot resolve, Grooper has become the foundation for many industry-first solutions in healthcare, financial services, oil and gas, education, and government.

  • Single platform
  • Patented OCR
  • Image processing
  • Machine learning
  • Natural language processing
  • Zero code
  • Zero templates
  • Open architecture

SimpleIndex Cloud OCR

SimpleIndex Cloud OCR adds Amazon AWS Textract OCR to any SimpleIndex workstation or server license.

Textract capabilities include the most accurate OCR and handprint recognition available, automatic form field detection, and accounts payable invoice and receipt processing.

Amazon Textract is only available as an API that requires custom programming to make it work. SimpleIndex turns it into a complete document and data capture application designed for easy batch processing on a workstation or server.

Requires an AWS account. Standard Textract transaction fees will apply.
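
To give a sense of the custom programming that calling Textract directly involves, here is a minimal Python sketch. It assumes the AWS SDK for Python (boto3) and configured AWS credentials for the actual call, which is shown only in comments; the runnable part pulls the recognized lines out of a `DetectDocumentText` response, whose `Blocks` list contains `PAGE`, `LINE`, and `WORD` entries. The helper name `textract_lines` is hypothetical, and the batch processing, scanning, and export that SimpleIndex provides are not shown.

```python
from __future__ import annotations


def textract_lines(response: dict) -> list[str]:
    """Return the text of each LINE block in a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


# Sketch of the API call itself (needs boto3 and AWS credentials; not executed here):
#
#     import boto3
#     client = boto3.client("textract")
#     with open("scan.png", "rb") as f:
#         response = client.detect_document_text(Document={"Bytes": f.read()})
#     print("\n".join(textract_lines(response)))
```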

PDF Processing with FineReader and FineReader Server

How to create a PDF from Microsoft® Word, Excel, or PowerPoint

How to convert emails to PDF

How to Split a PDF

Easily create new PDF documents, or separate documents that were combined into one PDF, with FineReader PDF 15.

Learn how to split PDFs and extract pages easily.

How to create and edit interactive PDF forms

Watch this video and see how to edit and create interactive PDF forms quickly and easily.

The Form Editor tool in FineReader PDF 15 lets you create and edit fillable PDF forms with text and date fields, drop-down lists, list boxes, checkmarks, radio buttons, signature fields, and action buttons. Collect information and create effective document templates with ease!

How to extract text from scanned PDFs

How to extract tables

How can I verify if the digital signature is valid?

If you open a document with a valid digital signature in FineReader, you will see a green “Valid” notification in the left panel of ABBYY FineReader PDF 15.

Recognizing a document with existing text layer in FineReader PDF 15

  1. Open FineReader PDF 15.
  2. Go to Tools > Options > OCR.
  3. In PDF recognition mode, select the Use OCR option.
  4. Click OK.
  5. Recognize your document again.

How to convert a document into an accessible PDF/UA

Make your mixed documents (PDFs, scans, photos, or paper) digital and accessible.

In this […]

Reading Handprint, Checkmarks, and Forms with FlexiCapture and Vantage

ICR – Intelligent Character Recognition

  • Intelligent Character Recognition (ICR) is an extension of optical character recognition (OCR). While OCR technology is designed to extract machine-printed characters, ICR technology retrieves information provided as hand-printed characters.
  • ICR technology can extract hand-printed characters that are separated and written as individual characters in areas/zones. These areas/zones need to be specified as fixed fields of a machine-readable form or, alternatively, detected automatically.

[Figure: example of a form containing hand-printed characters]

Important note: ICR cannot extract text written in cursive handwriting, such as the following example:

[Figure: example of cursive handwriting]

  • In most cases, the ICR technology is linked to Field Level / Zonal Recognition and forms processing.
  • To enhance ICR recognition accuracy, it is recommended to use metadata, for example regular expressions, dictionaries, or database lookups.
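
As an illustration of that last point, the sketch below post-processes two hand-printed fields with a regular expression and a small dictionary. It is not ABBYY functionality; the field rules, the state list, and the character substitutions (letters that are easily confused with digits in a numeric field) are hypothetical examples that a real project would tune to its own forms.

```python
from __future__ import annotations

import re

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # hypothetical rule: US ZIP or ZIP+4
STATES = {"CA", "NY", "TX", "WA"}        # illustrative subset used as a dictionary

# Letters often misread in numeric fields, mapped to the digits they resemble (an assumption).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def check_zip(raw: str) -> tuple[str, bool]:
    """Normalize a recognized ZIP code and report whether it matches the pattern."""
    candidate = raw.strip().translate(DIGIT_FIXES)
    return candidate, ZIP_RE.fullmatch(candidate) is not None


def check_state(raw: str) -> tuple[str, bool]:
    """Upper-case a recognized state code and look it up in the dictionary."""
    candidate = raw.strip().upper()
    return candidate, candidate in STATES
```

Values that still fail their check would typically be routed to a human verifier rather than accepted.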

ICR in ABBYY SDKs

The following ABBYY SDKs and products support ICR:

  • FineReader Engine
    Since version 12 Release 3, ICR is also included in the Linux version, and since version 12 Release 4, it is also included in the Mac version of FineReader Engine. In earlier versions, ICR was supported only in the Windows version.
  • FlexiCapture SDK – this SDK is designed for forms processing and data extraction; ICR and template matching for fixed forms are part of the default feature set. In addition, ABBYY offers this technology as a product in the form of the FlexiCapture platform.
  • Cloud OCR SDK – this ABBYY OCR service allows reading zones that contain hand-printed, separated characters. This online OCR service […]

Using FlexiLayout Studio to Design Data Capture Templates

FlexiLayout: How to capture a table using Repeating Group if table header is on each page

In some cases, we might have a table that we cannot capture correctly using the traditional method, the Table element. In such cases, we usually use the Repeating Group element.

But what if we come across a multi-page document that has a table header on each page?

We can use the following two methods to capture such a table using Repeating Groups.

Using Absolute search area constraints

To limit the search area to the table area so that it doesn’t capture unnecessary text outside of the table, we can use Absolute search area constraints in the Search Constraints tab.

You can measure the area with the Measure Rectangle tool.

Using nested Repeating groups

Sometimes the Absolute search area constraints method is not suitable: other tables using this layout might have different positions and lengths of elements, so you would have to re-measure the area every single time.

In such a case, you can use the nested Repeating group method.

  1. Create the first, “main” Repeating group that will include the table header and footer.
  2. Next, create the nested RG inside the first RG and define the relations between the two groups. [Figure: relations between the main and nested Repeating groups]
  3. These are the main steps. Other elements in the RG don’t need any specific settings and should be designed according to the needed results.

Additional information

FlexiLayout: Capturing a table using Repeating Group

How to reliably capture elements in FlexiLayout Studio if the image resolution can vary

When the image resolution varies, then the search area of elements based on absolute offsets can miss […]
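
The underlying arithmetic is easy to see: an offset fixed in pixels covers a different physical distance at every resolution. The short sketch below illustrates only the general principle (it is not FlexiLayout Studio code): it rescales a pixel offset measured on a sample image to another DPI, and converts a resolution-independent offset in inches to pixels. The function names are hypothetical.

```python
def rescale_offset(offset_px: float, sample_dpi: float, image_dpi: float) -> int:
    """Convert a pixel offset measured at sample_dpi to the same physical distance at image_dpi."""
    return round(offset_px * image_dpi / sample_dpi)


def inches_to_pixels(offset_in: float, image_dpi: float) -> int:
    """Convert a resolution-independent offset in inches to pixels."""
    return round(offset_in * image_dpi)


# A 600 px offset measured on a 300 DPI sample is 2 inches, i.e. 400 px on a 200 DPI image.
print(rescale_offset(600, 300, 200), inches_to_pixels(2.0, 200))
```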

FlexiCapture and Vantage Natural Language Processing (NLP)

How to train an NLP machine learning model

Today, different industries face similar challenges as they seek to extract information from business documents such as policies, e-mails, and legal agreements, and most agree that manual data entry is costly, time-consuming, and error-prone.

In this video, you will learn how to train an NLP machine learning model in FlexiCapture to extract entities and text passages from lease agreements.

Converting unstructured documents into structured data automatically makes this information available to your business applications while saving you time, money, and labor in the process.

Adding a field captured by FlexiLayout to an NLP-trained Document Definition

You can add the new FlexiLayout as an additional layout to the existing one.
To do this, open the Document Definition Editor, go to the section’s properties, and load the new layout as an additional FlexiLayout.

Using ABBYY Vantage Document Skills

Processing Your First Documents with Vantage

Learn how easy it is to get started with Vantage – upload your documents and Vantage will take care of the rest.

How to Create and Train a Vantage Document Skill

Learn how to use the Vantage Skill Designer to create and train a new Document Skill with just a few sample documents.

How to Create and Train a Classification Skill in ABBYY Vantage

Learn how to use the Vantage Skill Designer to train a new Classification Skill. You need just a few samples of each document class.

How to Automate a Complete Workflow by Creating a Vantage Process Skill

How to Edit a Document Skill

Learn how to adapt already existing skills to your specific documents and business requirements.

How to perform the first authentication in Vantage Swagger UI?

To get your first access token, perform the initial authentication using the default client. You do not need to enter any password or client ID, because the initial authentication is preconfigured. Just open a Swagger page (EU link or US link) and click Authorize.

Select all scopes and click Authorize again.

A password needs to be specified only for a custom client. A custom client can be created after the initial authentication.

References

EU Help: Getting a Tenant Identifier or US Help: Getting a Tenant Identifier

EU Help: Creating a Client or US Help: Creating a Client

Learn more at ABBYY […]

Using SharePoint in FineReader Server

Saving documents to the SharePoint in ABBYY Recognition Server

Note: In order to be able to communicate with the SharePoint Server, the Server Manager and the Remote Administration Console require Microsoft .NET Framework 4.5 to be installed.

To be able to save output documents to a SharePoint Server library, the ABBYY Recognition Server Server Manager service must run under a user account that has read/write access to the SharePoint Server library. If you chose during installation to run the service under the Local System account, restart it under a user account.

To set up the publishing of documents to a SharePoint Server library:

  1. Run the Remote Administration Console under a user account that has read/write access to the SharePoint Server library.
  2. Create a new workflow or modify an existing one (see Creating a New Workflow). In the Output Format Settings dialogue select Save output file in SharePoint library.
  3. Enter the URL of the SharePoint Server site (e.g. http://myportal/mysite/) and click Connect. The Remote Administration Console will try to connect to the specified site and download the list of document libraries and folders from there. If the connection is successful, you will see the “Connected” message below the button, and the names of the document libraries will appear in the Select document library list.
  4. Select the document library from the list. Click the Settings… button to associate the document’s metadata fields with the corresponding columns of the selected document type in SharePoint.
  5. Select the folder in the document library using the Browse… button, or leave the field empty to save documents in the root folder.
  6. Click OK in the Output Format Settings dialogue box.

If the Input folder has several subfolders containing image files, the output files will be saved in […]

Google Cloud Vision API

Vision API offers powerful pre-trained machine learning models through REST and RPC APIs. Assign labels to images and quickly classify them into millions of predefined categories. Detect objects and faces, read printed and handwritten text, and build valuable metadata into your image catalog.

Automatically extract handwriting, plain text or form data from any document using a huge machine learning model based on billions of sample documents.

Google Vision is a cloud OCR service that automatically detects and extracts text and data from scanned documents and PDF files. It goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.

Google Vision API also lets you implement OCR in your RPA workflows. UiPath and other RPA platforms offer connectors that let you include Vision OCR in your RPA processes.

Google Vision is not a “ready-to-use” product. It requires programming skills, experience with Google Cloud services, and a decent amount of coding to integrate it into your systems, especially once you add user interfaces for scanning and data validation.
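
For a sense of what that coding involves, here is a minimal Python sketch of the request and response handling for the Vision REST method `images:annotate` with the `DOCUMENT_TEXT_DETECTION` feature. It assumes you already have a Google Cloud project with the Vision API enabled and a way to authenticate (for example an API key or OAuth token), which is left out; the helper names are hypothetical, and scanning, validation, and export are not shown.

```python
from __future__ import annotations

import base64

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build a JSON-serializable images:annotate body for dense document OCR."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response: dict) -> str:
    """Return the full text found in the first image, or "" if nothing was detected."""
    responses = response.get("responses") or [{}]
    return responses[0].get("fullTextAnnotation", {}).get("text", "")


# The body would be POSTed as JSON to VISION_URL with your credentials, e.g. via
# urllib.request or the requests library; that step is omitted here.
```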

Simple Software developers have the necessary skills and experience to integrate Google Vision into your custom applications. Contact us or click the Request a Quote button to get a proposal for your custom application development project.

Remark Test Grading

Remark Test Grading is an easy-to-use solution to quickly grade online and paper tests, saving you time and money. Remark Test Grading Cloud allows busy instructors to quickly create and grade tests in the cloud so they can get more accomplished with less. With just a few clicks of the mouse, instructors can create an online test or a printable test answer sheet to be distributed to their students.

Remark Office OMR Software

Data collection and analysis software for surveys, tests, and other plain-paper forms. You create your own forms, which are then scanned with an image scanner or copier. The Remark Office OMR product has been used to scan and process billions of forms. Remark gives you the tools you need to get your results quickly. Through years of customer feedback, we’ve carefully designed our products to be user-friendly while providing a rich feature set to satisfy the specific needs of individuals like you.

ABBYY Vantage

ABBYY Vantage leverages AI machine learning and a huge library of document “skills” to provide out-of-the-box data capture for all kinds of documents.

Vantage provides a simple way to implement new data capture processes without the need for programmers.

It takes the FlexiCapture platform, hosts it in the cloud, and dramatically simplifies the interface. The thousands of settings you can use with FlexiCapture to build templates are managed by the AI, giving you a simple point and click interface to create new document capture workflows.

The “Skills” library gives you pre-configured capture workflows for hundreds of the most common documents. Simply connect them to your import and export destinations and you are ready to go, saving you hours or even days of development time.

UiPath RPA OCR Solutions

OCR data capture automation workflow development using UiPath RPA.

While we specialize in OCR data capture solutions, our certified UiPath integrators can help you with any aspect of your robotic process automation initiative. From planning to rollout to user training and bot development, our RPA specialists will ensure best practices are implemented and ROI is realized.

Contact us to get a quote or an online demo of our UiPath OCR solutions.

ABBYY FlexiCapture SDK On-Premise

ABBYY FlexiCapture SDK enables software developers to quickly create applications that extract meaning from documents. FlexiCapture SDK is ideal for system integrators, developers, and service providers who want to integrate powerful data capture capabilities into their solutions. Through the use of ABBYY’s machine learning and AI, end customers are able to process more transactions, faster, and with fewer errors, improving customer service, reducing costs, and making smarter process decisions.

FlexiCapture SDK, as a delivery option of the FlexiCapture platform, provides developers with a powerful and flexible toolkit to smoothly integrate ABBYY’s industry-leading data capture technologies to empower their own products and services according to vertical market needs.

Licensing is based on the number of developers for the base SDK, then on annual page counts for the perpetual license. All of the functionality supported by ABBYY FlexiCapture is included in the license, with a few exceptions.

ABBYY Cloud OCR SDK

ABBYY® Cloud OCR SDK is a web-based document processing service that will enhance your enterprise software systems, SaaS platforms, or your mobile apps with the ability to convert documents and utilize textual information from scans, PDFs, document images, smartphone photos, or screenshots.

Combining ABBYY’s latest AI-based technologies for information extraction with the highly scalable processing power of the Microsoft® Azure® computing infrastructure, this secure and reliable ABBYY cloud service can be easily integrated into your application via a REST API, enabling it to precisely convert virtually any number of pages in the shortest possible time.

Robotic Process Automation

Introducing Robotic Process Automation

RPA stands for Robotic Process Automation and it represents a new approach to business automation that helps minimize the technical hurdles required for implementing new workflows.

Robotic Process Automation of Data Entry

Traditional business process automations rely on application programming interfaces (APIs) to allow systems to exchange data. This approach has two main drawbacks:

  1. The application vendor must make those APIs available
  2. A programmer needs to write custom code to interface with them

If your software vendor does not provide an interface for consuming the data you need to automate, then you’re out of luck. And even if they do, the development costs can eliminate the ROI if the transaction volume isn’t large enough.

RPA tools avoid the API problem by interfacing directly with the application user interface, just like a human would. They use artificial intelligence and machine learning to “watch” the operator perform a task within the application and then create their own program (called a “bot”) to mimic it. This means that:

  1. Bots can do anything a human can do within the application
  2. Users can create a bot without writing code

Practically speaking, an experienced robotic process automation consultant with programming experience is required to roll out an RPA solution enterprise-wide, and most users will only be able to automate small, routine tasks without assistance. Business-critical, high-volume automations will still involve coding. But RPA dramatically reduces the implementation time and avoids the need to retrofit APIs for software applications that were not designed to support them.

Using RPA with OCR Data Capture

OCR Data Capture is one of the most common business processes to automate with RPA. Taking data stored in paper or electronic documents and […]

PaperVision Capture Forms Magic

PaperVision Capture Forms Magic adds handwriting recognition, forms processing, invoice processing, healthcare claim form templates, and business rules to Digitech Systems’ high-volume document scanning and data capture platform.

OCR Consulting Services

OCR Experts for Any Project

Our unique team of OCR experts is equipped to help with OCR projects of any size or complexity. We have support specialists who can remotely configure desktop solutions in a matter of minutes and expert systems integrators with years of programming, database design, and robotic process automation experience.

Desktop OCR

Use our online store to order desktop OCR applications, and our staff will be happy to answer your setup questions via email or web chat.

Remote configuration and training services using GoToMeeting are available for a low hourly rate.

Let Us OCR That For You

Got a one-time conversion and don’t want to hassle with software? Upload your scanned document to us and we’ll send back the converted files. An optional verification service corrects recognition errors and layout issues for a low hourly rate.

Data processing for forms, reports, directories, and other documents is also available with output to CSV, Excel, XML, JSON, SQL, etc.

Contact us with a sample if possible, plus the total number of pages, the desired output, and whether you want us to correct the results after OCR, and we’ll reply with a quote right away. Prices start at $50 for up to 1,000 pages.

Batch Scanning & OCR Servers

Automate document scanning and digital document archival processes using zone OCR, barcode recognition, database integration, and other technologies.

Small business systems and single-document workflows can be set up remotely via GoToMeeting, usually in just a few hours. Chat now if we’re online, or leave a message to schedule a consultation.

Data Capture and Forms Processing

Advanced data extraction solutions that can turn the most complex documents into structured data ready […]

OCR Data Capture

What is OCR Data Capture?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes, it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Enterprise data capture systems provide interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.
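
The zone OCR idea described above can be illustrated without tying it to any particular engine. The sketch assumes the OCR engine has already returned each word with a pixel bounding box, in reading order, and that each data field is defined as a hypothetical rectangular zone; a word is assigned to a zone when its center falls inside it. Commercial data capture software layers much more on top (movable zones, validation, verification), so treat this as a conceptual example only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel bounding box in image coordinates (y grows downward)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains_center_of(self, other: Box) -> bool:
        """True if the center point of `other` lies inside this box."""
        cx = (other.left + other.right) / 2
        cy = (other.top + other.bottom) / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def zone_ocr(words: list[tuple[str, Box]], zones: dict[str, Box]) -> dict[str, str]:
    """Build a field value for each named zone from the OCR words whose centers fall inside it."""
    return {
        name: " ".join(text for text, box in words if zone.contains_center_of(box))
        for name, zone in zones.items()
    }
```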

Who can benefit from OCR data capture software?

Any organization that collects data from paper documents, or electronic files like PDF and Office documents, can get a very high return on investment by automating data entry with OCR data capture software.

You do need to have a significant number of documents to […]

Why are the prices of OCR applications so different?

OCR software ranges in price from freeware all the way up to tens of thousands of dollars. What explains the difference between these applications? Here’s the breakdown:

  • OCR Freeware uses the SimpleOCR or Tesseract engines and provides limited scanning and output format capabilities. Recognition quality is generally poor except with the highest-quality document images.
  • PDF OCR Converters provide good quality OCR engines like ABBYY, IRIS and OmniPage, but limit the output to searchable PDF files. These cost less than $100.
  • Standard OCR applications range from $100-$200 and provide full OCR capabilities including converting scans to Word, Excel, HTML and other editable formats.
  • Corporate OCR applications add advanced features like automated hotfolder processing, concurrent licensing and other features useful for business applications. Pricing for these is $200-$500.
  • OCR Servers provide scalable, enterprise OCR services for processing very high volumes of documents or providing OCR capabilities to users throughout the organization. Prices start around $1,500 and go up based on processing volume.
  • Enterprise Data Capture and Forms Processing applications are used to capture structured data from complex documents like healthcare claim forms and invoices that include things like tables, handwriting, checkboxes, and movable zones. These solutions can cost anywhere from around $1,000 to hundreds of thousands of dollars depending on the document volume and complexity of the project.

Using OCR to capture data from tables and reports

Data that repeats over and over again in a document can be OCR’d to Microsoft Excel, Google Sheets and other spreadsheet formats, or a SQL Database like Access, SQL Server, MySQL and Oracle.

Inexpensive Desktop OCR products like FineReader, ReadIRIS and OmniPage can automatically convert data from tables to Excel and other spreadsheets, as long as the columns are standard and don’t “overlap” such that different field values appear in the same column area, like when one row of each record represents one set of columns and a second row has additional column data.

Converted data will require some clean-up before it is usable in any database or software application, and it is difficult to convert large numbers of documents in batches this way. But it’s a good way to produce structured data from large single reports or small batches of similar report data.
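
Part of that clean-up is often reassembling records that a report prints on two physical rows, the “overlapping” layout mentioned above, before loading the data elsewhere. The sketch assumes a hypothetical report in which every record always spans exactly two consecutive rows, in order; it joins each pair into one record and writes the result as CSV. Reports with optional or missing continuation rows would need sturdier rules.

```python
from __future__ import annotations

import csv
import io


def merge_two_row_records(rows: list[list[str]]) -> list[list[str]]:
    """Join each record's first row with its continuation row into one flat record."""
    if len(rows) % 2:
        raise ValueError("expected an even number of rows (two per record)")
    return [rows[i] + rows[i + 1] for i in range(0, len(rows), 2)]


def records_to_csv(header: list[str], records: list[list[str]]) -> str:
    """Serialize records as CSV text, e.g. for import into Excel or a database."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()
```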

For more complex tables, such as tables with similar data but different formats on different documents (like invoices) or tables with a nested structure of header and detail rows, Enterprise Forms Processing software is required to turn these documents into structured data like XML, JSON, or SQL database tables.
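
To show what structured data means for a nested table, the sketch below groups flattened header and detail rows into one JSON document per invoice. It assumes the rows have already been extracted and tagged by type, with each detail row following its header row; the field names (`invoice`, `vendor`, `item`, `amount`) are hypothetical.

```python
from __future__ import annotations

import json


def nest_header_detail(rows: list[dict]) -> list[dict]:
    """Group flat rows tagged "H" (header) or "D" (detail) into nested invoice records."""
    invoices: list[dict] = []
    for row in rows:
        if row["type"] == "H":
            invoices.append({"invoice": row["invoice"], "vendor": row["vendor"], "lines": []})
        elif row["type"] == "D":
            if not invoices:
                raise ValueError("detail row appears before any header row")
            invoices[-1]["lines"].append({"item": row["item"], "amount": row["amount"]})
        else:
            raise ValueError(f"unknown row type: {row['type']!r}")
    return invoices


rows = [
    {"type": "H", "invoice": "1001", "vendor": "Acme"},
    {"type": "D", "item": "Jet A", "amount": "100.00"},
    {"type": "D", "item": "Flowage fee", "amount": "5.00"},
    {"type": "H", "invoice": "1002", "vendor": "Beta"},
    {"type": "D", "item": "Jet A", "amount": "50.00"},
]
print(json.dumps(nest_header_detail(rows), indent=2))
```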
