Document Classification & Separation

In the wise words of The Offspring: “You gotta keep ’em separated!”

Automatic document separation is a common headache in document scanning and data capture applications.

When mixed batches of documents are received, it is not always easy to determine where one document ends and the next one begins.

The traditional solution for automatic document separation is to insert barcode or patch code separator pages between documents. Many scanners and scanning applications have built-in functions to read these on-the-fly and automatically create a new file each time one is found. However, this approach only works with paper documents, while electronic PDF files have become far more common. It also requires printing hundreds or thousands of separator sheets and manually inserting them before scanning.

Automatic Document Separation

Modern Enterprise Data Capture solutions can define unique page elements that help identify the first and last page of any document. These work well when there is some easily identified text or form element that can be used for automatic document separation, but they don't work when documents are unstructured or come in a wide variety of formats, such as invoices.

For documents like invoices, Simple Software has developed intelligent separation scripts that compare the data extracted from each page to determine when a new invoice starts. Vendors often send large PDF files containing many invoice transactions; the scripts automatically split multi-page invoices and attachments such as bills of lading (BOLs) and other paperwork out of these files.
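To illustrate the idea of comparing extracted data page by page, here is a minimal Python sketch. It is not Simple Software's actual separation script: it assumes each page has already been OCR'd and reduced to a hypothetical `invoice_number` value (or `None` when no invoice number was found, as on a BOL or other attachment page). A new document starts whenever a page carries an invoice number different from the current one; pages without a number stay attached to the preceding invoice.

```python
from typing import List, Optional


def split_by_invoice_number(page_values: List[Optional[str]]) -> List[List[int]]:
    """Group 0-based page indexes into documents.

    page_values[i] is the invoice number extracted from page i (hypothetical
    input), or None when the page has no invoice number, e.g. an attachment.
    """
    documents: List[List[int]] = []
    current_number: Optional[str] = None
    for index, value in enumerate(page_values):
        # Start a new document on the first page, or when a page shows an
        # invoice number that differs from the current document's number.
        if not documents or (value is not None and value != current_number):
            documents.append([index])
            if value is not None:
                current_number = value
        else:
            # Same invoice number, or no number at all: a continuation page.
            documents[-1].append(index)
    return documents


pages = ["INV-1001", None, "INV-1002", "INV-1002", None, "INV-1003"]
print(split_by_invoice_number(pages))  # [[0, 1], [2, 3, 4], [5]]
```

Real invoices usually need more than one field to compare (vendor, date, page "1 of N" markers), which is why production scripts are more involved than this single-field rule.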

When the start of each new document cannot be detected automatically, Simple Software offers some unique ways to speed up document separation.

SimpleIndex can use OMR to separate scanned files based on a black mark placed on a corner of the first page of each file during preparation. This is much faster and better for the environment than inserting printed barcode sheets.
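SimpleIndex's OMR separation is configured in the application itself; the Python sketch below only illustrates the general idea behind mark-based separation. It assumes each page is available as a grayscale pixel grid (a list of rows, 0 = black, 255 = white), checks a hypothetical top-right corner region for a high share of dark pixels, and starts a new document at each marked page. The corner position, size and thresholds are illustrative assumptions, not SimpleIndex settings.

```python
from typing import List

Page = List[List[int]]  # grayscale rows, 0 = black ... 255 = white


def has_corner_mark(page: Page, corner_size: int = 20,
                    dark_level: int = 100, min_fill: float = 0.5) -> bool:
    """Return True if the top-right corner region is mostly dark."""
    rows = page[:corner_size]
    region = [pixel for row in rows for pixel in row[-corner_size:]]
    if not region:
        return False
    dark = sum(1 for pixel in region if pixel < dark_level)
    return dark / len(region) >= min_fill


def split_on_marks(pages: List[Page]) -> List[List[int]]:
    """Group page indexes into documents; a marked page starts a new one."""
    documents: List[List[int]] = []
    for index, page in enumerate(pages):
        if not documents or has_corner_mark(page):
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


def blank_page(width: int = 100, height: int = 100) -> Page:
    return [[255] * width for _ in range(height)]


def marked_page(width: int = 100, height: int = 100) -> Page:
    page = blank_page(width, height)
    for y in range(20):
        for x in range(width - 20, width):
            page[y][x] = 0  # paint a solid black square in the top-right corner
    return page


batch = [marked_page(), blank_page(), marked_page(), blank_page(), blank_page()]
print(split_on_marks(batch))  # [[0, 1], [2, 3, 4]]
```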

SimpleView lets you control-click to highlight the first page of each document in a thumbnail view, then split the PDF or TIFF file into individual documents automatically based on this selection. This is significantly faster than the drag-and-drop method used by most PDF editors.

How to configure a Batch Splitting step to split on a blank value

In PaperVision Capture, a batch splitting step can be configured to meet one or more of many conditions. In some cases, it may be desirable to split a batch based on a blank value within an index field. This can be achieved by using either a String Comparison or a Regular Expression.

The following steps should be used to configure batch splitting using a blank value. Note: These steps assume you will be splitting the batch based on an index field called “ExampleIndexField”. The index field should already exist in the job.

To split the batch on a blank value using the String Comparison type:

  1. Set up the Target Job Configuration.
  2. Add a batch split step.
  3. Add a New Condition.
    • The condition source: Capture Index
    • Choose Capture Index: “ExampleIndexField”
    • Choose Comparison Type: String Comparison
    • Leave the drop-down set to the equal sign “=” and leave the text box blank.
    • Click Finish
  4. The condition should read (CI.ExampleIndexField = “”)

To split the batch on a blank value using the Regular Expression Comparison type:

  1. Set up the Target Job Configuration.
  2. Add a batch split step.
  3. Add a New Condition.
    • The condition source: Capture Index
    • Choose Capture Index: “ExampleIndexField”
    • Choose Comparison Type: Regular Expression
    • Enter the regular expression that matches an empty value or a value containing only whitespace characters: ^\s*$
    • Click Finish
  4. The condition should read (CI.ExampleIndexField RegEx.Match(“^\s*$”))
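The practical difference between the two conditions is how they treat whitespace. A String Comparison against an empty text box presumably matches only a truly empty value, while ^\s*$ also matches values that contain nothing but spaces, tabs or line breaks. The Python sketch below illustrates this with Python's `re` module; PaperVision Capture evaluates the expression with its own regular expression engine, so treat this as an illustration of what the pattern means, not of the product's internals.

```python
import re

# The pattern from the Regular Expression condition above.
BLANK_PATTERN = re.compile(r"^\s*$")


def equals_empty(value: str) -> bool:
    """Mimics a String Comparison of the field against an empty text box."""
    return value == ""


def matches_blank_regex(value: str) -> bool:
    """True when the value is empty or contains only whitespace."""
    return BLANK_PATTERN.search(value) is not None


for sample in ["", "   ", "\t", "INV-1001", " 42 "]:
    print(repr(sample), equals_empty(sample), matches_blank_regex(sample))
```

If OCR or manual entry can leave stray spaces in an otherwise empty field, the regular expression version is the more forgiving choice.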

Reading Barcodes with Digitech PaperFlow and PaperVision Capture

Does processing barcodes “on-the-fly” make any difference in speed or recognition?

On-the-fly processing is actually a preferable way of reading barcodes since it does not noticeably decrease scan speed. The recognition will be the same whether the barcode is processed on the fly or as a post-process.

Using FlexiLayout Studio to Design Data Capture Templates

FlexiLayout: How to capture a table using Repeating Group if table header is on each page

In some cases, we might have a table that we cannot capture correctly using the traditional method, the Table element. In such cases, we usually use the Repeating Group element.

But what if we come across a multi-page document that has a table header on each page?

We can use the following two methods to capture such a table using Repeating Groups.

Using Absolute search area constraints

To limit the search area to the table area so that it doesn’t capture unnecessary text outside of the table, we can use Absolute search area constraints in the Search Constraints tab.

You can measure the area with the Measure Rectangle tool.

Using nested Repeating groups

Sometimes the Absolute search area constraints method is not suitable: other tables that use this layout might have elements in different positions or of different lengths, so you would have to re-measure the area every single time.

In such a case, you can use the nested Repeating group method.

  1. Create the first, “main” Repeating group that will include the table header and footer.
  2. Next, create the nested RG inside the first RG and set up the relations between the two groups.
  3. These are the main steps. Other elements in the RG don’t need any specific settings and should be designed according to the results you need.

Additional information

FlexiLayout: Capturing a table using Repeating Group

How to reliably capture elements in FlexiLayout Studio if the image resolution can vary

When the image resolution varies, the search area of elements based on absolute offsets can miss […]
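The underlying arithmetic is simple: the same physical distance on paper covers a different number of pixels at each scan resolution. The Python sketch below is generic arithmetic, not FlexiLayout code; it converts a hypothetical 50 mm offset into pixels at three resolutions, showing why an offset expressed in pixels (dots) for one resolution lands in the wrong place at another.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(mm: float, dpi: int) -> int:
    """Convert a physical distance to pixels at a given scan resolution."""
    return round(mm / MM_PER_INCH * dpi)


# Suppose an element sits 50 mm below its reference element on the page.
offset_mm = 50
for dpi in (200, 300, 600):
    print(dpi, "dpi:", mm_to_pixels(offset_mm, dpi), "pixels")
```

At 200 dpi the offset is about 394 pixels, but at 300 dpi it is about 591, so a search area tuned for one resolution misses the element at the other.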

Using ABBYY Vantage Document Skills

Processing Your First Documents with Vantage

Learn how easy it is to get started with Vantage – upload your documents and Vantage will take care of the rest.

How to Create and Train a Vantage Document Skill

Learn how to use the Vantage Skill Designer to create and train a new Document Skill with just a few sample documents.

How to Create and Train a Classification Skill in ABBYY Vantage

Learn how to use the Vantage Skill Designer to train a new Classification Skill. You need just a few samples of each document class.

How to Automate a Complete Workflow by Creating a Vantage Process Skill

How to Edit a Document Skill

Learn how to adapt already existing skills to your specific documents and business requirements.

How to perform the first authentication in Vantage Swagger UI?

To get your first access token, perform the initial authentication using the default client; you do not need to enter any password or client ID. The initial authentication is preconfigured. Just open a Swagger page (EU link or US link) and click Authorize.

Select all scopes and click Authorize again.

A password needs to be specified only for a custom client. A custom client can be created after the initial authentication.
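For a custom client, the token request has to carry the client credentials and the user's password. The Python sketch below assumes, without confirmation from this article, that the custom client uses the standard OAuth 2.0 resource owner password credentials grant (RFC 6749, section 4.3). It only builds the form-encoded request body; the token endpoint URL, the client ID and secret, and the scope string are placeholders you would take from the ABBYY Vantage documentation and your own tenant.

```python
from urllib.parse import urlencode


def build_password_grant_body(client_id: str, client_secret: str,
                              username: str, password: str, scope: str) -> str:
    """Form-encode an OAuth 2.0 password grant request body (RFC 6749, 4.3).

    All values are caller-supplied placeholders; nothing here is specific
    to Vantage. The result is meant to be POSTed with the content type
    application/x-www-form-urlencoded to the tenant's token endpoint.
    """
    return urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": scope,
    })
```

The access token returned by such a request is then sent with each API call in an `Authorization: Bearer <token>` header, which is the same thing the Swagger page does for you after you click Authorize.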

References

EU Help: Getting a Tenant Identifier or US Help: Getting a Tenant Identifier

EU Help: Creating a Client or US Help: Creating a Client

Learn more at ABBYY […]

Barcode Recognition in ABBYY FineReader & FlexiCapture

Recognition of Barcodes in ABBYY technologies

ABBYY technology and products can read different barcode types.
Document Analysis algorithms can locate and identify different barcodes on a document page, and it is of course also possible to “draw” a barcode block via the API.
Once the barcode region is defined/detected, it can be recognised. The API provides access to:
  • the coordinates
  • the characters
  • character confidence information
  • start/stop symbols of different barcode types; for Code 39, the start/stop symbol is the asterisk “*”
The barcode value can then be used, for example, for file naming. A very common scenario is document separation based on barcodes:
  • This feature is implemented in FlexiCapture projects
  • With FineReader Engine, developers can “cut” the page stream with custom code
  • Separation in FineReader Server
  • Separation in the ABBYY Scan Station (FineReader Server & FlexiCapture)
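To make the separation-plus-naming scenario concrete, here is a minimal Python sketch. It is not FlexiCapture, FineReader Engine or FineReader Server code: it assumes the barcode value has already been read for each page (`None` where no barcode was found), that barcoded pages are dedicated separator sheets to be discarded, and that a raw Code 39 value may still include its asterisk start/stop symbols.

```python
import re
from typing import Dict, List, Optional


def strip_code39_delimiters(raw: str) -> str:
    """Remove the '*' start/stop symbols from a raw Code 39 value."""
    return raw.strip("*")


def safe_file_name(value: str) -> str:
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[<>:"/\\|?*]', "_", value).strip() or "unnamed"


def split_on_separator_barcodes(page_barcodes: List[Optional[str]]) -> Dict[str, List[int]]:
    """Map each output file name to the page indexes it should contain.

    A page with a barcode is treated as a separator sheet: it starts a new
    document named after the barcode value and is itself left out.
    """
    documents: Dict[str, List[int]] = {}
    current: Optional[str] = None
    for index, raw in enumerate(page_barcodes):
        if raw is not None:
            current = safe_file_name(strip_code39_delimiters(raw)) + ".pdf"
            documents.setdefault(current, [])
        elif current is not None:
            documents[current].append(index)
        # Pages before the first separator sheet are ignored in this sketch.
    return documents


reads = ["*ACME-0001*", None, None, "*ACME-0002*", None]
print(split_on_separator_barcodes(reads))
# {'ACME-0001.pdf': [1, 2], 'ACME-0002.pdf': [4]}
```

Because of `setdefault`, a barcode value that appears twice appends its pages to the same output file; if barcodes are printed on the first page of each document instead of on separator sheets, the barcoded page would be kept rather than dropped.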

Tips for working with barcodes

Barcode recognition quality depends on:

  • the barcode print quality
  • settings used in the document scanning process
  • the placement of the barcode when it is added manually

In order for the barcodes to be recognized well, follow these recommendations:

  • A barcode must be separated from other text by a fairly wide white gap.
  • Barcode size and the width of its separate bars or dots must meet the following requirements:
    • The optimal barcode height is more than 10 millimetres. A barcode should be smaller than an A4 page.
    • Barcode height must be more than twice the height of a text line.
    • For non-square barcodes, their […]
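The two height rules above can be checked automatically once a barcode's bounding box is known. The Python sketch below is a hypothetical pre-flight check, not an ABBYY API: it assumes you already have the barcode height and a typical text line height in pixels, plus the scan resolution, and it covers only the two height recommendations listed above.

```python
from typing import List

MM_PER_INCH = 25.4


def check_barcode_height(barcode_height_px: int, text_line_height_px: int,
                         dpi: int) -> List[str]:
    """Return problems with a barcode's height (an empty list means none)."""
    problems: List[str] = []
    height_mm = barcode_height_px / dpi * MM_PER_INCH
    # Recommendation: the optimal barcode height is more than 10 mm.
    if height_mm <= 10:
        problems.append(f"height is {height_mm:.1f} mm; more than 10 mm is recommended")
    # Recommendation: more than twice the height of a text line.
    if barcode_height_px <= 2 * text_line_height_px:
        problems.append("height should be more than twice the text line height")
    return problems


print(check_barcode_height(150, 40, 300))  # []
print(check_barcode_height(100, 60, 300))
```

At 300 dpi, a 150-pixel barcode is 12.7 mm tall and passes both rules, while a 100-pixel one (about 8.5 mm) fails both when the text lines are 60 pixels high.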

Document Processing via Email in FineReader Server

In this video, learn how to configure a workflow for document input and processing via e-mail on FineReader Server.

See how you can configure this workflow in a few simple steps. You can even edit the e-mail subject and message. Watch how to set up the usage scenario “Centralized document conversion service” in this video.

From document input to document output, including document processing, FineReader Server is designed to simplify, optimize and speed up your workflows. Scalable and easy to configure, FineReader Server can adapt to all your needs.

How to convert several emails from MS Outlook into PDF?

In order to convert several emails into a PDF file, you may use the virtual printer PDF-XChange 5.0 for FineReader.

Follow the steps below:

  1. Select the needed emails in MS Outlook.
  2. Press File > Print.
  3. Select PDF-XChange 5.0 for FineReader as the printer and press Print.
  4. Save the PDF file.

How to set up the import from the Gmail mailbox using the IMAP Image Import Profile?

  1. In your Gmail, create the folder (mailbox) that you want to import from.
  2. In your Gmail, create the folders (mailboxes) for the Exceptions and Processed emails.
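  3. Enable the IMAP protocol in Gmail settings.
  4. Turn on the Less secure app access option in the Security section of Google Account Settings. (Google has since retired this option; on accounts with 2-Step Verification, an app password can be used in place of the regular account password.)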
  5. Create a new Image Import Profile (open the project in the Project Setup Station > Project > Image Import Profiles > New). Choose Hot Folder: IMAP Server.
  6. Specify the address of the IMAP server: imap.gmail.com.
  7. Click Settings and specify your Gmail login and password. For Type of encrypted connection, choose SSL.
  8. Click Browse and select […]
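Before pointing the Image Import Profile at Gmail, it can help to confirm that the mailbox is reachable over IMAP with the same server, credentials and folder name. The Python sketch below uses the standard library's `imaplib` for that check; it is an independent connectivity test, not part of FlexiCapture. It assumes SSL on port 993 (Gmail's standard IMAP SSL port), and the helper names are illustrative.

```python
import imaplib


def quote_mailbox(name: str) -> str:
    """Quote a mailbox name for IMAP commands (needed for names with spaces)."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def count_messages(user: str, password: str, mailbox: str,
                   host: str = "imap.gmail.com", port: int = 993) -> int:
    """Connect over SSL, open the mailbox read-only and count its messages."""
    with imaplib.IMAP4_SSL(host, port) as connection:
        connection.login(user, password)
        status, data = connection.select(quote_mailbox(mailbox), readonly=True)
        if status != "OK":
            raise RuntimeError(f"Cannot open mailbox {mailbox!r}: {data!r}")
        # For SELECT, the first data item is the message count as bytes.
        return int(data[0])
```

For example, `count_messages("you@gmail.com", app_password, "Invoices to import")` should return the number of messages waiting in the import folder; the address and folder name are placeholders. If the login or the folder selection fails here, the Image Import Profile will most likely fail for the same reason.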

ABBYY Vantage

ABBYY Vantage leverages AI machine learning and a huge library of document “skills” to provide out-of-the-box data capture for all kinds of documents.

Vantage provides a simple way to implement new data capture processes without the need for programmers.

It takes the FlexiCapture platform, hosts it in the cloud, and dramatically simplifies the interface. The thousands of settings you can use with FlexiCapture to build templates are managed by the AI, giving you a simple point and click interface to create new document capture workflows.

The “Skills” library gives you pre-configured capture workflows for hundreds of the most common documents. Simply connect them to your import and export destinations and you are ready to go, saving you hours or even days of development time.

PaperVision Capture Forms Magic

PaperVision Capture Forms Magic adds handwriting recognition, forms processing, and templates and business rules for invoice processing and healthcare claim forms to Digitech’s high-volume document scanning and data capture platform.

OCR Consulting Services

OCR Experts for Any Project

Our unique team of OCR experts is equipped to help with OCR projects of any size or complexity. We have support specialists who can remotely configure desktop solutions in a matter of minutes, and expert systems integrators with years of programming, database design, and robotic process automation experience.

Desktop OCR

Use our online store to order desktop OCR applications, and our staff will be happy to answer your setup questions via email or web chat.

Remote configuration and training services using GoToMeeting are available for a low hourly rate.

Batch Scanning & OCR Servers

Automate document scanning and digital document archival processes using zone OCR, barcode recognition, database integration and other technologies.

Small business systems and single document workflows can be set up remotely via GoToMeeting, usually in just a few hours. Chat now if we’re online or leave a message to schedule a consultation.

Data Capture and Forms Processing

Advanced data extraction solutions that can turn the most complex documents into structured data ready for use in business applications. Each member of our data capture consulting team has over 10 years of experience designing and implementing advanced OCR solutions.

We are the most experienced system integrator in the US for our flagship data capture platform, ABBYY FlexiCapture. We saw its potential immediately when it was introduced and now over 15 years later it is the leading data capture solution and no team is more experienced than ours at implementing it. We are the ones that other ABBYY integrators call for their most complex implementations.

While we have designed capture solutions for all types of documents, we have particular expertise in the following areas:

    […]

OCR Data Capture

What is OCR Data Capture?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes, it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Enterprise data capture systems provide interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Who can benefit from OCR data capture software?

Any organization that collects data from paper documents, or electronic files like PDF and Office documents, can get a very high return on investment by automating the data entry with OCR data capture software.

You do need to have a significant number of documents to […]

Knowledge Base

The SimpleOCR Knowledge Base contains frequently asked questions and answers, technical guides and general information on a broad range of optical character recognition, handprint recognition, data capture, PDF OCR, AP invoice scanning and zone OCR applications.

Contact Us for FREE Consultation on Your OCR Project

ABBYY FlexiCapture Cloud

ABBYY FlexiCapture Cloud delivers ABBYY’s advanced data capture platform capabilities via REST API and web interfaces. ABBYY FlexiCapture Cloud customers can rapidly configure and deliver their Content IQ solution, taking advantage of our cloud services to automate and accelerate their document-driven processes. The advanced machine learning and AI in the platform improve classification and data extraction results, enabling core processes to support better, smarter, faster decisions.

FlexiCapture Cloud enables organizations to accelerate digital transformation by complementing their automation systems with new and advanced cognitive capabilities that liberate the intelligence locked in their documents.

ABBYY FlexiCapture On-Premise

ABBYY FlexiCapture On-Premise – Distributed – Perpetual License PPY 50K Pages

ABBYY FlexiCapture is a powerful data capture and document processing solution from a world-leading technology vendor. It is designed to transform streams of documents of any structure and complexity into business-ready data. And its award-winning recognition technologies, automatic document classification, plus a highly scalable and customizable architecture, mean that it can help companies and organizations of any size to streamline their business processes, increase efficiency and reduce costs.

Invoice Processing

What is Invoice Processing?

Invoice Processing Software uses OCR data capture technology and page layout analysis to automatically identify the common data elements in an invoice, such as vendor, date, amount, invoice number, and line item data. Invoice Processing applications are built using the same technology as other data extraction applications, but have been specifically configured to recognize invoices, since they are one of the most common documents that companies need to automate.

Who can benefit from Invoice Processing software?

Any organization that receives a large number of vendor invoices on paper can benefit from invoice processing technology. The more data from each invoice you are hand-keying into your accounting software, the more benefit you get from each page you automate.

Accounting firms and other companies that do outsourced accounts payable processing stand to gain the most return on investment from automation. Automation also makes it practical to on-shore the data entry, providing additional security and peace of mind to your customers.

A robust OCR invoice processing solution becomes justifiable when you have over 1,000 invoices per month. When dealing with smaller volumes the potential return on investment does not justify investment in an enterprise solution. However, a simple document scanning solution to digitize and store scanned invoices can still provide many benefits.

How does it lower the cost of AP processing?

There are many benefits to OCR invoice systems when it comes to AP processing that give these systems a very high return on investment.

  • Automate data entry tasks
  • Eliminate data entry errors with validation rules and improved UI
  • Automate retrieval of invoices from vendor websites or email
  • Assign Profit Centers and GL Codes based on business rules
  • Match invoices against open POs
  • […]
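Two of the benefits above, validation rules and matching invoices against open POs, come down to simple checks on the captured fields. The Python sketch below is a hypothetical illustration, not the rule engine of any particular invoice processing product: the field names, the single-PO-per-invoice assumption and the "remaining amount" model of an open PO are all simplifications chosen for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass
class Invoice:
    invoice_number: str
    po_number: str
    total: Decimal
    lines: List[LineItem]


def validate_invoice(invoice: Invoice, open_pos: Dict[str, Decimal]) -> List[str]:
    """Return validation errors; open_pos maps a PO number to its remaining amount."""
    errors: List[str] = []
    # Rule 1: the line items must add up to the captured invoice total.
    line_sum = sum((line.quantity * line.unit_price for line in invoice.lines), Decimal("0"))
    if line_sum != invoice.total:
        errors.append(f"line items sum to {line_sum}, invoice total is {invoice.total}")
    # Rule 2: the invoice must reference an open PO with enough remaining balance.
    remaining = open_pos.get(invoice.po_number)
    if remaining is None:
        errors.append(f"PO {invoice.po_number} is not an open purchase order")
    elif invoice.total > remaining:
        errors.append(f"invoice total exceeds the {remaining} remaining on PO {invoice.po_number}")
    return errors


invoice = Invoice("INV-1001", "PO-77", Decimal("150.00"), [
    LineItem("Widgets", Decimal("10"), Decimal("12.50")),
    LineItem("Freight", Decimal("1"), Decimal("25.00")),
])
print(validate_invoice(invoice, {"PO-77": Decimal("500.00")}))  # []
print(validate_invoice(invoice, {"PO-88": Decimal("500.00")}))
# ['PO PO-77 is not an open purchase order']
```

Using `Decimal` rather than floating point keeps currency sums exact, so a rule like "line items equal the total" does not fail because of rounding noise.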

SimpleView

Application for managing and viewing scanned documents, images and PDF files.

Unlike other freeware PDF viewers, SimpleView is designed to work with many files at once instead of one at a time. The free version also supports TWAIN scanning and the ability to move, rearrange and rotate pages.

Simple Software

SimpleIndex can bring speed and efficiency to your scanning or doc filing no matter the process. Even if all you are doing is hand keying a few basic details about a document, breaking those details into individual indexes and adding tools like drop down choice lists, automatic orientation, and blank page deletion ensure a smoother, more consistent process.

Automation

Here’s where things start to get interesting. From basic tasks like splitting individual documents within a stack of pages by spotting a blank page, a specific mark, or a barcode separator, to capturing index data directly from the page or looking up additional details about a document in a database, SimpleIndex has a host of powerful tools to tame your piles of paper or drives full of digital files. Let’s look at a few.

OCR

Optical Character Recognition is the ability to take a scan, which is merely a picture of a page, and turn it into words that the computer can understand and use to index your files. SimpleIndex leverages the power of ABBYY FineReader, recognized as one of the best OCR engines on the market, to accurately capture names, dates, important numbers, document types, and other details about your file. Some products have you define a box and capture whatever information happens to fall in that zone. SimpleIndex takes it further with Dynamic Zone OCR, which lets you set an oversized zone that allows for shifting of the pages between scans but still captures just the data you need by matching against templates, lists, or even Regular Expressions (RegEx). You can also skip the zones entirely and use the full text of a page to find matches for your index data.
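The idea behind an oversized zone combined with pattern matching can be sketched in a few lines of Python. This is not how SimpleIndex implements Dynamic Zone OCR internally; it assumes the OCR engine has already returned each word with the pixel position of its top-left corner (the `Word` structure is hypothetical) and that dates are written as MM/DD/YYYY.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One OCR'd word and the position of its top-left corner, in pixels."""
    text: str
    x: int
    y: int


# Hypothetical date format for this example: MM/DD/YYYY.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_in_zone(words: List[Word], zone: Tuple[int, int, int, int],
                 pattern: re.Pattern) -> Optional[str]:
    """Return the first match of pattern among the words inside the zone.

    zone is (left, top, right, bottom). Because the zone is deliberately
    oversized, the value is still found when the page shifts a little.
    """
    left, top, right, bottom = zone
    for word in words:
        if left <= word.x <= right and top <= word.y <= bottom:
            match = pattern.search(word.text)
            if match:
                return match.group(0)
    return None


zone = (380, 50, 700, 150)  # oversized zone around the document date
page_a = [Word("Invoice", 50, 40), Word("Date:", 400, 82),
          Word("03/12/2024", 470, 85), Word("04/11/2024", 470, 400)]
# The same page scanned with a 20-pixel shift down and to the right.
page_b = [Word(w.text, w.x + 20, w.y + 20) for w in page_a]
print(find_in_zone(page_a, zone, DATE_PATTERN))  # 03/12/2024
print(find_in_zone(page_b, zone, DATE_PATTERN))  # 03/12/2024
```

The zone is what keeps the second date on the page (outside the zone) from being captured, and the pattern is what keeps the label "Date:" (inside the zone) from being captured.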

Barcodes

SimpleOCR | OCR Software Experts

Document Scanners & Scanner Parts

Accurate OCR starts with quality images. Efficient OCR starts with fast scanning. Find Document Scanners built for OCR at ScanStore.

Our Team of OCR experts is here to help! SimpleOCR is not just Freeware, we have every kind of OCR solution from PDF Converters to Enterprise Data Capture, OCR Servers and Handprint Recognition for Forms and Surveys. Live chat with an OCR specialist now or Contact Us for a consultation on your OCR project.

SimpleOCR is the popular freeware OCR Software with hundreds of thousands of users worldwide. SimpleOCR is also a royalty-free OCR SDK for developers to use in their custom applications.

SimpleIndex is OCR built for business, offering powerful batch scanning, OCR server, and data capture features with a simple user interface and affordable licensing.

If you like free stuff, freeware versions of our SimpleView Document Viewer (with Tesseract OCR), SimpleCoversheet Bar Code Printer, and SimpleExport CSV to XML Converter are also available.

If you have a scanner and want to avoid retyping your documents, SimpleOCR is the fast, free way to do it. The SimpleOCR freeware is 100% free and not limited in any way. Anyone can use SimpleOCR for free: home users, educational institutions, even corporate users.

If your documents have multi-column layouts, non-standard fonts, tables, poor quality or digital camera images, you will not have much success with applications based on free and open source engines like SimpleOCR and Tesseract. You will need a commercial OCR application to get an accurate read. Our OCR Guide compares desktop and server […]

Forms Processing

What is ICR, Survey & Forms Processing?

ICR stands for Intelligent Character Recognition and is the technology that allows software to interpret hand printed text on scanned images.

Forms Processing Software uses ICR technology to automate data entry tasks involving hand-filled surveys, applications and forms. It provides interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Forms Processing also includes OCR (Optical Character Recognition) technology to recognize machine printed text, and OMR (Optical Mark Recognition) for check boxes and multiple choice bubbles.

It is also possible to use these applications to automate data collection from PDF forms, Word documents, Excel spreadsheets, and other formats used to fill out forms electronically. Many include the ability to publish forms as paper, fillable PDF and web pages simultaneously to distribute and collect data from multiple sources into one dataset.

Who can benefit from forms processing software?

Any organization that collects data on paper-based forms, surveys or applications on a regular basis can get a very high return on investment by automating the data entry with forms processing software.

You do need to have a significant number of forms to justify the expense: at least a hundred forms per month, or more depending on how much data is being captured. If the data entry task can be done in under 100 man-hours, then it is not a good candidate for automation with ICR software.

Organizations that have many separate departments that collect data on forms can share the budget for forms processing software by re-using it for other projects. Your current project may not be big enough to justify the expense, but when combined with one or two others it would be.

How much do […]

Document Management

The phrase “document management” is rather broad and can apply to a variety of scenarios depending on the needs (and size) of the business.

Small businesses and departments may only need a system that provides an efficient way to scan paper and save it in an orderly, intuitive structure.

Most projects also require the ability to search and view documents in an integrated viewer or website, and provide ways to annotate images, making notes and markup that other users can see.

Likewise, we may be working with more than just digitized paper files, such as born-digital electronic documents like MS Office docs, PDFs, CAD drawings and graphics files.

There can also be advanced records management requirements like access audit trails, document retention, lifecycle and workflow. These features are especially important when dealing with regulatory compliance such as HIPAA and Sarbanes-Oxley.

Our document management solutions can fit any budget or support any project requirements. It’s not always possible to do both at once, but we will try our best!

Contact Us for a free evaluation of your document management project and online demo of our software recommendations.

Personal & Small Business

Users within a single department, working from home or who have a small business can simply scan their documents to a folder that is shared to everyone. In this “ad-hoc” scenario you only need some basic document scanning software to simplify and bring consistency to your filing system. Our SimpleIndex software is a perfect all-in-one scanning and document management tool for this purpose.

If you want to move to the next level, there are Desktop Document Management options that provide an all-in-one means for capture, storage, search and retrieval of documents. These solutions are affordable and focused on automating the process of organizing and […]

Document Scanning

One Source, Many Solutions

There are many document scanning solutions to choose from. ScanStore offers many of the top document imaging solutions under one virtual roof. ScanStore’s CDIA+ consultants can work with you to explain the strengths and weaknesses of each option and even provide a demo of the products using samples that you provide.

You’ll find flexibility with each of these products allowing a one-person shop to jump right in, or scale up to enterprise or service bureau proportions. If you need to throw some data capture into the document imaging mix, ScanStore also carries OCR, forms processing and document management tools.

Information and Advice

Take a look at the Scanning Solutions Comparison page to find in-depth information on the features of the available offerings and for more insight in finding the best fit.

And be sure not to miss the detailed comparison of the favorite Batch Scanning solutions in the exclusive Document Scanning Software Review.

What’s Right for You

You want a paperless office and document scanning is part of the path to get you there. Simply buying a scanner and feeding paper into it isn’t going to save you money. Automation of the scanning process is what holds costs down and drives up your Return on Investment.

For example, if an OCR automation costs $3,000 to implement, but by doing so you save a $15/hr employee 10 hours per week of data entry, the feature has paid for itself in 20 weeks.
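The payback arithmetic in this example generalizes to any automation project: divide the implementation cost by the weekly labor savings. The small Python helper below is a simplification that ignores ongoing licensing, maintenance and employee overhead costs, which a real ROI estimate would include.

```python
def payback_weeks(implementation_cost: float, hourly_rate: float,
                  hours_saved_per_week: float) -> float:
    """Weeks until the labor savings cover the implementation cost."""
    weekly_savings = hourly_rate * hours_saved_per_week
    return implementation_cost / weekly_savings


print(payback_weeks(3000, 15, 10))  # 20.0
```

With the figures above, 10 hours at $15/hr saves $150 per week, and $3,000 / $150 = 20 weeks.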

So how do we automate the data capture? Here are a few possibilities:

  • Full-Page OCR turns a scan into a full-text document you can search

  • Barcodes on each document contain key data like a customer name or invoice number

  • A single […]

Applications

When you scan a document that has text or numeric data on it, you are able to read and understand what is written in the scanned image. However, to a computer, the resulting image file is just as meaningless an assortment of pixels as a landscape photo. In order to transform this information into an editable format that you can search through, copy, and modify without retyping it manually, you will need Optical Character Recognition (OCR) software.

There is a wide variety of OCR software available. While all of it can convert images of machine-printed (not handwritten) text or numbers into an editable format, the various products often differ in features, accuracy, price, and language options.

You can find the various types of OCR software with a description of each below.

Users within a single department, working from home or who have a small business can simply scan their documents to a folder that is shared to everyone. In this “ad-hoc” scenario you only need some basic document scanning software to simplify and bring consistency to your filing system.

If you want to move to the next level, there are Desktop Document Management options that provide an all-in-one means for capture, storage, search and retrieval of documents. Additionally, they provide security, advanced capabilities and ease of use above that of the ad-hoc methods.

And let’s not forget cloud-based options that alleviate the need to maintain storage servers or keep software up to date.

Need a simple, no frills OCR solution without spending hundreds of dollars on a professional software package? Look no further. There is a no cost, donation optional, OCR freeware solution […]

2024-03-12T15:33:37-04:00