Document-Classification-Sorting-Hat

This folder is a Hufflepuff.

Automatic document classification is how data capture applications quickly determine what type of document is being processed before extracting data from the OCR text.

Document classification algorithms use text matching, page layouts, and artificial intelligence to train models that are able to identify documents by type even when the formatting and quality vary significantly.

A good example of document classification is the LoanStacker application, which takes a complete residential mortgage loan file and identifies the more than 500 forms, disclosures, tax records, and contracts it contains. Once identified, these documents can be sent to the appropriate workflows for approval, data entry, etc.

While most data capture applications are able to identify document types based on recognition templates, automatic classification algorithms are much faster and significantly improve throughput when there are many different types of documents being processed. Trained AI classification models can also seem to “understand” the common traits of different document types and sort them correctly even when presented with new formats.

Simple Software’s SimpleIndex application provides keyword and pattern matching based document classification at a much lower cost than enterprise solutions.
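To make the idea of keyword and pattern matching concrete, here is a minimal Python sketch. It is not SimpleIndex's implementation or configuration format: the rule table, the scoring weights, and the `classify` function are all hypothetical, chosen only to illustrate how OCR text can be scored against per-type keywords and regular expressions.

```python
import re

# Hypothetical rules: each document type lists plain keywords and regex patterns.
RULES = {
    "invoice": {
        "keywords": ["invoice", "amount due", "remit to"],
        "patterns": [r"invoice\s*(?:no\.?|number|#)\s*:?\s*\d+"],
    },
    "w2": {
        "keywords": ["wage and tax statement", "employer identification number"],
        "patterns": [r"\bform\s+w-?2\b"],
    },
}


def classify(text, rules=RULES, min_score=2):
    """Return the best-scoring document type, or "unknown" below min_score."""
    lowered = text.lower()
    scores = {}
    for doc_type, rule in rules.items():
        # One point per keyword found, two points per pattern matched
        # (assumed weights; tune them against real samples).
        score = sum(1 for kw in rule["keywords"] if kw in lowered)
        score += sum(2 for pat in rule["patterns"] if re.search(pat, lowered))
        scores[doc_type] = score
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "unknown"


if __name__ == "__main__":
    sample = "INVOICE No. 10023\nAmount due: $450.00\nRemit to: ACME Fuel"
    print(classify(sample))
```

The `min_score` threshold keeps pages that match nothing convincingly out of the typed workflows, so they can be routed to manual review instead.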

Our collection of OCR Data Capture applications all have built-in automatic document classification capabilities, including machine learning and script based manual overrides.

Cloud OCR vs Sunshine OCR


With the development of Cloud Computing, more and more OCR solutions started to move processing to the cloud. There are several major Cloud OCR solutions like AWS Textract (from Amazon), Azure AI Vision (from Microsoft), Google Cloud Vision AI, as well as more specialized solutions like ABBYY Vantage.

These days it is much harder to find an OCR data capture solution that is not fully or partly cloud-based.

Sunshine Software or Sunshine OCR refers to on-premise Optical Character Recognition software that requires no internet connection to operate. Since there is no Cloud involved, we are calling it Sunshine Software to shine a light on the advantages of avoiding the Cloud.

We call it “Sunshine” because it’s catchy and makes us smile. But also because there is no good marketing term for this type of software. Before the “Cloud” became popular, it was known simply as “Software.” After the development of Cloud Computing, it has been referred to as On-Prem, On-Premise, On-Site, Offline, Local, Native, Self-Hosted, In-House and other terms, but none of them highlight the benefits it has over the Cloud like Sunshine Software.

Cloud OCR or Sunshine OCR?

In the ever-evolving landscape of technology, businesses are faced with critical decisions regarding the deployment of software. Two prominent models, cloud-based and sunshine-based (on-premise) software, offer distinct approaches to meeting organizational needs. Understanding the differences between these models is essential for making informed decisions aligned with business goals and requirements.

Key Characteristics

Cloud-Based OCR Software:

Cloud OCR software operates on remote servers accessible over the internet. This model offers unparalleled scalability, allowing businesses to […]

Aviation and Marine Fuel Invoices

Aviation and Marine fuel invoice processing has a number of data capture and normalization requirements that make it very hard to automate without specialized knowledge and customized business rules.

Our team has implemented aviation, marine, and land fuel transportation invoice automation projects with some of the top global energy services companies. We have extensive knowledge of these types of invoices and the way they need to be processed to ensure that the myriad taxes and fees are captured and applied to the correct fueling.

What Makes Aviation and Marine Invoice Processing So Complex?

  • Multiple taxes from multiple jurisdictions apply to each transaction. Excise tax, Environmental tax, LUST tax, VAT, etc.
  • Additional fees like Airport, Storage, Intoplane, Flowage, Fuel Surcharge, etc.
  • Different vendors use different descriptions for each tax and fee.
  • Invoices with multiple itemized fuel transactions but a single summary of taxes and fees that must be applied to each fueling proportionally.
  • Sometimes the same data appears in the header, and other times in the line items, sub-tables, or footers.
  • Aviation and Marine are international businesses, so multiple languages and currencies must be handled, including exchange rate calculation and currency conversion.
  • Since the price of fuel changes constantly, slight price differences must be accommodated within an acceptable tolerance.
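Two of the rules above, spreading a single summary tax or fee across several fuelings and accepting small price differences, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allocation is proportional to fueled quantity (the basis could equally be the line amount), the tolerance is a simple percentage, and the function names `allocate_proportionally` and `within_tolerance` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_proportionally(total, quantities):
    """Split one summary tax/fee total across fuelings by quantity share."""
    total = Decimal(total)
    quantities = [Decimal(q) for q in quantities]
    volume = sum(quantities)
    if volume <= 0:
        raise ValueError("total quantity must be positive")
    shares = [
        (total * q / volume).quantize(CENT, rounding=ROUND_HALF_UP)
        for q in quantities
    ]
    # Rounding can leave a cent or two unassigned; put the remainder on the
    # largest fueling so the shares always add up to the invoiced total.
    remainder = total - sum(shares)
    largest = max(range(len(quantities)), key=lambda i: quantities[i])
    shares[largest] += remainder
    return shares


def within_tolerance(invoiced, expected, tolerance_pct):
    """True if the invoiced price is within tolerance_pct percent of expected."""
    invoiced, expected = Decimal(invoiced), Decimal(expected)
    allowed = abs(expected) * Decimal(tolerance_pct) / 100
    return abs(invoiced - expected) <= allowed


if __name__ == "__main__":
    # One excise tax line of 100.00 covering three uplifts (quantities assumed).
    print(allocate_proportionally("100.00", ["1000", "2000", "3000"]))
    print(within_tolerance("2.53", "2.50", "2"))
```

`Decimal` is used instead of floating point so that the allocated shares reconcile to the cent against the invoice total.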

Achieving Total Automation

Other invoice solutions capture data only as it appears on the invoice. While some normalization or mapping to known item IDs can be done, it is rare that the data can be imported and matched to a fuel order without additional manual steps. Unless the data being imported matches the expected data line by line, someone must manually reconcile the differences.

When orders are placed for a single uplift, but the vendor sends a daily or weekly […]

Digitech Systems OCR

Digitech Systems creates award-winning digitization and content management software and cloud services that deliver Any Document, Anywhere, Anytime®. With them, organizations of all sizes can securely and effectively extract, manage, and automate their business information.

The PaperVision and ImageSilo brands are used by thousands of businesses worldwide from global conglomerates to Main Street to help teams pave the path to more meaningful work by transforming how they handle digital files, processes, documents, and more.

Starting with PaperFlow digitization software, subsequent years brought on-premise information management (PaperVision Enterprise, 1998), the world’s first cloud Enterprise Content Management system (ImageSilo, 1999), business process automation (PaperVision Enterprise WorkFlow, 2002), intelligent data capture (PaperVision Capture, 2009), and AI-enabled forms processing (PaperVision Forms Magic™, 2016). The PaperVision®.com cloud-based information management system was introduced in 2020. These products provide unprecedented ease-of-use and architectural flexibility, while balancing fully featured products with a sensible price/performance ratio and legendary customer service.

PaperVision Enterprise performs dependable electronic document management to automate office environments, conserve paper, time, and money, and provide peace of mind. Retrieval solutions provide enterprise scalability and functionality with advanced features that enhance your efficiency and protect corporate data. Electronic information is retrieved instantly with our user-friendly graphical interface that displays a complete overview of all your available projects. View, manipulate, print, fax, export, and e-mail documents directly from your PC. On-premise installation or cloud-hosted services are available.

PaperVision Direct Scan, import, index, and organize paper documents using your existing scanners and multi-function devices (MFD) to create convenient digital files and securely upload them to the cloud. Start scanning documents right at your desk! Turn any vulnerable paper document into a useful digital file that can be securely managed in your PaperVision.com cloud service.

PaperVision Capture Forms Magic adds handwriting […]

Using FlexiLayout Studio to Design Data Capture Templates

FlexiLayout: How to capture a table using Repeating Group if table header is on each page

In some cases, we might have a table that we are not able to capture correctly using the traditional method, the Table element. In such cases, we usually use the Repeating Group element.

But what if we come across a multi-page document that has a table header on each page?

We can use either of the following two methods to capture such a table using Repeating Groups.

Using Absolute search area constraints

To limit the search area to the table area so that it doesn’t capture unnecessary text outside of the table, we can use Absolute search area constraints in the Search Constraints tab.

You can measure the area with the Measure Rectangle tool.

Using nested Repeating groups

Sometimes the Absolute search area constraints method is not suitable, because other tables using this layout might have elements with different positions and lengths. You would then have to re-measure the area every single time.

In such a case, you can use the nested Repeating group method.

  1. Create the first, “main” Repeating group that will include the Table header and footer.
  2. Next, create the nested RG inside the first RG. The relations between the two are shown in the original screenshot (mceclip2.png).
  3. These are the main steps. Other elements in the RG don’t need any specific settings and should be designed according to the needed results.

Additional information

FlexiLayout: Capturing a table using Repeating Group

How to reliably capture elements in FlexiLayout Studio if the image resolution can vary

When the image resolution varies, then the search area of elements based on absolute offsets can miss […]

AI and Machine Learning in ABBYY FlexiCapture and Vantage

How to train an NLP machine learning model

Today, different industries face similar challenges as they seek to extract information from business documents such as policies, e-mails, and legal agreements, and most agree that manual data entry is costly, time-consuming, and prone to errors.

In this video you will learn how to train an NLP machine learning model in FlexiCapture to extract entities and text passages from lease agreements.

Converting unstructured documents into structured data automatically makes this information available to your business applications while saving you time, money, and labor in the process.

Using ABBYY Vantage Document Skills

Processing Your First Documents with Vantage

Learn how easy it is to get started with Vantage – upload your documents and Vantage will take care of the rest.

How to Create and Train a Vantage Document Skill

Learn how to use the Vantage Skill Designer to create and train a new Document Skill with just a few sample documents.

How to Create and Train a Classification Skill in ABBYY Vantage

Learn how to use the Vantage Skill Designer to train a new Classification Skill. You need just a few samples of each document class.

How to Automate a Complete Workflow by Creating a Vantage Process Skill

How to Edit a Document Skill

Learn how to adapt already existing skills to your specific documents and business requirements.

How to perform the first authentication in Vantage Swagger UI?

To get a first access token, perform the initial authentication using the default client. You do not need to enter a password or client ID, because the initial authentication is preconfigured. Open a Swagger page (EU link or US link) and click Authorize. Then select all scopes and click Authorize again.

The password should be specified only for a custom client. A custom client can be created after the initial initialization.

References

EU Help: Getting a Tenant Identifier or US Help: Getting a Tenant Identifier

EU Help: Creating a Client or US Help: Creating a Client

Learn more at ABBYY […]

ABBYY Vantage

ABBYY Vantage leverages AI machine learning and a huge library of document “skills” to provide out-of-the-box data capture for all kinds of documents.

Vantage provides a simple way to implement new data capture processes without the need for programmers.

It takes the FlexiCapture platform, hosts it in the cloud, and dramatically simplifies the interface. The thousands of settings you can use with FlexiCapture to build templates are managed by the AI, giving you a simple point and click interface to create new document capture workflows.

The “Skills” library gives you pre-configured capture workflows for hundreds of the most common documents. Simply connect them to your import and export destinations and you are ready to go, saving you hours or even days of development time.

PaperVision Capture Forms Magic

PaperVision Capture Forms Magic adds handwriting recognition, forms processing, invoice processing, healthcare claims form templates, and business rules to Digitech Systems’ high-volume document scanning and data capture platform.

OCR Data Capture

What is OCR Data Capture?

OCR stands for Optical Character Recognition and is the technology that allows software to interpret text on scanned images. When this technology is applied to automating business data entry processes, it’s referred to as OCR Data Capture.

Many are familiar with popular desktop OCR applications designed to convert scanned images to editable documents. When this process is applied to specific areas of the document containing data fields it’s called zone OCR. But OCR data capture software is more than just simple zone OCR. Modern applications use some or all of these technologies:

Enterprise data capture systems provide interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Who can benefit from OCR data capture software?

Any organization that collects data from paper documents, or electronic files like PDF and Office documents, can get a very high return on investment by automating the data entry with OCR data capture software.

You do need to have a significant number of documents to […]

ABBYY FlexiCapture Cloud

ABBYY FlexiCapture Cloud delivers ABBYY’s advanced data capture platform capabilities via REST API and web interfaces. ABBYY FlexiCapture Cloud customers can rapidly configure and deliver their Content IQ solution, taking advantage of our cloud services to automate and accelerate their document-driven processes. The advanced machine learning and AI in the platform improve classification and data extraction results, enabling core processes to support better, smarter, faster decisions.

FlexiCapture Cloud enables organizations to accelerate digital transformation by complementing their automation systems with new and advanced cognitive capabilities that liberate the intelligence locked in their documents.

ABBYY FlexiCapture On-Premise

ABBYY FlexiCapture On-Premise – Distributed – Perpetual License PPY 50K Pages

ABBYY FlexiCapture is a powerful data capture and document processing solution from a world-leading technology vendor. It is designed to transform streams of documents of any structure and complexity into business-ready data. And its award-winning recognition technologies, automatic document classification, plus a highly scalable and customizable architecture, mean that it can help companies and organizations of any size to streamline their business processes, increase efficiency and reduce costs.

Enterprise OCR Applications

Enterprise OCR Data Capture Software

Enterprise OCR refers to applications designed with the features and scalability required for large businesses and service operations.

Speed and efficiency are the name of the game at the enterprise level, so options like batch processing, multi-user and multi-server workflows, and security and compliance auditing are found in these applications.

Enterprise OCR can also refer to Enterprise Site Licensing for desktop OCR applications that allow any user in your organization to install licensed OCR tools without incremental costs. Contact Us for a quote on any Site License.

Enterprise Document Management

With the high volume of documents coming out of an enterprise OCR product, there is a need for robust Document Management applications with enhanced features that cover the stricter oversight needs of large organizations. Sorting through thousands or millions of pages can quickly turn digital documents into a quagmire without proper organization, tagging, search and workflow capabilities.

Enterprise Document Management features include:

  • Digital signatures
  • Document life cycle management
  • Version control
  • Advanced keyword searching & full-text indexing
  • Audit trails (HIPAA, Sarbanes-Oxley compliance)
  • Email archiving

  • Workflow routing
  • Enterprise Report Processing (ERP)
  • Document access control

Our document management solutions work with any of the enterprise OCR products below to provide a secure end-to-end solution. Contact Us to see how they work together in an online demo or get a quote.

Simple Software

SimpleIndex can bring speed and efficiency to your scanning or doc filing no matter the process. Even if all you are doing is hand keying a few basic details about a document, breaking those details into individual indexes and adding tools like drop down choice lists, automatic orientation, and blank page deletion ensure a smoother, more consistent process.

Automation

Here’s where things start to get interesting. From basic tasks like splitting individual documents within a stack of pages by spotting a blank page, a specific mark, or a barcode separator, to capturing index data directly from the page or looking up additional details about a document in a database, SimpleIndex has a host of powerful tools to tame your piles of paper or drives full of digital files. Let’s look at a few.
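As a rough illustration of separator-based splitting, the Python sketch below cuts a scanned stack into documents wherever a separator page appears. It is not how SimpleIndex works internally: pages are represented by their OCR text as a stand-in for image analysis, and `split_on_separators` and `is_blank` are hypothetical helper names.

```python
def split_on_separators(pages, is_separator):
    """Group a flat list of pages into documents, dropping separator pages."""
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            # A separator closes the current document (if it has any pages).
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def is_blank(page_text, max_chars=5):
    """Treat a page as blank if OCR found almost no text (assumed threshold)."""
    return len(page_text.strip()) <= max_chars


if __name__ == "__main__":
    stack = ["Invoice page 1", "Invoice page 2", "", "Contract page 1", "  ", "Receipt"]
    for doc in split_on_separators(stack, is_blank):
        print(doc)
```

Passing the separator test in as a function means the same loop could be reused for a barcode value or a specific mark instead of a blank page.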

OCR

Optical Character Recognition is the ability to take a scan, which is merely a picture of a page, and turn it into words that the computer can understand and use to index your files. SimpleIndex leverages the power of ABBYY FineReader, recognized as one of the best OCR engines on the market, to accurately capture names, dates, important numbers, document types, and other details about your file. Some products have you set a box and capture whatever information happens to fall in that zone. SimpleIndex takes it further with Dynamic Zone OCR, which lets you set an oversized zone that allows for shifting of the pages between scans but still captures just the data you need by matching against templates, lists, or even Regular Expressions (RegEx). You can also skip the zones entirely and use the full text of a page to find matches for your index data.
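The pattern-matching half of this idea can be sketched in Python: given all the text recognized inside an oversized zone, a regular expression picks out only the value that looks like the field you want (here, a date). The date format, the `DATE_PATTERN` expression, and the `extract_date` function are assumptions for illustration, not SimpleIndex settings.

```python
import re
from datetime import datetime

# Assumed field format: US-style MM/DD/YYYY dates.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_date(zone_text):
    """Return the first valid date found in the zone's OCR text, else None."""
    for match in DATE_PATTERN.finditer(zone_text):
        month, day, year = map(int, match.groups())
        try:
            return datetime(year, month, day).date()
        except ValueError:
            # Looks like a date but is not one (e.g. month 99); keep searching.
            continue
    return None


if __name__ == "__main__":
    # The zone is larger than the field, so it also picks up nearby text.
    zone = "Acct 4471  Date: 03/14/2024  Page 1 of 2"
    print(extract_date(zone))
```

Because the match is validated rather than taken at face value, stray digits that merely resemble a date are skipped, which is what makes a generous zone safe to use.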

Barcodes

Forms Processing

What is ICR, Survey & Forms Processing?

ICR stands for Intelligent Character Recognition and is the technology that allows software to interpret hand printed text on scanned images.

Forms Processing Software uses ICR technology to automate data entry tasks involving hand-filled surveys, applications and forms. It provides interfaces for scanning, recognition, data verification and export, as well as management and monitoring tools to track large volumes of documents and data through the workflow.

Forms Processing also includes OCR (Optical Character Recognition) technology to recognize machine printed text, and OMR (Optical Mark Recognition) for check boxes and multiple choice bubbles.

It is also possible to use these applications to automate data collection from PDF forms, Word documents, Excel spreadsheets, and other formats used to fill out forms electronically. Many include the ability to publish forms as paper, fillable PDF and web pages simultaneously to distribute and collect data from multiple sources into one dataset.

Who can benefit from forms processing software?

Any organization that collects data on paper-based forms, surveys or applications on a regular basis can get a very high return on investment by automating the data entry with forms processing software.

You do need to have a significant number of forms to justify the expense: at least a hundred forms per month or more, depending on how much data is being captured. If the data entry task can be done in under 100 man-hours, then it is not a good candidate for automation with ICR software.

Organizations that have many separate departments that collect data on forms can share the budget for forms processing software by re-using it for other projects. Your current project may not be big enough to justify the expense, but when combined with one or two others it would be.

How much do […]

Document Management

The phrase “document management” is rather broad and can apply to a variety of scenarios depending on the needs (and size) of the business.

Small businesses and departments may only need a system that provides an efficient way to scan paper and save it in an orderly, intuitive structure.

Most projects also require the ability to search and view documents in an integrated viewer or website, and provide ways to annotate images, making notes and markup that other users can see.

Likewise, we may be working with more than just digitized paper files. Born-digital documents such as MS Office docs, PDFs, CAD drawings, and graphics files may need to be managed as well.

There can also be advanced records management requirements like access audit trails, document retention, lifecycle and workflow. These features are especially important when dealing with regulatory compliance such as HIPAA and Sarbanes-Oxley.

Our document management solutions can fit any budget or support any project requirements. It’s not always possible to do both at once, but we will try our best!

Contact Us for a free evaluation of your document management project and online demo of our software recommendations.

Personal & Small Business

Users within a single department, working from home or who have a small business can simply scan their documents to a folder that is shared to everyone. In this “ad-hoc” scenario you only need some basic document scanning software to simplify and bring consistency to your filing system. Our SimpleIndex software is a perfect all-in-one scanning and document management tool for this purpose.

If you want to move to the next level, there are Desktop Document Management options that provide an all-in-one means for capture, storage, search and retrieval of documents. These solutions are affordable and focused on automating the process of organizing and […]

ABBYY OCR

ABBYY is one of the leading OCR (Optical Character Recognition) companies in the world. They offer a large variety of document capture and automation products, starting with FineReader Pro and FineReader Corporate for individuals and small businesses. If you need to process many thousands or millions of pages, ABBYY has FineReader Server for full-text OCR and FlexiCapture for OCR data capture. Many companies use these products for their flexibility and scalability, and there is always a way to customize ABBYY OCR products to fit your automation needs.

ABBYY FineReader OCR software helps individuals turn scans of paper documents, PDF files, and digital photographs into searchable and editable formats. Unmatched text recognition accuracy and document conversion capabilities virtually eliminate retyping and reformatting. Intuitive use and one-click automated conversion tasks let you do more with this OCR software in fewer steps. Up to 190 languages are supported for text recognition and document conversion, an absolute record in the OCR/PDF software market!

ABBYY FlexiCapture is a powerful data capture and document processing solution. It is designed to transform streams of documents of any structure and complexity into business-ready data. Solid recognition technologies, automatic document classification, and a highly scalable and customizable architecture allow it to help companies and organizations of any size streamline their business processes, increase efficiency, and reduce costs.

ABBYY FineReader Server is powerful server-based OCR software for automated document capture and PDF conversion. Designed for mid- to high-volume batch processing, it enables organizations and scanning service providers to establish cost-efficient processes for converting paper, as well as TIFF, JPEG, and PDF image documents into electronic files suitable for full-text search and long-term digital archiving.

ScanStore and SimpleSoftware are highly experienced integrators of ABBYY […]
