
Automatic document classification is how data capture applications quickly determine what type of document is being processed before extracting data from the OCR text.
Document classification algorithms combine text matching, page layout analysis, and trained artificial intelligence models to identify documents by type, even when formatting and quality vary significantly.
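To make the text matching approach concrete, here is a minimal Python sketch that scores OCR text against keyword patterns for each document type. The rule table, weights, document type names, and the `classify` function are all hypothetical and chosen for illustration; they do not reflect how any particular product implements classification.

```python
import re

# Hypothetical rules: each document type maps to (regex pattern, weight) pairs.
RULES = {
    "W-2": [(r"\bwage and tax statement\b", 3), (r"\bform\s+w-?2\b", 2)],
    "Promissory Note": [(r"\bpromise to pay\b", 3), (r"\bnote\b", 1)],
    "Closing Disclosure": [(r"\bclosing disclosure\b", 3), (r"\bloan terms\b", 1)],
}


def classify(ocr_text, rules=RULES, min_score=2):
    """Return the best-scoring document type, or "Unknown" if nothing scores high enough."""
    # Lowercase and collapse whitespace so line breaks in the OCR output
    # do not prevent multi-word phrases from matching.
    text = re.sub(r"\s+", " ", ocr_text.lower())

    # Sum the weights of every pattern that appears in the text.
    scores = {
        doc_type: sum(weight for pattern, weight in patterns if re.search(pattern, text))
        for doc_type, patterns in rules.items()
    }

    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "Unknown"


print(classify("Form W-2  Wage and Tax\nStatement 2023"))
print(classify("Grocery list: milk, eggs"))
```

The `min_score` threshold is a design choice in this sketch: pages that match no rule strongly enough fall through to "Unknown" so they can be reviewed manually rather than misfiled.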
A good example of document classification is the LoanStacker application, which takes a complete residential mortgage loan file and identifies the more than 500 forms, disclosures, tax records, and contracts it contains. Once identified, these documents can be sent to the appropriate workflows for approval, data entry, etc.
While most data capture applications are able to identify document types based on recognition templates, automatic classification algorithms are much faster and significantly improve throughput when many different document types are being processed. Trained AI classification models can also appear to “understand” the common traits of different document types and sort them correctly even when presented with new formats.
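As a rough illustration of how a trained model can sort wording it has never seen, the sketch below trains a small multinomial naive Bayes classifier on a handful of labeled text samples. Naive Bayes is used here only because it is compact; the source does not say which algorithms any product uses, and the class name, training samples, and labels are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of training samples
        self.vocab = set()

    def train(self, samples):
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the log likelihood of each token under this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training samples, for illustration only.
TRAINING = [
    ("Invoice number 1001 amount due total payable", "invoice"),
    ("Invoice date bill to amount due net 30", "invoice"),
    ("Wage and tax statement employer wages federal tax withheld", "tax_form"),
    ("Employee wages social security tax withheld statement", "tax_form"),
]

model = NaiveBayesClassifier()
model.train(TRAINING)
print(model.predict("Total amount due on this invoice"))
print(model.predict("Federal tax withheld from wages"))
```

Neither test phrase appears verbatim in the training samples. The model labels them from word statistics alone, which is a simplified version of the generalization described above. A production system would need far more training data and would typically also use layout features.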
Simple Software’s SimpleIndex application provides document classification based on keyword and pattern matching at a much lower cost than enterprise solutions.
Our collection of OCR Data Capture applications all have built-in automatic document classification capabilities, including machine learning and script-based manual overrides.