Using Artificial Intelligence to train OCR templates
Modern forms processing applications include AI-based training algorithms that let users point and click on the location of data in their documents to create OCR templates automatically.
This removes much of the technical work of building complex OCR templates by hand, especially for variable documents such as invoices, where the same data doesn't always appear in the same place.
But how good are these AI-based training systems?
In our experience, they work well when you have:
- Good quality scanned images
- Clearly labeled data
- Tables with regular columns
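To show why clearly labeled data is the easy case, here is a minimal sketch of label-anchored extraction. It assumes OCR output is available as a list of words with pixel bounding boxes. The `Word` class and the `value_right_of_label` function are hypothetical and only illustrate the general idea. They are not how ABBYY FlexiCapture or any other product implements training. The value is found relative to its label rather than at a fixed position, so it is still found when the layout shifts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical format, in pixels)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def value_right_of_label(words: list[Word], label: str,
                         line_tolerance: int = 10) -> Optional[str]:
    """Return the nearest word to the right of `label` on the same text line.

    The label may span several words; trailing colons are ignored when matching.
    """
    label_tokens = [t.rstrip(":") for t in label.lower().split()]
    n = len(label_tokens)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [wd.text.lower().rstrip(":") for wd in window] == label_tokens:
            last = window[-1]
            label_right = last.x + last.w
            # Candidates start right of the label and sit on roughly the same line.
            candidates = [
                wd for wd in words
                if wd.x >= label_right and abs(wd.y - last.y) <= line_tolerance
            ]
            if candidates:
                return min(candidates, key=lambda wd: wd.x).text
    return None


# Two invoices with the same label in different places on the page.
page_a = [Word("Invoice", 50, 100, 60, 12), Word("Number:", 115, 100, 70, 12),
          Word("INV-1001", 200, 100, 80, 12), Word("Total", 50, 300, 40, 12)]
page_b = [Word("Date", 400, 40, 40, 12), Word("Invoice", 400, 80, 60, 12),
          Word("Number:", 465, 80, 70, 12), Word("INV-2002", 550, 82, 80, 12)]

print(value_right_of_label(page_a, "Invoice Number"))
print(value_right_of_label(page_b, "Invoice Number"))
```

This simple rule breaks down in exactly the cases listed next. Poor image quality garbles the label text, values buried in paragraphs have no nearby label, and irregular tables defeat a "same line, to the right" rule.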
Point-and-click training doesn't work as well with:
- Poor quality images
- Data that appears within paragraphs
- Tables with overlapping columns, subtotal rows, etc.
These types of documents can still be captured with OCR, but they usually require an experienced technician to configure the template manually.
For natural language content such as legal documents, a newer class of artificial intelligence tools based on Natural Language Processing (NLP) is available. These tools attempt to “understand” the language used in a document and locate data points based on meaning rather than position on the page. ABBYY FlexiCapture also supports NLP-based training for these types of documents.