
In the wise words of The Offspring: “You gotta keep ’em separated!”
Automatic document separation is a common headache in document scanning and data capture applications.
When mixed batches of documents are received, it is not always easy to determine where one document ends and the next one begins.
The traditional solution for automatic document separation is to insert barcode or patch code separator pages between documents. Many scanners and scanning applications have built-in functions to read these on the fly and create a new file automatically each time one is found. However, this approach works only with paper documents, and electronic PDF files have become far more common. It also requires printing hundreds or thousands of separator sheets and inserting them manually before scanning.
Modern Enterprise Data Capture solutions can define unique page elements that identify the first and last page of any document. This works well when some easily identified text or form element marks the document boundary, but it does not work when documents are unstructured or come in a wide variety of formats, as invoices do.
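To make the idea concrete, here is a minimal Python sketch of separating on a first-page element. It assumes the text of each page has already been extracted (by OCR or from the PDF text layer) and that every document prints a "Page 1 of N" footer on its first page. The pattern and the function name are illustrative assumptions, not the configuration of any particular capture product.

```python
import re

# Hypothetical first-page marker: a "Page 1 of N" footer.
# Real documents need a pattern chosen for their own layout.
FIRST_PAGE = re.compile(r"\bPage\s+1\s+of\s+\d+\b", re.IGNORECASE)


def split_on_first_page_marker(page_texts):
    """Group 0-based page indices into documents.

    A new document starts on every page whose text matches FIRST_PAGE.
    Pages before the first match are kept together as one document.
    """
    documents = []
    for index, text in enumerate(page_texts):
        if FIRST_PAGE.search(text) or not documents:
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents
```

For the three page texts "Statement Page 1 of 2", "Page 2 of 2" and "Notice page 1 of 1", the function returns [[0, 1], [2]]. It fails in exactly the case described above: when no reliable marker exists, every page lands in a single document.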
For documents like invoices, Simple Software has developed intelligent separation scripts that compare extracted data from each page to determine when a new invoice starts. Multi-page invoices and their attachments, such as bills of lading (BOLs) and other paperwork, are split automatically from the large PDF files, often containing many invoice transactions, that vendors send.
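The general technique of comparing per-page data can be sketched in a few lines of Python. This is not Simple Software's script, only an illustration under stated assumptions: each page has already been reduced to a dictionary of extracted fields, the field name invoice_number is hypothetical, and pages with no invoice number (attachments, continuation pages) are assumed to belong to the invoice before them.

```python
def split_by_invoice_number(page_fields):
    """Group 0-based page indices into invoices.

    page_fields holds one dict per page. The key 'invoice_number' is a
    hypothetical field name; its value may be missing or None.
    A page starts a new invoice when it carries a number that differs
    from the current one. Pages without a number stay with the current
    invoice. Leading pages without a number form their own group.
    """
    documents = []
    current_number = None
    for index, fields in enumerate(page_fields):
        number = fields.get("invoice_number")
        starts_new = not documents or bool(number and number != current_number)
        if starts_new:
            documents.append([index])
        else:
            documents[-1].append(index)
        if number:
            current_number = number
    return documents
```

Given pages carrying INV-1, INV-1, no number, INV-2 and no number, the result is [[0, 1, 2], [3, 4]]. A production script would likely compare several fields (vendor, date, total) and tolerate OCR errors in the number, which this sketch does not attempt.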
When the start of each new document cannot be determined from the page content, Simple Software has some unique ways to speed up document separation.
SimpleIndex can use optical mark recognition (OMR) to separate scanned files based on a black mark placed in a corner of the first page of each file during preparation. This is much faster and better for the environment than inserting printed barcode sheets.
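As a rough illustration of how a corner mark can be detected, the Python sketch below checks how much of a small corner region is dark. It is not SimpleIndex's OMR implementation. It assumes each page is available as a grayscale image in the form of rows of 0-255 pixel values, that the mark sits in the top-left corner, and that the box size and thresholds shown are reasonable; all of these are illustrative choices.

```python
def has_corner_mark(gray_page, box=4, dark_below=64, min_fill=0.8):
    """Return True if the top-left box-by-box region is mostly dark.

    gray_page: list of rows, each a list of 0-255 grayscale values.
    box, dark_below and min_fill are assumed values that would need
    tuning for real scans (resolution, marker size, skew).
    """
    region = [pixel for row in gray_page[:box] for pixel in row[:box]]
    if not region:
        return False
    dark = sum(1 for pixel in region if pixel < dark_below)
    return dark / len(region) >= min_fill


def find_first_pages(gray_pages, **options):
    """Return the 0-based indices of pages that carry the corner mark."""
    return [
        index
        for index, page in enumerate(gray_pages)
        if has_corner_mark(page, **options)
    ]
```

The min_fill ratio, rather than a requirement that every pixel be dark, gives some tolerance for a mark that is slightly misplaced or unevenly inked.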
SimpleView lets you control-click to highlight the first page of each document in a thumbnail view, then split the PDF or TIFF file into individual documents automatically based on this selection. This is significantly faster than the drag-and-drop method used by most PDF editors.
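Once the first pages have been selected, splitting comes down to turning those selections into page ranges. The Python sketch below shows that step only. The function name is hypothetical, it is not SimpleView's code, and it assumes that any pages before the first selected page should form a document of their own.

```python
def page_ranges(first_pages, page_count):
    """Convert selected first pages into (start, end) ranges.

    Indices are 0-based and each range is half-open: it covers pages
    start up to, but not including, end.
    """
    if page_count <= 0:
        return []
    starts = sorted(set(first_pages))
    if any(start < 0 or start >= page_count for start in starts):
        raise ValueError("selected page is outside the file")
    if not starts or starts[0] != 0:
        # Assumption: unselected leading pages become their own document.
        starts.insert(0, 0)
    ends = starts[1:] + [page_count]
    return list(zip(starts, ends))
```

Selecting pages 0, 3 and 5 in an 8-page file yields [(0, 3), (3, 5), (5, 8)]. Each range can then be handed to whatever PDF or TIFF library writes the output files.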