Using OCR to capture data from tables and reports
Data that repeats in a regular pattern throughout a document, such as the rows of a table or report, can be captured with OCR and exported to Microsoft Excel, Google Sheets and other spreadsheet formats, or to a SQL database like Access, SQL Server, MySQL or Oracle.
Inexpensive desktop OCR products like FineReader, ReadIRIS and OmniPage can automatically convert tables to Excel and other spreadsheet formats, as long as the columns are standard and don’t “overlap”. Columns overlap when different field values appear in the same column area, for example when each record spans two rows and the second row holds a different set of columns than the first.
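When each record spans two rows, one workaround is to export the rows as they are and stitch each pair back together afterwards. The sketch below is a minimal illustration of that idea, not a feature of any of the products named above. The function name, the field names and the sample rows are all made up for the example, and it assumes the OCR output alternates strictly between the first and second row of each record.

```python
def merge_two_row_records(rows, first_fields, second_fields):
    """Combine each consecutive pair of rows into a single record dict.

    Assumes rows alternate strictly: first row of a record, then its second row.
    """
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (two per record)")
    records = []
    for i in range(0, len(rows), 2):
        record = dict(zip(first_fields, rows[i]))        # columns from row one
        record.update(zip(second_fields, rows[i + 1]))   # columns from row two
        records.append(record)
    return records


# Hypothetical OCR output: two physical rows per logical record.
rows = [
    ["1001", "Acme Corp", "2023-01-15"],
    ["12 Main St", "Springfield", "IL"],
    ["1002", "Globex", "2023-01-16"],
    ["9 Elm Ave", "Shelbyville", "IL"],
]
records = merge_two_row_records(
    rows,
    first_fields=["id", "name", "date"],
    second_fields=["street", "city", "state"],
)
for record in records:
    print(record)
```

A missing or extra row anywhere in the input would shift every record after it, so in practice the pairing should be checked against something recognisable, such as the record ID pattern, before the data is trusted.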
Converted data will require some clean-up before it is usable in any database or software application, and it is difficult to convert large numbers of documents in batches this way. But it’s a good way to produce structured data from large single reports or small batches of similar report data.
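As an illustration of the kind of clean-up involved, the sketch below normalizes a numeric cell from an OCR'd spreadsheet. It assumes the column is known to contain amounts, so letters that OCR engines commonly confuse with digits (such as O for 0 or l for 1) can be safely replaced. The substitution table and the function name are assumptions made for this example, not part of any OCR product.

```python
from decimal import Decimal, InvalidOperation

# Assumed look-alike substitutions; only safe for columns known to be numeric.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def clean_amount(cell):
    """Return a Decimal for an OCR'd amount cell, or None if it cannot be parsed."""
    text = cell.strip().translate(OCR_DIGIT_FIXES)
    # Drop currency symbols, thousands separators and stray spaces.
    text = text.replace("$", "").replace(",", "").replace(" ", "")
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


print(clean_amount(" $1,2O5.5O "))
print(clean_amount("n/a"))
```

Cells that come back as None can be routed to manual review rather than loaded into the database with a guessed value.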
More complex cases require Enterprise Forms Processing software: tables with similar data but different formats on different documents (like invoices), and tables with nested structure such as header and detail rows. This software turns such documents into structured data like XML, JSON or SQL database tables.
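To show what nested output from header and detail rows can look like, the sketch below groups flat rows into JSON. It is only an illustration of the target structure, not how any forms processing product works internally. It assumes each row has already been classified as a header ("H") or detail ("D") row, and the field names and sample data are hypothetical.

```python
import json


def nest_rows(rows):
    """Group flat rows tagged "H" (header) or "D" (detail) into nested records."""
    documents = []
    for kind, *fields in rows:
        if kind == "H":
            invoice_no, customer = fields
            documents.append(
                {"invoice_no": invoice_no, "customer": customer, "lines": []}
            )
        elif kind == "D":
            if not documents:
                raise ValueError("detail row before any header row")
            item, qty = fields
            # Attach the detail line to the most recent header.
            documents[-1]["lines"].append({"item": item, "qty": int(qty)})
        else:
            raise ValueError(f"unknown row type: {kind!r}")
    return documents


# Hypothetical classified rows from two invoices.
rows = [
    ("H", "INV-001", "Acme Corp"),
    ("D", "Widget", "2"),
    ("D", "Gasket", "10"),
    ("H", "INV-002", "Globex"),
    ("D", "Sprocket", "1"),
]
invoices = nest_rows(rows)
print(json.dumps(invoices, indent=2))
```

The same nested records map naturally onto two SQL tables, one for headers and one for detail lines keyed by the invoice number.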