Sure, but SimpleOCR usually returns poor results with screen captures.
SimpleOCR handles only bi-level (black & white) and grayscale images. Please don’t scan in color mode.
SimpleOCR handles only bi-level (black & white) and grayscale images. It can’t read color TIFF documents. Convert the file into a bi-level or grayscale format and then load it into SimpleOCR.
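If you are curious what such a conversion involves, here is a minimal sketch in Python. It is only an illustration, not how SimpleOCR or any particular image editor does it: the function names, the common luminance weights (0.299, 0.587, 0.114), and the fixed threshold of 128 are all assumptions made for the example. In practice any image editor can do the conversion for you.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples (0-255) into rows of gray levels (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def to_bilevel(gray_rows, threshold=128):
    """Map each gray level to 0 (black) or 1 (white) with a fixed threshold.

    The threshold value is an assumption; a lighter or darker scan
    would need a different one.
    """
    return [[1 if g >= threshold else 0 for g in row] for row in gray_rows]


# A tiny 2x2 "color image": white, dark red, near black, pale yellow.
color = [
    [(255, 255, 255), (200, 30, 30)],
    [(10, 10, 10), (250, 250, 120)],
]
gray = to_grayscale(color)
bilevel = to_bilevel(gray)
print(gray)
print(bilevel)
```

Raising the threshold turns more pixels black and lowering it turns more pixels white, which is roughly what a scanner's brightness setting does when it produces a black and white image.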
SimpleOCR saves TIFF files with CCITT Group IV (a.k.a. ITU-T T.6) compression. Some software applications cannot decode TIFF files compressed this way.
Most crash bugs have been fixed in version 3.0 of SimpleOCR. Please download the new version to eliminate these errors and greatly improve your SimpleOCR experience!
First of all, please be sure to have the latest version of SimpleOCR and restart your computer. Then try the following.
Load the SimpleOCR.tif sample file shipped with SimpleOCR and try to OCR it. If it crashes, please uncheck the “extract images” option and retry. This option doesn’t work on old versions of Windows 95.
Look at the SimpleOCR status bar once your document is displayed. You should see the image size (something like 1728×2200) along with the horizontal and vertical image resolution in Dots Per Inch (DPI). If the resolution shows strange values, it means that your scanning software doesn’t fill in the resolution fields properly, and this can make SimpleOCR crash. You can report the problem to your scanner manufacturer. Unfortunately, you can’t modify the resolution fields by hand from SimpleOCR. Instead, save the document as a TIFF file and then use a TIFF file editor to change the resolution fields.
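If you would like to check the resolution fields of a TIFF file yourself, the sketch below shows one way to read them in Python using only the standard library. It relies on the tag numbers defined in the TIFF 6.0 specification (XResolution = 282, YResolution = 283, ResolutionUnit = 296). The function names are made up for this example, the "suspicious" test (missing or not positive) is an assumption, and the sample file built in memory contains only the resolution fields, so it is not a viewable image.

```python
import struct

# Tag numbers and field types from the TIFF 6.0 specification.
X_RESOLUTION, Y_RESOLUTION, RESOLUTION_UNIT = 282, 283, 296
TYPE_SHORT, TYPE_RATIONAL = 3, 5


def read_tiff_resolution(data):
    """Return (x_res, y_res, unit) read from the first IFD of TIFF bytes.

    x_res / y_res are None when the field is missing or its denominator is 0.
    unit is 1 (no unit), 2 (inch) or 3 (centimeter).
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    x_res = y_res = None
    unit = 2  # the specification's default when the field is absent: inches
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, field_type = struct.unpack_from(endian + "HH", data, entry)
        if tag in (X_RESOLUTION, Y_RESOLUTION) and field_type == TYPE_RATIONAL:
            # A RATIONAL does not fit in the entry, so the entry holds an offset.
            (offset,) = struct.unpack_from(endian + "I", data, entry + 8)
            numerator, denominator = struct.unpack_from(endian + "II", data, offset)
            value = numerator / denominator if denominator else None
            if tag == X_RESOLUTION:
                x_res = value
            else:
                y_res = value
        elif tag == RESOLUTION_UNIT and field_type == TYPE_SHORT:
            (unit,) = struct.unpack_from(endian + "H", data, entry + 8)
    return x_res, y_res, unit


def resolution_is_suspicious(x_res, y_res):
    """True when a resolution is missing or not a positive number (assumed rule)."""
    return any(r is None or r <= 0 for r in (x_res, y_res))


def build_sample_tiff(x_dpi, y_dpi):
    """Build a tiny little-endian TIFF holding only the three resolution fields."""
    header = struct.pack("<2sHI", b"II", 42, 8)           # IFD starts at byte 8
    ifd = struct.pack("<H", 3)                            # three entries
    ifd += struct.pack("<HHII", X_RESOLUTION, TYPE_RATIONAL, 1, 50)
    ifd += struct.pack("<HHII", Y_RESOLUTION, TYPE_RATIONAL, 1, 58)
    ifd += struct.pack("<HHIHH", RESOLUTION_UNIT, TYPE_SHORT, 1, 2, 0)
    ifd += struct.pack("<I", 0)                           # no further IFD
    rationals = struct.pack("<IIII", x_dpi, 1, y_dpi, 1)  # stored at bytes 50 and 58
    return header + ifd + rationals


good = read_tiff_resolution(build_sample_tiff(300, 300))
bad = read_tiff_resolution(build_sample_tiff(0, 0))
print(good, resolution_is_suspicious(*good[:2]))
print(bad, resolution_is_suspicious(*bad[:2]))
```

To inspect a real scan, read the file in binary mode and pass its bytes to `read_tiff_resolution`. The sketch only reads the values; changing them still requires a TIFF file editor.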
If it works with the sample file but not on your document, please try to select the text areas by hand using the “Create Area” tool.
In any case, please mail us a bug report.
SimpleOCR can only recognize the characters used in the English and French languages. Therefore, it cannot recognize characters such as ß, ü, ñ, and ú.
The scanning quality is very important. Aim for a quality comparable to that of the sample file shipped with SimpleOCR.
Usually you should use a scanning resolution of 300 DPI. It could be less if the characters are quite big or more if the characters are small.
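To see why character size and scanning resolution go together, here is a small arithmetic sketch in Python. It uses the fact that one typographic point is 1/72 of an inch. The function name is made up for this example, and the result is only a rough upper bound: the point size describes the font's overall height, so the letters themselves are somewhat smaller.

```python
def char_height_pixels(point_size, dpi):
    """Approximate height in pixels of text of a given point size (1 pt = 1/72 inch)."""
    return point_size / 72 * dpi


# How many pixels tall is 10-point text at a few common resolutions?
for dpi in (150, 300, 600):
    print(dpi, "DPI:", round(char_height_pixels(10, dpi)), "pixels")
```

Halving the resolution halves the number of pixels available to describe each character, which is why small print benefits from a higher DPI setting.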
Next, carefully tune the scanning brightness. Look at the characters in the resulting image. They should be clean. If you have a lot of characters in several pieces (i.e. image is too light) or many characters stuck together (i.e. image is too dark), you should tune the scanning brightness (or try a higher scanning resolution).
If your document has a sophisticated page layout, you can help SimpleOCR by selecting the text areas by hand using the “Create Area” tool.
Avoid making the scanning area bigger than the document. Otherwise you may get black borders around the document, which SimpleOCR handles poorly.
This problem occurs when there are more than 20,000 distinct “shapes” (sets of connected pixels) in a document. This probably means your document is “noisy” (a lot of small dots everywhere) or there is an image in your document which looks like a cloud of points. Try the following:
- select the “noisy” option in the bottom toolbar
- select the text areas with the “Create Area” tool.
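To give an idea of what counting “shapes” means, here is a minimal sketch in Python. It is not SimpleOCR's actual code: the function name, the use of `#` for black pixels, and the choice to treat diagonally touching pixels as connected are all assumptions made for the example. Only the limit of 20,000 comes from the explanation above.

```python
from collections import deque

SHAPE_LIMIT = 20000  # the limit mentioned above


def count_shapes(rows):
    """Count groups of connected black pixels in a list of strings.

    '#' is a black pixel, anything else is background. Pixels count as
    connected when they touch horizontally, vertically or diagonally
    (an assumption for this sketch).
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    seen = [[False] * width for _ in range(height)]
    shapes = 0
    for y in range(height):
        for x in range(width):
            if rows[y][x] != "#" or seen[y][x]:
                continue
            # Found a black pixel that belongs to no shape yet: start a new one
            # and mark every pixel reachable from it.
            shapes += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and rows[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return shapes


page = [
    "##..#",
    "##...",
    "....#",
    "#...#",
]
total = count_shapes(page)
print(total, "shapes;", "too many" if total > SHAPE_LIMIT else "within the limit")
```

Every isolated speck counts as its own shape, so a page covered in small dots can reach the limit long before a clean page of text would.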
Your scanner must be a TWAIN-compliant scanner that can acquire a black and white or grayscale image. A common problem is an old 16-bit driver that can’t communicate with 32-bit applications like SimpleOCR. Try to get an updated driver by downloading it from your scanner manufacturer’s website.
If it still doesn’t work, you can always scan from your scanning software, save the resulting image in a black and white TIFF file, and process the file with SimpleOCR.