Thank you for choosing SimpleOCR, the royalty-free OCR engine! These instructions cover the basics of integrating SimpleOCR into your application.
SimpleOCR contains several groups of functions, including image manipulation, image I/O with TIFF files, image acquisition with TWAIN-compliant scanners, and, of course, OCR. Note that SimpleOCR can read and create TIFF files containing bi-level (i.e. black & white) images. SimpleOCR creates TIFF files using the CCITT Group IV compression scheme, but it can read most bi-level TIFF images.
The source code examples are given in VB and C++. The function headers are given in C++, since this is the original language that SimpleOCR was written in. To translate, simply replace all pointer variables with long integers and all char * with strings. Also, the ActiveX functions all have an “X” appended to the name (OCR->OCRX, LoadImg->LoadImgX, etc.). In the documentation, SimpleOCR refers to general library functions, while SimpleOCX is used to refer specifically to the ActiveX control.
SimpleOCX is an ActiveX dynamic link library (DLL) that allows developers to quickly integrate the SimpleOCR functions from any ActiveX-compatible programming environment. SimpleOCX acts as a “wrapper” for the core SimpleOCR libraries. Hence, SimpleOCX is not a native ActiveX control; it only provides an ActiveX interface to the SimpleOCR functions contained in ocrdll.dll and dlltwain.dll. Programmers who want more efficient execution may bypass SimpleOCX.dll and interface directly with the core libraries.
Adding SimpleOCR to your application
The following instructions are provided in Visual Basic, but the implementation of SimpleOCR is similar in any development environment that uses ActiveX. Consult your documentation for language-specific instructions on how to integrate ActiveX dlls.
- Ensure that SimpleOCX.dll has been properly registered using regsvr32.exe "c:\Program Files\SimpleOCR\simpleocx.dll" (the quotes are needed because the path contains a space).
- Add a reference to “SimpleOCX” using the Project/References menu.
- You can now declare variables of type “SimpleOCR” and access all of the SimpleOCR functions through this object.
Constants Used By SimpleOCR (VB)
Copy these constant declarations into a VB module.
Using the SimpleOCR ActiveX Control (VB)
This is an example of how to perform OCR on a multi-page image file using SimpleOCX and VB.
Related Functions: LoadMultipleImg, SetLanguage, SetOutputMode, OCRSetOutputHandler, GetNbImages, GetImage, OCR, FreeMultipleImg
OCR on a Multipage TIFF Image File (C++)
This function processes several images stored in a TIFF file. The OCR results are stored in a text file.
This function uses an OCR output handler, defined as follows:
Related Functions: LoadMultipleImg, SetLanguage, SetOutputMode, OCRSetOutputHandler, GetNbImages, GetImage, OCR, FreeMultipleImg
Displaying an Image (C++)
This function displays an image at coordinates x, y in a display context:
Related Functions: GetImgSize, GetImgBitmap, GetImgBitmapInfo
Scanning Documents (C++)
This function scans “n” images and creates a set with the scanned images.
Related Functions: ScanInit, ScanEnd, CreateMultipleImg, ScanAndAddImage
IMG
All the images manipulated by SimpleOCR have the IMG type. SimpleOCR provides functions that allow you to handle an IMG object as a Device Independent Bitmap (DIB). See your Windows SDK documentation to become familiar with DIB concepts.
IMG objects are always manipulated through SimpleOCR functions, so even though IMG objects are implemented as structures, you don’t need to know their definition. When programming in languages other than C++, replace IMG * with the Long Integer data type.
Please remember that SimpleOCR can only work with bi-level (i.e. black & white) images or grayscale images with 256 shades of gray.
SETOFIMG
Several images that belong to the same document can be grouped in a SETOFIMG object. SETOFIMG objects are always manipulated through SimpleOCR functions, so even though SETOFIMG objects are implemented as structures, you don’t need to know their definition.
When programming in languages besides C++, substitute SETOFIMG * with Long Integer data types.
AddDIB
int AddDIB(SETOFIMG * set, HGLOBAL hDib)
This function is similar to AddImage, except that, instead of an IMG object, this function expects an image in the DIB format.
Return Value
If the function fails, a nonzero value is returned.
Parameters
set
A pointer to a set of images of type SETOFIMG.
hDib
An HGLOBAL handle referencing a Global Memory object containing a BITMAPINFO structure followed by the bitmap bits.
Example
Related Functions: AddImage, InsertImage
AddImage
int AddImage(SETOFIMG * set, IMG * image)
Adds an image to a set of images. The new image is added at the last position.
Return Value
If the function fails, a nonzero value is returned.
Parameters
set
A pointer to a set of images of type SETOFIMG.
image
A pointer to the image to add.
Example
Related Functions: AddDIB, InsertImage
CreateMultipleImg
SETOFIMG * CreateMultipleImg(void)
Creates an empty set of images.
Return Value
An empty set of images or NULL if the function fails.
Example
Related Functions: FreeMultipleImg
DelImage
void DelImage(SETOFIMG * set, int index)
Deletes and frees an image in a set.
Return Value
None
Parameters
set
A pointer to a set of images of type SETOFIMG.
index
Position of the deleted image, counted from 0.
Example
CountPixelsImg
int CountPixelsImg(IMG * img)
Count the number of black pixels in an image.
Return Value
Number of black pixels in the image
Parameters
img
Pointer to an image of type IMG.
Example
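One possible use of the pixel count is blank-page detection. The sketch below is not part of SimpleOCR: blackRatio and looksBlank are hypothetical helpers, and the 0.5% threshold is an arbitrary assumption. They take the value returned by CountPixelsImg together with the width and height reported by GetImgSize.

```cpp
#include <cassert>

// Hypothetical helper (not an SDK function): fraction of black pixels.
// blackPixels would come from CountPixelsImg, width/height from GetImgSize.
double blackRatio(int blackPixels, int width, int height) {
    assert(blackPixels >= 0 && width > 0 && height > 0);
    // 64-bit arithmetic so that width * height cannot overflow an int.
    const long long total = static_cast<long long>(width) * height;
    return static_cast<double>(blackPixels) / static_cast<double>(total);
}

// Hypothetical helper: treats a page as blank when very few pixels are black.
// The default threshold (0.5%) is an illustrative assumption, not an SDK value.
bool looksBlank(int blackPixels, int width, int height, double threshold = 0.005) {
    return blackRatio(blackPixels, width, height) < threshold;
}
```

A suitable threshold depends on the scanner and the documents, so it should be tuned on real pages.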
DeskewImg
int DeskewImg(IMG * img)
When a document has not been properly scanned, the resulting image can be skewed. This function analyses a skewed image and rotates it in order to fix the problem.
Return Value
If the function fails a nonzero error code is returned.
Parameters
img
Pointer to an image of type IMG.
Example
Related Functions: RotateImg
DIBToIMG
IMG * DIBToIMG(HGLOBAL hDib)
Converts a memory block containing a Device Independent Bitmap (DIB) to an IMG object.
Return Value
A pointer to an IMG object. If the function fails, the return value is NULL.
Parameters
hDib
An HGLOBAL handle referencing a Global Memory object containing a BITMAPINFO structure followed by the bitmap bits.
Comments
See your Windows SDK documentation for information about Device Independent Bitmaps and how to use them.
Example
Retrieve a DIB from the clipboard and convert it to an IMG object
EraseBlackBordersImg
int EraseBlackBordersImg(IMG * img)
Sometimes a scanned image has black borders. It happens frequently when the scanned document is smaller than the scanning area. This function detects and removes these black borders.
Return Value
On success, the function returns 0. If the function fails, a nonzero value is returned.
Parameters
img
Pointer to an image of type IMG.
Example
ExtractImgArea
IMG * ExtractImgArea(IMG * img, int x, int y, int w, int h)
Extracts a rectangular area from an image.
Return Value
A pointer to a new IMG object. If the function fails, the return value is NULL.
Parameters
img
Pointer to an image of type IMG.
x
x coordinate of the upper left corner of the area.
y
y coordinate of the upper left corner of the area.
w
area width in pixels.
h
area height in pixels.
Comments
The original image is left unchanged. The extracted image should be freed with the FreeImg function.
Example
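This document does not say how ExtractImgArea behaves when the requested rectangle extends past the image, so a caller may prefer to clip the rectangle first. The following is a minimal sketch under that assumption; Area and clipArea are hypothetical names, not SDK symbols, and the image dimensions would come from GetImgSize.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type (not an SDK type): a rectangle in pixels.
struct Area {
    int x, y, w, h;
};

// Clips the requested area to an image of imgW x imgH pixels.
// Returns an area with w == 0 or h == 0 when nothing overlaps.
Area clipArea(Area a, int imgW, int imgH) {
    assert(imgW >= 0 && imgH >= 0);
    const int left   = std::max(0, std::min(a.x, imgW));
    const int top    = std::max(0, std::min(a.y, imgH));
    const int right  = std::max(left, std::min(a.x + a.w, imgW));
    const int bottom = std::max(top, std::min(a.y + a.h, imgH));
    return Area{left, top, right - left, bottom - top};
}
```

When the clipped width and height are both positive, the four values can be passed as the x, y, w and h arguments of ExtractImgArea.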
FixOrientationImg
int FixOrientationImg(IMG * img)
When a document has not been properly scanned, the resulting image can be of the wrong orientation. This function analyses an image in the wrong orientation, and rotates it the necessary 90, 180, or 270 degrees.
Return Value
If the function fails a nonzero error code is returned
Parameters
img
Pointer to an image of type IMG.
Example
FreeImg
void FreeImg(IMG * img)
This function frees an existing image.
Return Value
None
Parameters
img
Pointer to an image of type IMG.
Example
The following example frees an image
FreeMultipleImg
void FreeMultipleImg(SETOFIMG * set)
Frees a set of images. All the images in the set are also freed.
Return Value
None
Parameters
set
A pointer to a set of images of type SETOFIMG.
Example
Related Functions: CreateMultipleImg
GetImage
IMG * GetImage(const SETOFIMG * set, int index)
Gets a pointer to a given image in a set of images.
Return Value
A pointer to the image at position index in the set.
Parameters
set
A pointer to a set of images of type SETOFIMG
index
Order of the image you want to access. The first image is at index 0.
Example
GetImgBitmap
unsigned char * GetImgBitmap(const IMG * img)
This function gets a pointer to the bitmap corresponding to the image. The bitmap is organized like a Device Independent Bitmap (DIB).
Return Value
A pointer to the bitmap that encodes the image.
Parameters
img
Pointer to an image of type IMG.
Comments
See your Windows SDK documentation for information about Device Independent Bitmaps and how to use them.
Example
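When walking the bitmap row by row, the row length in bytes matters more than the width in pixels. In a standard Windows DIB each scan line is padded to a multiple of 4 bytes; the sketch below computes that stride. It assumes the SimpleOCR bitmap follows the same padding rule, which this document implies ("organized like a DIB") but does not state. dibStride and dibPixelBytes are hypothetical helper names.

```cpp
#include <cassert>

// Bytes per scan line of a DIB: rows are padded to a multiple of 4 bytes.
// bitsPerPixel is 1 for bi-level images and 8 for 256-level grayscale.
int dibStride(int widthPixels, int bitsPerPixel) {
    assert(widthPixels > 0 && bitsPerPixel > 0);
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}

// Size in bytes of the pixel data of a padded DIB with the given dimensions.
long long dibPixelBytes(int widthPixels, int heightPixels, int bitsPerPixel) {
    assert(heightPixels >= 0);
    return static_cast<long long>(dibStride(widthPixels, bitsPerPixel)) * heightPixels;
}
```

For example, a bi-level letter page scanned at 300 DPI (2550 x 3300 pixels) has a stride of 320 bytes. Comparing dibPixelBytes against the value returned by GetImgBitmapSize is one way to check the padding assumption.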
Related Functions: GetImgBitmapInfo, GetImgBitmapSize
GetImgBitmapInfo
LPBITMAPINFO GetImgBitmapInfo(const IMG * img)
This function gets a pointer to the BITMAPINFO structure corresponding to the image.
Return Value
A pointer to a BITMAPINFO structure.
Parameters
img
Pointer to an image of type IMG.
Comments
See your Windows SDK documentation for obtaining information about the BITMAPINFO structure and how to use it.
Example
Related Functions: GetImgBitmap, GetImgBitmapSize
GetImgBitmapSize
int GetImgBitmapSize(const IMG * img)
This function returns the size in bytes of the bitmap corresponding to the image.
Return Value
Bitmap size in bytes.
Parameters
img
Pointer to an image of type IMG.
Comments
Example
Related Functions: GetImgBitmapInfo, GetImgBitmap
GetImgRes
void GetImgRes(const IMG * img, int * pw, int * ph)
This function allows you to get the horizontal and vertical resolution of an image.
Return Value
None
Parameters
img
Pointer to an image of type IMG.
pw
Pointer to an integer that will contain the image horizontal resolution, given in Dots Per Inch (DPI).
ph
Pointer to an integer that will contain the image vertical resolution, given in DPI.
Example
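Combining the resolution with the pixel size gives the physical size of the scanned page. The helper below is a sketch, not an SDK function; pixelsToInches is a hypothetical name. The pixel count would come from GetImgSize and the DPI from GetImgRes.

```cpp
#include <cassert>

// Hypothetical helper: converts a pixel dimension to inches for a given DPI.
double pixelsToInches(int pixels, int dpi) {
    assert(pixels >= 0 && dpi > 0);
    return static_cast<double>(pixels) / dpi;
}
```

An image 2550 pixels wide at 300 DPI is 8.5 inches wide. The horizontal and vertical resolutions are reported separately, so each dimension should be converted with its own DPI value.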
Related Functions: GetImgSize
GetImgSize
void GetImgSize(const IMG * img, int * pw, int * ph)
This function allows you to get the size of an image, given in pixels.
Return Value
None
Parameters
img
Pointer to an image of type IMG.
pw
Pointer to an integer that will contain the image width in pixels
ph
Pointer to an integer that will contain the image height in pixels
Example
The following example retrieves an image size
Related Functions: GetImgRes
GetNbImages
int GetNbImages(const SETOFIMG * set)
Returns the number of images that a set of images contains.
Return Value
The number of images in the set
Parameters
set
A pointer to a set of images of type SETOFIMG.
Example
HalfSizeImg
IMG * HalfSizeImg(IMG * img)
Shrinks a bi-level image to 50% of its original size and returns the result as a grayscale image.
Return Value
A pointer to a new IMG object. If the function fails, the return value is NULL.
Parameters
img
Pointer to an image of type IMG.
Comments
The original image is left unchanged. If you don’t need it anymore, you have to free it by calling the FreeImg function. This function is mainly useful when you want to display a reduced bi-level image with good display quality.
Example
InsertImage
int InsertImage(SETOFIMG * set, int index, IMG * image)
Inserts an image in a set of images at a given position.
Return Value
If the function fails, a nonzero value is returned.
Parameters
set
A pointer to a set of images of type SETOFIMG.
index
Position of the inserted image, counted from 0.
image
Inserted image.
Example
InvertImg
int InvertImg(IMG * img)
Inverts an image (black pixels become white and white pixels become black).
Return Value
If the function fails a nonzero error code is returned.
Parameters
img
Pointer to an image of type IMG.
Example
LoadImg
IMG * LoadImg(const char * filename)
Loads an image from a TIFF file.
Return Value
A pointer to the loaded image or NULL if the function fails.
Parameters
filename
TIFF file name
Comments
When you don’t need the loaded image anymore, you have to free it by calling the FreeImg function. If you load a multiple image TIFF file, only the first image stored in the file is loaded. You have to use LoadMultipleImg for handling multiple image files.
Example
LoadMultipleImg
SETOFIMG * LoadMultipleImg(const char * filename)
Loads a set of images from a TIFF file.
Return Value
A pointer to the loaded set or NULL if the function fails.
Parameters
filename
TIFF file name.
Comments
When you don’t need the set anymore, you have to free it by calling the FreeMultipleImg function. If your TIFF files contain only one image, you should use LoadImg.
Example
Related Functions: SaveMultipleImg, LoadImg
OCR
int OCR(const IMG * img, int noisy)
Recognizes the text located in an image.
Return Value
A non zero error code if the function fails.
Parameters
img
A pointer to the image to process.
noisy
Non zero value if the image is noisy (i.e. contains a lot of speckles)
Related Functions: OCROnArea
OCROnArea
int OCROnArea(const IMG * img, int noisy)
Recognizes the text located in an image that contains a unique text area. This function doesn’t do any layout analysis on the area. The image containing the area is usually extracted from a page with ExtractImgArea.
Return Value
A non zero error code if the function fails.
Parameters
img
A pointer to the image to process.
noisy
A non zero value if the image is noisy (i.e. contains a lot of speckles)
Related Functions: ExtractImgArea, OCROnArea2
OCROnArea2
int OCROnArea2(const IMG * img, int noisy, int startprogress, int endprogress)
This function is similar to OCROnArea but allows you to give starting and ending values for the progress percentage. It is useful when you want to display a progress bar while processing several areas.
Return Value
A non zero error code if the function fails.
Parameters
img
A pointer to the image to process.
noisy
A non zero value if the image is noisy (i.e. contains a lot of speckles)
startprogress
Starting value for the progress percentage.
endprogress
Ending value for the progress percentage.
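When several areas are processed in sequence, each call needs its own slice of the 0 to 100 range. The sketch below divides the range evenly; progressSlices is a hypothetical helper, and equal weighting of areas is an assumption (areas of very different sizes may deserve proportional slices).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits the 0..100 progress range into one (start, end) pair per area,
// intended as the startprogress/endprogress arguments of OCROnArea2.
std::vector<std::pair<int, int>> progressSlices(int areaCount) {
    assert(areaCount > 0);
    std::vector<std::pair<int, int>> slices;
    slices.reserve(static_cast<std::size_t>(areaCount));
    for (int i = 0; i < areaCount; ++i) {
        // Integer division keeps slices contiguous: each end equals the next start.
        slices.emplace_back(100 * i / areaCount, 100 * (i + 1) / areaCount);
    }
    return slices;
}
```

For three areas this yields (0, 33), (33, 66) and (66, 100), so a progress bar driven by the progress handler advances monotonically across all three calls.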
Related Functions: OCROnArea, ExtractImgArea, OCRSetProgressHandler
OCRSetOutputHandler
OCROutputHandler OCRSetOutputHandler(OCROutputHandler handler)
When the output mode is OM_TEXT or OM_RICHTEXT, a user defined function of type OCROutputHandler will be called by the OCR engine for each “OCR event”.
Return Value
Previously selected output handler.
Parameters
handler
New OCR Output handler function.
Comments
If the output mode is OM_TEXT, the OCR events OT_PROP, OT_ITAL, OT_UNDS, OT_SIZE, OT_HILT and OT_BITM are not sent to the output handler.
An OCROutputHandler has the following form:
void AnOCRHandler(int event, int param);
with event, the code of the “OCR event” and param a value associated with the event.
The OCR events are:
OT_TEXT
A character has been recognized. param contains the ASCII code of the recognized character.
OT_PROP
The font type has changed (proportional or non proportional font). param is nonzero if the font is proportional.
OT_ITAL
Switches italic mode on or off. param is nonzero if the following characters are italic.
OT_UNDS
Switches underscored mode on or off. param is nonzero if the following characters are underscored.
OT_SIZE
Changes the character size. param contains the font size for the following characters.
OT_HILT
Changes the character color.
If param contains 1, the following word is not in the dictionary. If param contains 2, the following word has not been well recognized.
OT_ENDL
An end of line has been reached.
OT_ENDZ
An end of text area has been reached.
OT_BITM
An image has been recognized. (IMG *) param is a pointer to the image.
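As an illustration of the event list above, here is a minimal sketch of an output handler that collects plain text. The numeric values in the enum are PLACEHOLDERS: the real OT_* constants come from the SDK declarations, which are not reproduced in this document. CollectTextHandler and g_text are hypothetical names, and emitting a line break on OT_ENDZ is a design choice of this sketch.

```cpp
#include <cassert>
#include <string>

// PLACEHOLDER values for illustration only. Remove this enum and use the
// SDK's own declarations in a real project; only the names come from the text.
enum OcrEvent {
    OT_TEXT, OT_PROP, OT_ITAL, OT_UNDS, OT_SIZE, OT_HILT, OT_ENDL, OT_ENDZ, OT_BITM
};

// Text collected so far. A global is used because the documented handler
// signature carries no user-data pointer.
static std::string g_text;

// Follows the documented form: void AnOCRHandler(int event, int param).
void CollectTextHandler(int event, int param) {
    assert(param >= 0 || event != OT_TEXT);
    switch (event) {
    case OT_TEXT:  // param holds the code of the recognized character
        g_text += static_cast<char>(param);
        break;
    case OT_ENDL:  // end of line
    case OT_ENDZ:  // end of text area (treated like a line break here)
        g_text += '\n';
        break;
    default:       // formatting and image events are ignored in this sketch
        break;
    }
}
```

Such a handler would be registered by passing it to OCRSetOutputHandler before calling OCR. The calling convention the DLL expects for the callback is not stated in this document and should be checked against the SDK header.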
Related Functions: SetOutputMode, OCRSetOutputCharHandler
OCRSetOutputCharHandler
OCROutputCharHandler OCRSetOutputCharHandler(OCROutputCharHandler handler)
When the output mode is OM_TEXT or OM_RICHTEXT, a user-defined function of type OCROutputCharHandler will be called by the OCR engine for each “real character” (i.e. not for EOLs and spaces). This function is called immediately after OCROutputHandler is called with the event OT_TEXT.
Return Value
Previously selected output handler.
An OCROutputCharHandler has the following form:
void AnOCRCharHandler(int ch, int conf, int left, int top, int width, int height);
ch
ASCII code of recognized character.
conf
Confidence level of the recognized character, from 0 to 100. The higher the value, the more certain the engine is about the recognized character.
left, top
coordinates (in pixels) of character in original image.
width, height
width and height in pixels of recognized character.
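A typical use of the per-character callback is to flag characters for manual review. The sketch below records every character whose confidence falls under a threshold; SuspectChar, g_suspects, g_minConfidence and CollectSuspectsHandler are hypothetical names, and the threshold of 60 is an arbitrary assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one low-confidence character and its bounding box.
struct SuspectChar {
    int ch, conf, left, top, width, height;
};

static std::vector<SuspectChar> g_suspects;
static int g_minConfidence = 60;  // illustrative threshold, not an SDK value

// Follows the documented form:
// void AnOCRCharHandler(int ch, int conf, int left, int top, int width, int height).
void CollectSuspectsHandler(int ch, int conf, int left, int top, int width, int height) {
    assert(conf >= 0 && conf <= 100);  // documented confidence range
    if (conf < g_minConfidence) {
        g_suspects.push_back({ch, conf, left, top, width, height});
    }
}
```

After OCR completes, the stored coordinates can be used to highlight the doubtful characters on the original image.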
Related Functions: OCRSetOutputHandler
OCRSetProgressHandler
OCRProgressHandler OCRSetProgressHandler(OCRProgressHandler handler)
When the OCR engine processes a document, a user-defined function of type OCRProgressHandler is called several times.
Return Value
Previously selected progress handler.
Parameters
handler
New OCR Progress handler function.
Comments
An OCRProgressHandler has the following form:
int AProgressHandler(int percent);
with percent, the percentage of the job completed at the time of the call. This value is between 0 and 100.
Defining such a function allows an application to display a progress bar. With this function, it’s also possible to interrupt the OCR process. If the progress handler returns a non zero value, the OCR process is stopped.
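The following sketch shows one way to write a progress handler that supports cancellation. The names g_cancelRequested, g_lastPercent and CancellableProgressHandler are hypothetical; atomics are used on the assumption that a UI thread might set the cancel flag while OCR runs elsewhere.

```cpp
#include <atomic>
#include <cassert>

static std::atomic<bool> g_cancelRequested{false};  // set by the UI to stop OCR
static std::atomic<int>  g_lastPercent{0};          // read by the UI to draw a bar

// Follows the documented form: int AProgressHandler(int percent).
int CancellableProgressHandler(int percent) {
    assert(percent >= 0 && percent <= 100);  // documented range
    g_lastPercent.store(percent);
    // A nonzero return value asks the engine to stop the OCR process.
    return g_cancelRequested.load() ? 1 : 0;
}
```

The handler would be installed with OCRSetProgressHandler, which returns the previously selected handler so that it can be restored later.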
OCRSetTemplate
void OCRSetTemplate(const char * theTemplate)
Sets the template for use in template matching during the OCR process.
Return Value
None
Parameters
theTemplate
String containing the template to use in OCR template matching.
Comments
The templates consist of the following:
A – Letter
X – Any character
? – Optional character.
Other – Must match character
Providing a string of zero length, a NULL value, or the number zero turns off template matching. Template recognition can be improved by limiting the character set to only those characters that will appear in the strings matched by the template.
OCRLimitCharsTo
void OCRLimitCharsTo(const char * charsToLimit)
Sets the characters that the OCR output will be limited to.
Return Value
None
Parameters
charsToLimit
string containing the characters that the OCR output will be limited to
Comments
There are default limited character sets that are defined as follows:
#define LC_ALPHABETIC 2 // only letters
#define LC_ALPHANUMERIC 3 // no punctuation
#define LC_UCASE 4 // all uppercase
#define LC_LCASE 5 // all lowercase
#define LC_NONNUMERIC 6 // no numbers
Passing the function a string of zero length, a NULL value, or a zero will turn off the limiting of characters.
ReplaceImage
int ReplaceImage(SETOFIMG * set, int index, IMG * image)
Replaces an image in a set of images at a given position.
Return Value
If the function fails, a nonzero value is returned.
Parameters
set
A pointer to a set of images of type SETOFIMG.
index
Position of the replaced image, counted from 0.
image
A pointer to the new image.
Comments
The replaced image is automatically freed.
Example
ResizeImg
IMG * ResizeImg(IMG * img, int nw, int nh)
Resizes an image.
Return Value
A pointer to a new IMG object. If the function fails, the return value is NULL.
Parameters
Comments
The original image is left unchanged. If you don’t need it anymore, you have to free it by calling the FreeImg function.
Example
Related Functions: ShrinkImg, HalfSizeImg
RotateImg
int RotateImg(IMG * img, int angle)
Rotates an image.
Return Value
If the function fails a nonzero error code is returned.
Parameters
Comments
Example
SaveImg
int SaveImg(const char * filename, const IMG * img)
Saves an image to a TIFF file.
Return Value
If the function fails a nonzero error code is returned.
Parameters
filename
Name of the TIFF file you want to create.
img
A pointer to the image to be saved.
Comments
Example
Related Functions: LoadImg
SaveMultipleImg
void SaveMultipleImg(const char * filename, SETOFIMG * set)
Saves a set of images to a TIFF file.
Return Value
If the function fails a nonzero error code is returned.
Parameters
filename
Name of the TIFF file you want to create.
set
A pointer to the set of images to be saved.
Comments
Example
Related Functions: LoadMultipleImg
ScanAndAddImage
int ScanAndAddImage(SETOFIMG * set)
Acquires a new image and adds it to a previously created image set. The scanning session should have been initialized with ScanInit.
Return Value
If the function fails a nonzero error code is returned.
Parameters
set
A pointer to a set of images of type SETOFIMG.
Related Functions: ScanImg
ScanAutoBright
void ScanAutoBright(int automode)
Selects “Autobright” mode and lets the scanner determine an optimal brightness level. (Recommended)
Return Value
None
Parameters
automode
Mode:
nonzero value
Select the “autobright” mode
zero value
Unselect the “autobright” mode. In this case, you may select the brightness level by using ScanBrightness.
Related Functions: ScanBrightness
ScanAvailable
int ScanAvailable(void)
Detects whether a scanner is connected to the computer.
Return Value
A nonzero value if a scanner is available.
Comments
The connected scanner must be TWAIN compliant and the corresponding 32-bit TWAIN driver must be properly installed.
ScanBrightness
void ScanBrightness(int brightness)
Changes the scanning brightness.
Return Value
None
Parameters
brightness
A value between -1000 (dark) and 1000 (light).
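User interfaces usually expose brightness as a 0 to 100 slider rather than the -1000 to 1000 range used here. The sketch below converts between the two; brightnessFromPercent is a hypothetical helper, and the linear mapping is an assumption about how an application might present the setting.

```cpp
#include <cassert>

// Hypothetical helper: maps a 0..100 slider position to the documented
// -1000 (dark) .. 1000 (light) brightness range. Out-of-range input is clamped.
int brightnessFromPercent(int percent) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    const int value = percent * 20 - 1000;
    assert(value >= -1000 && value <= 1000);
    return value;
}
```

The result is intended as the argument of ScanBrightness, after automatic brightness has been switched off with ScanAutoBright.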
Related Functions: ScanAutoBright
ScanEnd
void ScanEnd(void)
Terminates a scanning session.
Return Value
None
Related Functions: ScanInit
ScanImg
IMG * ScanImg(void)
Acquires a new image. The scanning session should have been initialized with ScanInit.
Return Value
A pointer to the scanned image or NULL if the function fails.
Related Functions: ScanAndAddImage
ScanInit
int ScanInit(HWND hWnd)
Initializes the image acquisition process.
Return Value
If the function fails a nonzero error code is returned.
Parameters
hWnd
Your application’s main window handle.
Related Functions: ScanEnd
ScanResolution
void ScanResolution(int resolution)
Sets the scanning resolution. This function should be called before ScanInit.
Return Value
If the function fails a nonzero error code is returned.
Parameters
resolution
Scanning resolution in DPI. (default = 300 DPI)
Related Functions: ScanBrightness
ScanSelect
void ScanSelect(HWND hWnd)
Lets the user select a given scanner if several scanners are connected to the computer.
Parameters
hWnd
Your application’s main window handle.
ScanShowUI
void ScanShowUI(int mode)
Indicates if SimpleOCR should use the scanner user interface or not. This function should be called before ScanInit.
Return Value
If the function fails a nonzero error code is returned.
Parameters
mode
The selected mode.
nonzero
Use user interface (default).
zero
Don’t use the user interface.
SetLanguage
void SetLanguage(int language, const char* dictDir)
Selects the language used in the text you want to process.
Return Value
None
Parameters
language
A value among:
FRENCH for French language.
DUTCH for Dutch language.
ENGUK for UK English language.
CUSTOM for Custom dictionary.
NONE for No language selected.
dictDir
The directory where the dictionary files are stored.
Comments
If a language has been selected, the OCR process will use a dictionary in order to improve the OCR results.
SetOutputMode
void SetOutputMode(int mode)
Selects the output mode for the OCR engine.
Return Value
None
Parameters
mode
A value among:
OM_TEXT
The engine will output only the text.
OM_RICHTEXT
The engine will output the text and additional information such as character format, character size, and font type.
OM_WINDOW
The engine will output only the text, directly in a window, as if it were typed on the keyboard.
SetOutputWindow
void SetOutputWindow(HWND hWnd)
When the output mode OM_WINDOW has been selected with the SetOutputMode function, this function allows you to indicate the window to which the text will be sent.
Return Value
None
Parameters
hWnd
A window handle.
Related Functions: SetOutputMode
ShrinkImg
IMG * ShrinkImg(IMG * img, int nw, int nh)
Shrinks a bi-level image and returns the result in a grayscale image.
Return Value
A pointer to a new IMG object. If the function fails, the return value is NULL.
Parameters
Comments
The original image is left unchanged. If you don’t need it anymore, you have to free it by calling the FreeImg function. This function is mainly useful when you want to display a reduced bi-level image with a good display quality.
Example
Related Functions: ResizeImg, HalfSizeImg
Information in this document is subject to change without notice and said changes may not be reflected herein. ScanStore.com and its parent company, Meta Enterprises, LLC, may have patents or pending patents applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. The furnishing of this document does not grant you a license to these patents, trademarks, copyrights, or other intellectual property except as expressly provided in a written license agreement from Meta Enterprises, LLC.