Nicomsoft OCR Configuration

The following code initializes the OCR engine, creates OCR-related objects, loads the configuration from the "Config.dat" file, and loads and recognizes an image using the German language:

C++

int CfgObj, OcrObj, ImgObj, res, n;
wchar_t* txt;
Engine_InitializeAdvanced(&CfgObj, &OcrObj, &ImgObj); //initialize OCR engine, create objects and load configuration
res = Img_LoadFile(ImgObj, L"c:\\sample.bmp"); //load some image for OCR
if (res > ERROR_FIRST) {}; //insert error handler here
Cfg_SetOption(CfgObj, BT_DEFAULT, L"Languages/German", L"1"); //select German language
Cfg_SetOption(CfgObj, BT_DEFAULT, L"Languages/English", L"0"); //unselect English language
res = Img_OCR(ImgObj, OCRSTEP_FIRST, OCRSTEP_LAST, OCRFLAG_NONE); //perform OCR
if (res > ERROR_FIRST) {}; //insert error handler here
n = Img_GetImgText(ImgObj, NULL, 0, FMT_EXACTCOPY) + 1; //get text length plus one for the terminating null character
txt = (wchar_t*) malloc(n * sizeof(wchar_t)); //allocate memory for text
Img_GetImgText(ImgObj, txt, n, FMT_EXACTCOPY); //get text
Engine_Uninitialize(); //release all created objects and uninitialize OCR engine
//use "txt" variable now
free(txt); //free memory
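
The sample above queries the text length first and then allocates a buffer by hand. As a minimal sketch of the same two-call pattern with a standard container, the block below uses a hypothetical stand-in function, FakeGetText, which is not part of NSOCR. It only mimics the convention assumed from the sample: return the length without the terminator, and copy into the buffer when one is supplied.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical stand-in for a text getter that follows the convention used by
// the Img_GetImgText call above: it returns the text length in characters
// (without the terminator) and, when a buffer is given, copies at most
// maxLen characters including the terminating null.
int FakeGetText(wchar_t* buffer, int maxLen) {
    static const wchar_t kText[] = L"Guten Tag";
    const int length = static_cast<int>(std::wcslen(kText));
    if (buffer != nullptr && maxLen > 0) {
        const int toCopy = (length < maxLen - 1) ? length : maxLen - 1;
        std::wmemcpy(buffer, kText, static_cast<std::size_t>(toCopy));
        buffer[toCopy] = L'\0';
    }
    return length;
}

// Two-call pattern: query the length, size a vector accordingly, then fetch.
// The vector releases its memory automatically, so no free() call is needed.
std::wstring ReadAllText() {
    const int n = FakeGetText(nullptr, 0) + 1;  // +1 for the terminating null
    std::vector<wchar_t> buffer(static_cast<std::size_t>(n));
    FakeGetText(buffer.data(), n);
    return std::wstring(buffer.data());
}
```

In real code the call to FakeGetText would presumably be replaced by Img_GetImgText with the image handle and format flag, as in the sample above.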

C#

//assume reference to NSOCR COM was added
using NSOCR_NameSpace; //Add NSOCR namespace from "NSOCR.cs" file
//...
int CfgObj, OcrObj, ImgObj, res;
string txt;
NSOCRLib.NSOCRClass NsOCR = new NSOCRLib.NSOCRClass(); //create NSOCR COM object instance
NsOCR.Engine_InitializeAdvanced(out CfgObj, out OcrObj, out ImgObj); //initialize OCR engine, create objects and load configuration
res = NsOCR.Img_LoadFile(ImgObj, "c:\\sample.bmp"); //load some image for OCR
if (res > TNSOCR.ERROR_FIRST) {}; //insert error handler here
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/German", "1"); //select German language
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/English", "0"); //unselect English language
res = NsOCR.Img_OCR(ImgObj, TNSOCR.OCRSTEP_FIRST, TNSOCR.OCRSTEP_LAST, TNSOCR.OCRFLAG_NONE); //perform OCR
if (res > TNSOCR.ERROR_FIRST) {}; //insert error handler here
NsOCR.Img_GetImgText(ImgObj, out txt, TNSOCR.FMT_EXACTCOPY); //get text
NsOCR.Engine_Uninitialize(); //release all created objects and uninitialize OCR engine

VB.NET

'assume reference to NSOCR COM was added
'assume "NSOCR.vb" file was added to project
Dim CfgObj, OcrObj, ImgObj, res As Integer
Dim txt As String = ""
Dim NsOCR As New NSOCRLib.NSOCRClass 'create NSOCR COM object instance
NsOCR.Engine_InitializeAdvanced(CfgObj, OcrObj, ImgObj) 'initialize OCR engine, create objects and load configuration
res = NsOCR.Img_LoadFile(ImgObj, "c:\sample.bmp") 'load some image for OCR
If res > TNSOCR.ERROR_FIRST Then 'insert error handler here
End If
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/German", "1") 'select German language
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/English", "0") 'unselect English language
res = NsOCR.Img_OCR(ImgObj, TNSOCR.OCRSTEP_FIRST, TNSOCR.OCRSTEP_LAST, TNSOCR.OCRFLAG_NONE) 'perform OCR
If res > TNSOCR.ERROR_FIRST Then 'insert error handler here
End If
NsOCR.Img_GetImgText(ImgObj, txt, TNSOCR.FMT_EXACTCOPY) 'get text
NsOCR.Engine_Uninitialize() 'release all created objects and uninitialize OCR engine

Java

//assume NSOCR package was included
//Java VM option "-Djava.library.path" must point to "Bin" folder (x86 platform) or to "Bin_64" folder (x64 platform)
//...
NSOCR.HCFG CfgObj = new NSOCR.HCFG();
NSOCR.HOCR OcrObj = new NSOCR.HOCR();
NSOCR.HIMG ImgObj = new NSOCR.HIMG();
int res;
StringBuffer txt = new StringBuffer();
NSOCR.Engine.Engine_InitializeAdvanced(CfgObj, OcrObj, ImgObj); //initialize OCR engine, create objects and load configuration
res = NSOCR.Engine.Img_LoadFile(ImgObj, "c:\\sample.bmp"); //load some image for OCR
if (res > NSOCR.Error.ERROR_FIRST) {}; //insert error handler here
NSOCR.Engine.Cfg_SetOption(CfgObj, NSOCR.Constant.BT_DEFAULT, "Languages/German", "1"); //select German language
NSOCR.Engine.Cfg_SetOption(CfgObj, NSOCR.Constant.BT_DEFAULT, "Languages/English", "0"); //unselect English language
res = NSOCR.Engine.Img_OCR(ImgObj, NSOCR.Constant.OCRSTEP_FIRST, NSOCR.Constant.OCRSTEP_LAST, NSOCR.Constant.OCRFLAG_NONE); //perform OCR
if (res > NSOCR.Error.ERROR_FIRST) {}; //insert error handler here
NSOCR.Engine.Img_GetImgText(ImgObj, txt, NSOCR.Constant.FMT_EXACTCOPY); //get text
System.out.println(txt.toString());
NSOCR.Engine.Engine_Uninitialize(); //release all created objects and uninitialize OCR engine

Option Name	Possible Values	Description
"MaxKernels"	"1" ... "16"	The maximum number of threads allowed in the Img_OCR and Ocr_ProcessPages functions. Note that if you call several functions at the same time from different threads, "MaxKernels" will be applied for every function individually. So if you process several images at once, it may be a good idea to use the value "1" for this option to keep the number of threads reasonable and achieve the best performance. The default value is "4".
"NumKernels"	"0" ... "16"	The number of threads allowed in the Ocr_OcrImg and Ocr_ProcessPages functions. The value "0" will enable OCR to use all logical CPUs detected by Windows. If the value of "NumKernels" is higher than that of "MaxKernels", the "MaxKernels" value will be used. The default value is "0".
"EnabledChars"	Characters	If specified, this option defines the characters allowed for recognition. For example, "0123456789" restricts recognition to digits only. The default value is "" – all characters are enabled.
"DisabledChars"	Characters	Characters disabled for recognition. For example, "AZ" will disable the letters "A" and "Z". The default value is "".
"CharFactors"	[chars factor]	For each character, OCR has one or more recognition variants. You can change the probability of any of these variants. For example, "[5 1.5]" will increase the probability of "5" for each recognized character by 1.5 times, which can be useful if OCR cannot recognize "5". "[MN 0.5][W 1.2]" will reduce the probability of "M" and "N" by 2 times and increase the probability of "W" by 1.2 times. The default value is "".
"GhostScriptDLL"	"" or valid path	The full path and filename of the GhostScript DLL (gsdll32.dll or gsdll64.dll). If empty, NSOCR will locate GhostScript automatically. The default value is "".
"PdfDPI"	"50" ... "600"	DPI setting for PDF file rendering with GhostScript. The default value is "300".
"PdfByExt"	"0", "1", "2"	"0" – detect PDF files by content. "1" – detect PDF files by file extension. "2" – assume that all files are PDF. The default value is "0".
"GrayMode"	"-1", "0", "1", "2"	"0" – load all images in the 24-bit color mode. "1" – load all images in the 8-bit grayscale mode, which requires less memory and is faster. "2" – load all images in 1-bit black-white mode, which requires even less memory and is faster. "-1" – use the 8-bit grayscale mode for grayscale and black-white images, and use the 24-bit color mode for color images. The default value is "-1".
"FastMode"	"0", "1", "2"	"0" – prefer the best recognition quality. "1" – prefer the maximum recognition speed. "2" – use the superfast recognition mode. The default value is "0". Note that this setting is related to the OCRSTEP_OCR step only. To speed up the entire OCR process, you need to use other settings. Please see the Performance section for details.
"TempDir"	"" or valid path	The path to the folder for temporary files, for example, "c:\temp". If not specified, the default folder will be used (CSIDL_LOCAL_APPDATA). The temporary folder is used only when using GhostScript to open a PDF from the memory stream. In most cases you don’t need to change the default value. Sometimes you need to do it if you use OCR in IIS, or if custom security settings are used for the file system. The default value is "".
"Timeout"	"0"..."100000"	The timeout for the OCR operation, in seconds. It is applied for the Img_OCR function or for every page when the Ocr_ProcessPages function is called. If an image cannot be processed within the specified time, these functions will return the ERROR_OPERATIONTIMEOUT value. If no timeout is required, specify zero. The default value is "300".
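
The "CharFactors" value in the table above is a sequence of bracketed groups, each holding a set of characters and a factor. The following is a small sketch of how such a string could be assembled in application code. BuildCharFactors is a hypothetical helper, not part of the NSOCR API, and it assumes the format is exactly the one shown in the two examples ("[5 1.5]" and "[MN 0.5][W 1.2]").

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Builds a "CharFactors" value such as "[MN 0.5][W 1.2]" from
// (characters, factor) pairs. Hypothetical helper for illustration only.
std::string BuildCharFactors(
    const std::vector<std::pair<std::string, double>>& groups) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // always use '.' as the decimal point
    for (const auto& group : groups) {
        out << '[' << group.first << ' ' << group.second << ']';
    }
    return out.str();
}
```

The resulting string would then be passed as the option value to Cfg_SetOption in the same way the language options are set in the samples above. An empty list yields "", which matches the documented default.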

Option Name	Possible Values	Description
"Inversion"	"0", "1", "2"	"0" – do not invert the original image. "1" – invert the original image. "2" – detect inversion and automatically invert the image if necessary. The default value is "2".
"Rotation"	"0", "1", "2", "3"	"0" – do not rotate the original image. "1" – rotate the image 90° clockwise. "2" – rotate the image 180° clockwise. "3" – rotate the image 270° clockwise. The default value is "0".
"MirrorH"	"0", "1"	"0" – do not mirror the original image. "1" – mirror the original image horizontally. The default value is "0".
"MirrorV"	"0", "1"	"0" – do not mirror the original image. "1" – mirror the original image vertically. The default value is "0".
"SkewAngle"	"-90.0"..."90.0", "360"	The angle, in degrees, that will be used to rotate the image. "360" – detect the angle automatically. "0" – do not rotate the image. The default value is "360".
"SkewRange"	"0"..."45.0"	The angle range, in degrees, to be used while detecting the skew angle. Works only if the "SkewAngle" option is "360". For instance, if "10" is specified, the angles -10 ... 10 degrees will be checked. The default value is "20".
"IgnoredAngle"	"0"..."45.0"	The maximum ignored angle during autorotation. For example, if "2.0" is specified, the image will not be rotated if the detected skew angle is less than 2.0 degrees. This option is used to improve the processing speed when small skew angles are not important. The default value is "0.5".
"SkewStep"	"0.01"..."1.0"	The step, in degrees, for finding the skew angle during autorotation. Large values increase the autorotation speed but reduce the accuracy of the skew angle, which may cause incorrect results. Small values decrease the autorotation speed but increase the quality. The default value is "0.2".
"ScaleFactor"	"0.1"..."4.0"	The image scaling factor. For example, if "2" is specified, the image size will be doubled; if "0.5" is specified, the image size will be halved. For the best recognition quality, the average character height should be about 25 pixels. The default value is "1.0".
"AutoScale"	"0", "1"	"0" – do not scale the image automatically. "1" – scale the image automatically for the best recognition quality. If OCR cannot detect the scale, the "ScaleFactor" option value will be used. The default value is "1".
"AutoRotate"	"0", "1", "2"	"0" – do nothing. "1" – detect whether the image has been rotated incorrectly (by 90, 180, or 270 degrees) and rotate it back. "2" – detect only a 180-degree rotation. The default value is "1".
"NoiseFilter"	"0", "1"	"0" – do not remove background noise. "1" – remove background noise. This option is very useful for images with dotted background. The default value is "1".
"Blur"	"0", "3", "5", "7"	"0" – do not blur the image. "3", "5", "7" – apply blur with a window of 3, 5, or 7 pixels, respectively. The default value is "0".
"CropLeft"	"0"..."10000"	The leftmost position of the cropped rectangle, in pixels. The default value is "0".
"CropTop"	"0"..."10000"	The top position of the cropped rectangle, in pixels. The default value is "0".
"CropRight"	"0"..."10000"	The rightmost position of the cropped rectangle, in pixels. The default value is "0".
"CropBottom"	"0"..."10000"	The bottom position of the cropped rectangle, in pixels. The default value is "0".
"BackgroundColor"	"0x000000"..."0xFFFFFF", "0xFFFFFFFF"	Defines the background color for images after deskewing as an RGB value. For example, "0x00FF00" means green background, and "0xFFFFFFFF" means the average image color. The default value is "0xFFFFFFFF".
"AspectRatio"	"0", "0.2"..."5.0"	Defines the aspect ratio factor for images. "0" – use the aspect ratio factor 1.0, but change the aspect ratio if the image has different DPI_X and DPI_Y values. The default value is "0".
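
The "ScaleFactor" description above recommends an average character height of about 25 pixels. When "AutoScale" is disabled and the character height is known from elsewhere, a factor can be derived with simple arithmetic. The sketch below assumes the caller already has a measured average height; SuggestScaleFactor is a hypothetical helper, not an NSOCR function.

```cpp
#include <algorithm>

// Suggests a "ScaleFactor" value that brings the average character height
// close to the recommended 25 pixels, clamped to the documented
// "0.1"..."4.0" range. Hypothetical helper for illustration only.
double SuggestScaleFactor(double avgCharHeightPx) {
    const double kTargetHeightPx = 25.0;
    const double kMinFactor = 0.1;
    const double kMaxFactor = 4.0;
    if (avgCharHeightPx <= 0.0) {
        return 1.0;  // no usable measurement: keep the documented default
    }
    const double factor = kTargetHeightPx / avgCharHeightPx;
    return std::min(kMaxFactor, std::max(kMinFactor, factor));
}
```

For example, characters averaging 12.5 pixels in height give a factor of 2.0, and characters averaging 50 pixels give 0.5.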

Option Name	Possible Values	Description
"SimpleThr"	"0" ... "255"	"0"..."254" – use simple binarization with the specified threshold. "255" – use intellectual adaptive binarization. The default value is "255".
"LightFactor"	"0.0"..."1.0"	The light factor for adjusting the final threshold during intellectual adaptive binarization. Higher values mean a darker binarized image. The default value is "0.4".
"LightFactorLines"	"0.0"..."1.0"	The light factor for adaptive binarization when searching for lines. The default value is "0.8".
"BinBlocks"	"0", "1"	"0" – binarize the entire image. "1" – binarize only the image areas assigned to the Block objects. The default value is "0".
"BinTwice"	"0", "1"	"1" – rebinarize image areas assigned to the Block objects at the OCRSTEP_OCR step. "0" – skip additional binarization. The default value is "0".
"BinSmooth"	"0", "1", "2"	Applies the "dilate" algorithm to the binarized image. "0" – do not apply. "1" – apply automatically when necessary. "2" – always apply. The default value is "1".
"SmartBinZones"	"0", "1"	Applies a special algorithm that improves binarization of complex images that contain zones with different background colors and inverted zones. "0" – do not apply. "1" – apply when necessary. The default value is "1".
"BinNoiseFilter"	"0", "1", "2"..."12"	Applies a noise removal algorithm to the binarized image. "0" – do not apply. "1" – apply with the default noise removal level. "2"..."12" – apply with the specified noise removal level. The default value is "0".
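
For readers unfamiliar with the term, the simple binarization selected by "SimpleThr" values "0"..."254" can be pictured as a fixed-threshold comparison on grayscale pixels. The sketch below shows the generic technique only. It is not NSOCR's implementation, and the comparison direction (pixels darker than the threshold become black) is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Fixed-threshold binarization of 8-bit grayscale pixels.
// Assumption: pixels darker than the threshold become black (true);
// the exact comparison used by the engine is not stated in this document.
std::vector<bool> BinarizeSimple(const std::vector<std::uint8_t>& gray,
                                 std::uint8_t threshold) {
    std::vector<bool> black;
    black.reserve(gray.size());
    for (std::uint8_t value : gray) {
        black.push_back(value < threshold);
    }
    return black;
}
```

Adaptive binarization (the "255" setting) differs in that the threshold varies across the image instead of being a single constant.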

Option Name	Possible Values	Description
"FindHorLines"	"0", "1"	"1" – find horizontal lines in the image, "0" – skip this step. The default value is "1".
"FindVerLines"	"0", "1"	"1" – find vertical lines in the image, "0" – skip this step. The default value is "1".
"FindHorFrames"	"0", "1"	"1" – find horizontal frames in the image, "0" – skip this step. The default value is "1".
"FindVerFrames"	"0", "1"	"1" – find vertical frames in the image, "0" – skip this step. The default value is "1".
"RemoveLines"	"0", "1"	"1" – remove lines from the image. "0" – skip this step. The default value is "1".
"AngleRange"	"0"..."45.0"	The angle range, in degrees, that is used while detecting lines. The default value is "3.0".
"MinLineLength"	"1"..."10000"	The minimum required line length, in pixels. The default value is "150".
"MaxLineWidth"	"1"..."100"	The maximum line width, in pixels. The default value is "20".
"MaxGap"	"0"..."100"	The maximum gap length in a line, in pixels. The default value is "8".
"FillFactor"	"0.1"..."1.0"	The advanced black/white pixels factor. The default value is "0.9", which means that a line must contain at least 90 percent of black pixels, and no more than 10 percent of gaps.
"MinPieceLen"	"1"..."10000"	The minimum required length of the largest solid line piece. The default value is "40".
"FindUnderlines"	"0", "1"	"1" – find and remove horizontal lines below the text during the OCRSTEP_OCR step to handle underlined words properly, "0" – skip this step. The default value is "1".
"MinLenFactor"	"0.5"..."100.0"	This option is effective only if the "FindUnderlines" option is set to "1". The factor that is used to calculate the minimum length of lines below the text (underlined text). The default value is "2.5".
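
To make the interaction of "MinLineLength", "MaxGap", "FillFactor", and "MinPieceLen" more concrete, the sketch below checks a run of pixels sampled along a candidate line against all four criteria. It is an illustration of how the documented limits could combine, not the algorithm NSOCR actually uses; LineCriteria and LooksLikeLine are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container holding the documented default values.
struct LineCriteria {
    std::size_t minLineLength = 150;  // "MinLineLength", in pixels
    std::size_t maxGap = 8;           // "MaxGap", in pixels
    double fillFactor = 0.9;          // "FillFactor", share of black pixels
    std::size_t minPieceLen = 40;     // "MinPieceLen", in pixels
};

// Returns true if a run of pixels (true = black) satisfies all four criteria.
bool LooksLikeLine(const std::vector<bool>& pixels, const LineCriteria& c) {
    if (pixels.size() < c.minLineLength) return false;
    std::size_t blackCount = 0, longestGap = 0, longestPiece = 0;
    std::size_t gap = 0, piece = 0;
    for (bool isBlack : pixels) {
        if (isBlack) {
            ++blackCount;
            ++piece;
            gap = 0;
            if (piece > longestPiece) longestPiece = piece;
        } else {
            ++gap;
            piece = 0;
            if (gap > longestGap) longestGap = gap;
        }
    }
    const double fill = static_cast<double>(blackCount) / pixels.size();
    return fill >= c.fillFactor && longestGap <= c.maxGap &&
           longestPiece >= c.minPieceLen;
}
```

With the defaults, a 200-pixel run containing a single 8-pixel gap passes, a 9-pixel gap fails on "MaxGap", and a run with a white pixel at every 20th position fails on "MinPieceLen" even though 95 percent of its pixels are black.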

Option Name	Possible Values	Description
"FindBarcodes"	"0", "1", "2"	"0" – do not find barcode zones. "1" – find barcode zones at the OCRSTEP_ZONING step if auto-zoning is executed. "2" – find barcodes only, do not find text zones. The default value is "1".
"DetectInversion"	"0", "1"	"0" – do nothing. "1" – detect inversion of zones. The default value is "1".
"DetectRotation"	"0", "1"	"0" – do nothing. "1" – detect rotation of zones at the OCRSTEP_ZONING step if auto-zoning is executed. The default value is "0".
"FindTables"	"0", "1"	"0" – do not find table zones. "1" – find table zones at the OCRSTEP_ZONING step if auto-zoning is executed. The default value is "1".
"OneZone"	"0", "1"..."9"	"0" – find zones automatically at the OCRSTEP_ZONING step. "1"..."9" – only one zone is defined at the OCRSTEP_ZONING step, and that zone covers the entire image. The value defines the zone type, see the BT_XXXXX constants for possible values. For example, "1" means "BT_OCRTEXT", "9" means "BT_MRZ". The default value is "0".
"MoreZones"	"0", "1", "2"	"0" – the normal mode for zone detection. "1" – detect more zones in the image. "2" – detect even more zones. The default value is "0".
"ZonesFactor"	"0.1"..."10.0"	The factor that defines how close zones can be combined. A larger value means that more zones will be combined into one zone. The default value is "1.0".
"OneColumn"	"0", "1"	"0" – the image can contain more than one column. "1" – assume that the image can contain only one column. The default value is "0".