Nicomsoft OCR: Developer's Guide


OCR Configuration

Configuring OCR for your needs is an important step. Different tasks may require different OCR settings, so you sometimes need to configure the OCR engine properly to get the best results.

NSOCR has the following functions to manage OCR options: Cfg_LoadOptions, Cfg_SaveOptions, Cfg_GetOption, and Cfg_SetOption.
The options are stored in XML format, by default in the "Config.dat" file in the "Bin_common" folder. That file uses the Unicode text format and can be edited directly with Notepad. Every option has a unique path, for example, "Main/MultiThread". In this case, "Main" is the section’s name, and "MultiThread" is the option’s name.
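
To make the path convention concrete, here is a minimal C++ sketch that splits an option path into its section and option names. The SplitOptionPath helper is hypothetical and is not part of the NSOCR API; it assumes that the first "/" separates the section name from the rest of the path.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not an NSOCR function): splits "Section/Option" at the
// first '/' and returns {section, option}. If there is no '/', the section
// part is returned empty.
std::pair<std::wstring, std::wstring> SplitOptionPath(const std::wstring& path)
{
    assert(!path.empty());
    const std::wstring::size_type pos = path.find(L'/');
    if (pos == std::wstring::npos)
        return std::make_pair(std::wstring(), path);
    return std::make_pair(path.substr(0, pos), path.substr(pos + 1));
}
```

For "Main/MultiThread", the helper returns "Main" as the section and "MultiThread" as the option name.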

Note:
You can edit the configuration file directly with Notepad or any other text editor that supports Unicode text files. Please note that the XML format does not allow you to directly specify some characters. Make sure you fill the values for such options as "Main/EnabledChars" and "Main/DisabledChars". We recommend editing the configuration file as follows:
You can change the loaded OCR options at runtime with the Cfg_SetOption function. The new options will be used at the next call of any OCR function. Note that the new options will not be applied to the already-done OCR steps. If you have recognized an image and then changed some OCR options, you will need to recognize the image again with the new options. Moreover, because the OCRSTEP_PREFILTERS step can be executed only once, you will need to reload the image if you have changed the options that are applied at that step.

Here is a detailed description of the supported options by sections:

The "Main" section:
Option Name Possible Values Description
"MaxKernels" "1" ... "16" The maximum number of threads allowed in the Img_OCR and Ocr_ProcessPages functions. Note that if you call several functions at the same time from different threads, "MaxKernels" will be applied for every function individually. So if you process several images at once, it may be a good idea to use the value "1" for this option to keep the number of threads reasonable and achieve the best performance. The default value is "4".
"NumKernels" "0" ... "16" The number of threads allowed in the Ocr_OcrImg and Ocr_ProcessPages functions. The value "0" will enable OCR to use all logical CPUs detected by Windows. If the value of "NumKernels" is higher than that of "MaxKernels", the "MaxKernels" value will be used. The default value is "0".
"EnabledChars" Characters If specified, this option defines the characters allowed for recognition. For example, "0123456789" will allow to recognize digits only. The default value is "" – all characters are enabled.
"DisabledChars" Characters Characters disabled for recognition. For example, "AZ" will disable the letters "A" and "Z". The default value is "".
"CharFactors" [chars factor] For each character, OCR has one or more recognition variants. You can change the probability of any of these variants. For example, "[5 1.5]" will increase the probability of "5" for each recognized character by 1.5 times, which can be useful if OCR cannot recognize "5". "[MN 0.5][W 1.2]" will reduce the probability of "M" and "N" by 2 times and increase the probability of "W" by 1.2 times. The default value is "".
"GhostScriptDLL" "" or valid path The full path and filename of the GhostScript DLL (gsdll32.dll or gsdll64.dll). If empty, NSOCR will locate GhostScript automatically. The default value is "".
"PdfDPI" "50" ... "600" DPI setting for PDF file rendering with GhostScript. The default value is "300".
"PdfByExt" "0", "1", "2" "0" – detect PDF files by content. "1" – detect PDF files by file extension. "2" – assume that all files are PDF. The default value is "0".
"GrayMode" "-1", "0", "1", "2" "0" – load all images in the 24-bit color mode. "1" – load all images in the 8-bit grayscale mode, which requires less memory and is faster. "2" – load all images in 1-bit black-white mode, which requires even less memory and is faster. "-1" – use the 8-bit grayscale mode for grayscale and black-white images, and use the 24-bit color mode for color images. The default value is "-1".
"FastMode" "0", "1", "2" "0" – prefer the best recognition quality. "1" – prefer the maximum recognition speed. "2" – use the superfast recognition mode. The default value is "0". Note that this setting is related to the OCRSTEP_OCR step only. To speed up the entire OCR process, you need to use other settings. Please see the Performance section for details.
"TempDir" "" or valid path The path to the folder for temporary files, for example, "c:\temp". If not specified, the default folder will be used (CSIDL_LOCAL_APPDATA). The temporary folder is used only when using GhostScript to open a PDF from the memory stream. In most cases you don’t need to change the default value. Sometimes you need to do it if you use OCR in IIS, or if custom security settings are used for the file system. The default value is "".
"Timeout" "0"..."100000" The timeout for the OCR operation, in seconds. It is applied for the Img_OCR function or for every page when the Ocr_ProcessPages function is called. If an image cannot be processed within the specified time, these functions will return the ERROR_OPERATIONTIMEOUT value. If no timeout is required, specify zero. The default value is "300".


The "ImgAlizer" section (the OCRSTEP_PREFILTERS step):
Option Name Possible Values Description
"Inversion" "0", "1", "2" "0" – do not invert the original image. "1" – invert the original image. "2" – detect inversion and automatically invert the image if necessary. The default value is "2".
"Rotation" "0", "1", "2", "3" "0" – do not rotate the original image. "1" – rotate the image 90° clockwise. "2" – rotate the image 180° clockwise. "3" – rotate the image 270° clockwise. The default value is "0".
"MirrorH" "0", "1" "0" – do not mirror the original image. "1" – mirror the original image horizontally. The default value is "0".
"MirrorV" "0", "1" "0" – do not mirror the original image. "1" – mirror the original image vertically. The default value is "0".
"SkewAngle" "-90.0"..."90.0", "360" The angle, in degrees, that will be used to rotate the image. "360" – detect the angle automatically. "0" – do not rotate the image. The default value is "360".
"SkewRange" "0"..."45.0" The angle range, in degrees, to be used while detecting the skew angle. Works only if the "SkewAngle" option is "360". For instance, if "10" is specified, the angles -10 ... 10 degrees will be checked. The default value is "20".
"IgnoredAngle" "0"..."45.0" The maximum ignored angle during autorotation. For example, if "2.0" is specified, the image will not be rotated if the detected skew angle is less than 2.0 degrees. This option is used to improve the processing speed when small skew angles are not important. The default value is "0.5".
"SkewStep" "0.01"..."1.0" The step, in degrees, for finding the skew angle during autorotation. Large values increase the autorotation speed but reduce the accuracy of the skew angle, which may cause incorrect results. Small values decrease the autorotation speed but increase the quality. The default value is "0.2".
"ScaleFactor" "0.1"..."4.0" The image scaling factor. For example, if "2" is specified, the image size will be doubled; if "0.5" is specified, the image size will be halved. For the best recognition quality, the average character height should be about 25 pixels. The default value is "1.0".
"AutoScale" "0", "1" "0" – do not scale the image automatically. "1" – scale the image automatically for the best recognition quality. If OCR cannot detect the scale, the "ScaleFactor" option value will be used. The default value is "1".
"AutoRotate" "0", "1", "2" "0" – do nothing. "1" – detect if the image has been rotated incorrectly (90/180/270 degrees) and rotate it back. "2" – detect only the 180 degrees rotation. The default value is "1".
"NoiseFilter" "0", "1" "0" – do not remove background noise. "1" – remove background noise. This option is very useful for images with dotted background. The default value is "1".
"Blur" "0", "3", "5", "7" "0" – do not blur the image. "3", "5", "7" – apply blur with a window of 3, 5, or 7 pixels, respectively. The default value is "0".
"CropLeft" "0"..."10000" The leftmost position of the cropped rectangle, in pixels. The default value is "0".
"CropTop" "0"..."10000" The top position of the cropped rectangle, in pixels. The default value is "0".
"CropRight" "0"..."10000" The rightmost position of the cropped rectangle, in pixels. The default value is "0".
"CropBottom" "0"..."10000" The bottom position of the cropped rectangle, in pixels. The default value is "0".
"BackgroundColor" "0x000000"..."0xFFFFFF", "0xFFFFFFFF" Defines the background color for images after deskewing as an RGB value. For example, "0x00FF00" means green background, and "0xFFFFFFFF" means the average image color. The default value is "0xFFFFFFFF".
"AspectRatio" "0", "0.2"..."5.0" Defines the aspect ratio factor for images. "0" – use the aspect ratio factor 1.0, but change the aspect ratio if the image has different DPI_X and DPI_Y values. The default value is "0".


The "Binarizer" section (the OCRSTEP_BINARIZE step):
Option Name Possible Values Description
"SimpleThr" "0" ... "255" "0"..."254" – use simple binarization with the specified threshold. "255" – use intellectual adaptive binarization. The default value is "255".
"LightFactor" "0.0"..."1.0" The light factor for adusting the final threshold during intellectual adaptive binarization. Higher values mean a darker binarized image. The default value is "0.4".
"LightFactorLines" "0.0"..."1.0" The light factor for adaptive binarization when searching for lines. The default value is "0.8".
"BinBlocks" "0", "1" "0" – binarize the entire image. "1" – binarize only the image areas assigned to the Block objects. The default value is "0".
"BinTwice" "0", "1" "1" – rebinarize image areas assigned to the Block objects at the OCRSTEP_OCR step. "0" – skip additional binarization. The default value is "0".
"BinSmooth" "0", "1", "2" Applies the "dilate" algorithm to the binarized image. "0" – do not apply. "1" – apply automatically when necessary. "2" – always apply. The default value is "1".
"SmartBinZones" "0", "1" Applies a special algorithm that improves binarization of complex images that contain zones with different background colors and inverted zones. "0" – do not apply. "1" – apply when necessary. The default value is "1".
"BinNoiseFilter" "0", "1", "2"..."12" Applies a noise removal algorithm on the binarized image. "0" – do not apply. "1" – apply with default noise removal level. "2"..."12" - apply with specified noise removal level. The default value is "0".


The "PixLines" section (the OCRSTEP_REMOVELINES step):
Option Name Possible Values Description
"FindHorLines" "0", "1" "1" – find horizontal lines in the image, "0" – skip this step. The default value is "1".
"FindVerLines" "0", "1" "1" – find vertical lines in the image, "0" – skip this step. The default value is "1".
"FindHorFrames" "0", "1" "1" – find horizontal frames in the image, "0" – skip this step. The default value is "1".
"FindVerFrames" "0", "1" "1" – find vertical frames in the image, "0" – skip this step. The default value is "1".
"RemoveLines" "0", "1" "1" – remove lines from the image. "0" – skip this step. The default value is "1".
"AngleRange" "0"..."45.0" The angle range, in degrees, that is used while detecting lines. The default value is "3.0".
"MinLineLength" "1"..."10000" The minimum required line length, in pixels. The default value is "150".
"MaxLineWidth" "1"..."100" The maximum line width, in pixels. The default value is "20".
"MaxGap" "0"..."100" The maximum gap length in a line, in pixels. The default value is "8".
"FillFactor" "0.1"..."1.0" The advanced black/white pixels factor. The default value is "0.9", which means that a line must contain at least 90 percent of black pixels, and no more than 10 percent of gaps.
"MinPieceLen" "1"..."10000" The minimum required length of the largest solid line piece. The default value is "40".
"FindUnderlines" "0", "1" "1" – find and remove horizontal lines below the text during the OCRSTEP_OCR step to handle underlined words properly, "0" – skip this step. The default value is "1".
"MinLenFactor" "0.5"..."100.0" This option is effective only if the "FindUnderlines" option is set to "1". The factor that is used to calculate the minimum length of lines below the text (underlined text). The default value is "2.5".


The "Zoning" section (the OCRSTEP_ZONING step):
Option Name Possible Values Description
"FindBarcodes" "0", "1", "2" "0" – do not find barcode zones. "1" – find barcode zones at the OCRSTEP_ZONING step if auto-zoning is executed. "2" – find barcodes only, do not find text zones. The default value is "1".
"DetectInversion" "0", "1" "0" – do nothing. "1" – detect inversion of zones. The default value is "1".
"DetectRotation" "0", "1" "0" – do nothing. "1" – detect rotation of zones at the OCRSTEP_ZONING step if auto-zoning is executed. The default value is "0".
"FindTables" "0", "1" "0" – do not find table zones. "1" – find table zones at the OCRSTEP_ZONING step if auto-zoning is executed. The default value is "1".
"OneZone" "0", "1"..."9" "0" – find zones automatically at the OCRSTEP_ZONING step. "1"..."9" – only one zone is defined at the OCRSTEP_ZONING step, and that zone covers the entire image. The value defines the zone type, see the BT_XXXXX constants for possible values. For example, "1" means "BT_OCRTEXT", "9" means "BT_MRZ". The default value is "0".
"MoreZones" "0", "1", "2" "0" – the normal mode for zones detection. "1" – we want more zones in the image. "2" – we want even more zones. The default value is "0".
"ZonesFactor" "0.1"..."10.0" The factor that defines how close zones can be combined. A larger value means that more zones will be combined into one zone. The default value is "1.0".
"OneColumn" "0", "1" "0" – the image can contain more than one column. "1" – assume that the image can contain only one column. The default value is "0".


The "Linezer" section:
Option Name Possible Values Description
"RemoveGarbage" "0", "1" "1" – apply a smart algorithm to clean a "messy" image. "0" – do not remove garbage. The default value is "1".
"DetectArea" "0", "1" "1" – detect the outer text area of the block (useful for scanned images with black borders). "0" – do not detect. The default value is "1".
"FindBorders" "0", "1", "2" "0" – do nothing. "1" – find white borders and adjust the block size. "2" – find black borders and adjust the block size.
"BigGarbageMinWidth" "0"..."10000" "0" – do nothing. "1"..."10000" – remove big garbage of the specified minimum width, in pixels. The default value is "0".
"BigGarbageMinHeight" "0"..."10000" "0" – do nothing. "1"..."10000" – remove big garbage of the specified minimum height, in pixels. The default value is "0".
"SmallGarbageMaxPixCnt" "0"..."10000" "0" – do nothing. "1"..."10000" – remove small garbage that has the specified maximum number of pixels. The default value is "1".
"SmallGarbageMaxWidth" "0"..."100" "0" – do nothing. "1"..."100" – remove small garbage of the specified maximum width, in pixels. The default value is "1".
"SmallGarbageMaxHeight" "0"..."100" "0" – do nothing. "1"..."100" – remove small garbage of the specified maximum height, in pixels. The default value is "1".
"SmallGarbageMaxLen" "0"..."100" "0" – do nothing. "1"..."100" – small garbage will be removed only if its width or height is not larger than the specified value, in pixels. The default value is "3".
"JoinMaxDistance" "0".."100" "0" – do nothing. "1"..."100" – join the pieces of a broken character if the distance between them is less than the specified value. The default value is "0".
"AlwaysJoinCross" "0", "1" "0" – use a smart algorithm to join only some of the pieces that are crossed. "1" – always combine pieces that are crossed. The default value is "0".
"FilterLines" "0", "1" "0" – do nothing. "1" – filter out garbage lines.
"SkipZoneThr" "0"..."200000" "0" – ignore this option. "1"..."200000" – skip recognition of the text zone if it contains more than the specified number of nonconnected black areas. This option is useful if some of the input images are pure garbage, so that OCR should skip them instead of trying to process them (processing such images may take too much time). The default value is "20000".


The "Spacer" section:
Option Name Possible Values Description
"SpaceFactor" "0.1", "10.0" The factor of space between words. The default value is "1.0".
"SimpleSpace" "0", "1000" "0" – use adaptive space detection. "1"..."1000" – directly specify the space size, in pixels. The default value is "0".


The "WordAlizer" section:
Option Name Possible Values Description
"SplitCombine" "0", "1" "1" – split and combine broken characters when needed. "0" – do not split or combine characters. The default value is "1".
"UseCaser" "0", "1" "1" – use a smart algorithm to detect the characters’ case. "0" – do not adjust the characters’ case. The default value is "1".
"TextQual" "-1"..."100" The text quality value is used in the dictionary and characters-selection algorithms.
"-1" – detect text quality automatically. "0".."100" – text quality as a percentage. A larger value means less dictionary weight, and vice versa. The default value is "-1".
"TabDistance" "-1"..."100" The distance between words, in character widths, when the Tab character is inserted instead of the space character. The default value is "-1" (automatically).
"TabOverLine" "0", "1" "0" – do nothing. "1" – insert the Tab character between words instead of the space character if a line is detected between words, which can be useful for tables. The default value is "1".
"MultipleCR" "0", "1" "0" – do nothing. "1" – calculate the average distance between text lines and insert additional empty lines when necessary (can be useful for table columns recognition when some cells are empty). The default value is "0".
"CorrectMixed" "0", "1" "0" – do not correct the text. "1" – try to correct words that have both letters and digits; for example, "2I5" can be changed to "215", and "L1R" can be changed to "LIR" when OCR is in doubts. You can also specify an intermediate value, such as "0.5". The default value is "1".
"CommaUp" "-0.5", "0.5" Useful for printers that cannot print below a text line and have to print commas above it. In such cases, you can specify a value of 0.05 ... 0.15 to improve the recognition quality. The default value is "0.0".
"HRangeFactor" "0.1", "10.0" Useful for fonts that use different character heights for the same font size. For example, all characters in the word "0123ABC" usually have the same height, and OCR uses that fact to achieve better recognition. But in some fonts, digits can be higher than letters, or vice versa, which may cause poor recognition results. In such cases, specify a value of 2.0 ... 3.0 to improve recognition. The default value is "1.0".
"ZeroHelp" "-0.5", "0.5" Specify a small non-zero value if some "0" characters are recognized as "O" (or vice versa). For example, "0.1" helps OCR to find more "0" characters, and "-0.1" results in more "O" characters when OCR is in doubts. The default value is "0.0".
"GarbageThr" "0"..."100" The garbage threshold for words that were recognized with poor quality, in percents. For example "30" means that OCR removes all words that have confidence quality below 30%. "0" - do not remove any words. The default value is "25".


The "Languages" section:
Option Name Possible Values Description
"English" "0", "1" "0" – disable English language support. "1" – enable English language support. The default value is "1".
"German" "0", "1" "0" – disable German language support. "1" – enable German language support. The default value is "0".
"French" "0", "1" "0" – disable French language support. "1" – enable French language support. The default value is "0".
"Spanish" "0", "1" "0" – disable Spanish language support. "1" – enable Spanish language support. The default value is "0".
"Russian" "0", "1" "0" – disable Russian language support. "1" – enable Russian language support. The default value is "0".
"Italian" "0", "1" "0" – disable Italian language support. "1" – enable Italian language support. The default value is "0".
"Portuguese" "0", "1" "0" – disable Portuguese language support. "1" – enable Portuguese language support. The default value is "0".
"Dutch" "0", "1" "0" – disable Dutch language support. "1" – enable Dutch language support. The default value is "0".
"Finnish" "0", "1" "0" – disable Finnish language support. "1" – enable Finnish language support. The default value is "0".
"Catalan" "0", "1" "0" – disable Catalan language support. "1" – enable Catalan language support. The default value is "0".
"Indonesian" "0", "1" "0" – disable Indonesian language support. "1" – enable Indonesian language support. The default value is "0".
"Swedish" "0", "1" "0" – disable Swedish language support. "1" – enable Swedish language support. The default value is "0".
"Turkish" "0", "1" "0" – disable Turkish language support. "1" – enable Turkish language support. The default value is "0".
"Romanian" "0", "1" "0" – disable Romanian language support. "1" – enable Romanian language support. The default value is "0".
"Danish" "0", "1" "0" – disable Danish language support. "1" – enable Danish language support. The default value is "0".
"Norwegian" "0", "1" "0" – disable Norwegian language support. "1" – enable Norwegian language support. The default value is "0".
"Polish" "0", "1" "0" – disable Polish language support. "1" – enable Polish language support. The default value is "0".
"Hungarian" "0", "1" "0" – disable Hungarian language support. "1" – enable Hungarian language support. The default value is "0".
"Estonian" "0", "1" "0" – disable Estonian language support. "1" – enable Estonian language support. The default value is "0".
"Slovenian" "0", "1" "0" – disable Slovenian language support. "1" – enable Slovenian language support. The default value is "0".
"Croatian" "0", "1" "0" – disable Croatian language support. "1" – enable Croatian language support. The default value is "0".
"Czech" "0", "1" "0" – disable Czech language support. "1" – enable Czech language support. The default value is "0".
"Slovak" "0", "1" "0" – disable Slovak language support. "1" – enable Slovak language support. The default value is "0".
"Lithuanian" "0", "1" "0" – disable Lithuanian language support. "1" – enable Lithuanian language support. The default value is "0".
"Latvian" "0", "1" "0" – disable Latvian language support. "1" – enable Latvian language support. The default value is "0".
"Bulgarian" "0", "1" "0" – disable Bulgarian language support. "1" – enable Bulgarian language support. The default value is "0".
"Chinese_Simplified" "0", "1" "0" – disable Chinese-simplified language support. "1" – enable Chinese-simplified language support. The default value is "0".
"Chinese_Traditional" "0", "1" "0" – disable Chinese-traditional language support. "1" – enable Chinese-traditional language support. The default value is "0".
"Arabic" "0", "1" "0" – disable Arabic language support. "1" – enable Arabic language support. The default value is "0".
"Korean" "0", "1" "0" – disable Korean language support. "1" – enable Korean language support. The default value is "0".
"Japanese" "0", "1" "0" – disable Japanese language support. "1" – enable Japanese language support. The default value is "0".
Notes: At least one language must be enabled. You can select more than one language for recognition. You can also specify different languages for different text blocks. Do not enable multiple languages if you need to recognize only one, because doing so may decrease the recognition quality and increase memory and CPU usage.


The "Dictionaries" section:
See the Dictionaries section for more information about dictionaries.
Option Name Possible Values Description
"UseDictionary" "0", "1" "0" – do not use the built-in dictionary, "1" – use the built-in dictionary to improve recognition. The default value is "1".
"ExternalDLL" "" or valid filename The filename of the DLL of the external dictionary. If empty, NSOCR will use the built-in dictionaries. The default value is "".
"ExternalFlags" "0", "1" Effective only if the external dictionary is enabled: "0" – advanced processing, "1" – just pass all words to the external dictionary. The default value is "0".
"UseUserDictionary" "0", "1" "0" – do not use the user dictionary, "1" – use the user dictionary to improve recognition. The default value is "0".
"UserDictionary" valid filename The filename of the user dictionary. The default value is "UserDictionary.txt".
"English" valid filename The filename of the English dictionary. The default value is "EN.lng".
"German" valid filename The filename of the German dictionary. The default value is "DE.lng".
"French" valid filename The filename of the French dictionary. The default value is "FR.lng".
"Spanish" valid filename The filename of the Spanish dictionary. The default value is "ES.lng".
"Russian" valid filename The filename of the Russian dictionary. The default value is "RU.lng".
"Italian" valid filename The filename of the Italian dictionary. The default value is "IT.lng".
"Portuguese" valid filename The filename of the Portuguese dictionary. The default value is "PT.lng".
"Dutch" valid filename The filename of the Dutch dictionary. The default value is "NL.lng".
"Finnish" valid filename The filename of the Finnish dictionary. The default value is "" (no dictionary).
"Catalan" valid filename The filename of the Catalan dictionary. The default value is "CA.lng".
"Indonesian" valid filename The filename of the Indonesian dictionary. The default value is "ID.lng".
"Swedish" valid filename The filename of the Swedish dictionary. The default value is "SV.lng".
"Turkish" valid filename The filename of the Turkish dictionary. The default value is "" (no dictionary).
"Romanian" valid filename The filename of the Romanian dictionary. The default value is "RO.lng".
"Danish" valid filename The filename of the Danish dictionary. The default value is "DA.lng".
"Norwegian" valid filename The filename of the Norwegian dictionary. The default value is "NB.lng".
"Polish" valid filename The filename of the Polish dictionary. The default value is "PL.lng".
"Hungarian" valid filename The filename of the Hungarian dictionary. The default value is "HU.lng".
"Estonian" valid filename The filename of the Estonian dictionary. The default value is "ET.lng".
"Slovenian" valid filename The filename of the Slovenian dictionary. The default value is "SL.lng".
"Croatian" valid filename The filename of the Croatian dictionary. The default value is "HR.lng".
"Czech" valid filename The filename of the Czech dictionary. The default value is "CS.lng".
"Slovak" valid filename The filename of the Slovak dictionary. The default value is "SK.lng".
"Lithuanian" valid filename The filename of the Lithuanian dictionary. The default value is "LT.lng".
"Latvian" valid filename The filename of the Latvian dictionary. The default value is "LV.lng".
"Bulgarian" valid filename The filename of the Bulgarian dictionary. The default value is "BG.lng".


The "Saver" section:
Option Name Possible Values Description
"PDF/PageSize" "0.01"..."100" The PDF page size factor. The default value is "1.0". You can also specify the page size directly in the "WIDTHxHEIGHT" format. For example, "720x360" means 720x360 pixels, or 10x5 inches (the PDF resolution is 72 dpi).
"PDF/ImageDPI" "-1", "0", "1"..."600" The DPI value for PDF images. "-1" – skip images, "0" – use the original image resolution. The default value is "0".
"PDF/ImageQual" "1"..."100" The PDF image quality. The default value is "85".
"PDF/GrayImageBPP" "1", "2", "4", "8" The maximum bits-per-pixel value for grayscale images in the output PDF. This option is applied to grayscale images only. Use the "Main/GrayMode" option to convert color images to grayscale ones. This option can reduce the size of the output PDF file. The default value is "4".
"PDF/BaseFontName" PDF Base14 font name One of the 14 built-in PDF fonts: Courie, Courier-Bold, Courier-Oblique, Courier-BoldOblique, Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique, Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic, Symbol, and ZapfDingbats. The default value is "Helvetica".
"PDF/ImageLayer" "0", "1" "0" – save the recognized text and images. "1" – save the recognized text, and then place the entire image of the page over the text; in this mode, the PDF file will keep all visual information from the original document, and will be searchable, too. The default value is "0".
"PDF/ExternalFont" File Name The TrueType (TTF) font file that will be embedded into the PDF file if a non-English language is selected for recognition. The default value is "pdf_font.ttf".
"PDF/PDFA_Type" "1a", "1b" The PDF/A type is used when SVR_FORMAT_PDFA is selected for the Saver object. The "1a" value means the PDF/A-1a format, and "1b" means the PDF/A-1b format. The default value is "1a".

Option Name Possible Values Description
"RTF/PageSize" "0.01"..."100" The RTF page size factor. The default value is "1.0". You can also specify the page size directly in the "WIDTHxHEIGHT" format. For example, "720x360" means 720x360 pixels or 10x5 inches (the RTF resolution is 72 dpi).
"RTF/ImageDPI" "-1", "0", "1"..."600" The DPI value for RTF images. "-1" – skip images, "0" – use the original image resolution. The default value is "0".
"RTF/ImageQual" "1"..."100" The RTF image quality. The default value is "90".
"RTF/FontName" font name The font name that will be used in RTF. The default value is "Times New Roman".


The "Scan" section:
Option Name Possible Values Description
"ColorMode" "0", "1", "2" The color mode: "0" – Black/White; "1" – Grayscale 256 colors; "2" – RGB. The default vaue is "1".
"Resolution" DPI value The scanning resolution, in DPI. The default value is "300".
"AdfPagesCount" integer For scanners with an ADF (Automatic Document Feeder) only: the number of pages to scan from the ADF (if supported). The default value is "0" – scan all pages loaded into the ADF.
"DuplexMode" "0", "1", "2" The duplex mode for scanners that can scan both sides of a sheet of paper. "0" – default mode, "1" – scan only one side, "2" – scan both sides. The default value is "0".
"AreaLeft" "0.0"..."100.0" The leftmost position of the scanning area, in inches. The default value is "0.0". This option is supported by TWAIN devices only.
"AreaTop" "0.0"..."100.0" The top position of the scanning area, in inches. The default value is "0.0". This option is supported by TWAIN devices only.
"AreaRight" "0.0"..."100.0" The rightmost position of the scanning area, in inches. The default value is "0.0".
"AreaBottom" "0.0"..."100.0" The bottom position of the scanning area, in inches. The default value is "0.0".



The "BarCode" section:
Option Name Possible Values Description
"TypesMask" integer Defines which barcode types are to be recognized. The value is a bitmask, a combination of BARCODE_TYPE_MASK_XXXXX flags. The default value is "0xFFFF" (all barcode types are processed).
"Directions" "1", "2", "3" The barcodes searching directions. "1" – detect horizontal barcodes. "2" – detect vertical barcodes. "3" – detect both horizontal and vertical barcodes (slower). The default value is "3".
"SearchMode" "1", "2", "3" The barcodes search mode. Poor-quality barcodes sometimes can be found in the original image but missed in the binarized image, and vice versa. "1" – find barcodes in the binarized image (sufficient for good-quality images). "2" – find barcodes in the original image. "3" – find barcodes both in the original and binarized images (slower). The default value is "1".



The configuration options are stored in the top-level sections related to block types. For more details about block types, see the Blk_SetType function. The following top-level sections are pre-defined:

To define your own options for a specific block, create a new block type by adding x * 0x100 to the base block type (x = 1 ... 255), and use that block type as a parameter for the "Blk_SetType", "Cfg_SetOption", and "Cfg_GetOption" functions. The respective top-level section will be created in the configuration for that block. For example, if you need to have your own options for a BT_OCRTEXT zone, call

Blk_SetType(BlockObj, BT_OCRTEXT + 1*256)

and then

Cfg_SetOption(CfgObj, BT_OCRTEXT + 1*256, L"Dictionaries/UseDictionary", L"0")

A new top-level section "OcrText_1" will be created with the "Dictionaries/UseDictionary" option. From now on, this block will use the "OcrText_1" section instead of "OcrText". If an option is not defined in the "OcrText_1" section, the "OcrText" section will be used. If the "OcrText" section does not define the option either, the "Default" section will be used. If the "Default" section does not define it either, the default option value will be used.
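
The lookup order described above can be modelled with a few lines of standalone code. This is only a sketch of the documented behavior, not the way NSOCR implements it internally: the Config and Section types and the MakeCustomBlockType and ResolveOption helpers are hypothetical names introduced for illustration, and the configuration is represented as a plain in-memory map.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model: section name -> (option path -> value).
typedef std::map<std::wstring, std::wstring> Section;
typedef std::map<std::wstring, Section> Config;

// Custom block type = base block type + x * 0x100, with x in 1..255.
// Returns -1 when x is outside the documented range.
int MakeCustomBlockType(int baseType, int x) {
    if (x < 1 || x > 255) return -1;
    return baseType + x * 0x100;
}

// Walks the sections in fallback order, for example
// "OcrText_1", then "OcrText", then "Default", and returns the first
// value found; builtInDefault is returned if no section defines the option.
std::wstring ResolveOption(const Config& cfg,
                           const std::vector<std::wstring>& chain,
                           const std::wstring& path,
                           const std::wstring& builtInDefault) {
    for (size_t i = 0; i < chain.size(); ++i) {
        Config::const_iterator sec = cfg.find(chain[i]);
        if (sec == cfg.end()) continue;          // section does not exist
        Section::const_iterator opt = sec->second.find(path);
        if (opt != sec->second.end()) return opt->second;
    }
    return builtInDefault;
}
```

With "Dictionaries/UseDictionary" set to "0" in "OcrText_1" and to "1" in "OcrText", resolving that path through the chain "OcrText_1", "OcrText", "Default" yields "0", while an option defined only in "Default" is still found by the same call.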

Note that most options in the "Main", "ImgAlizer", and "Binarizer" sections always use the "Default" section and cannot be overridden for individual blocks.


Example

The following code initializes the OCR engine, creates OCR-related objects, loads the configuration from the "Config.dat" file, and loads and recognizes an image using the German language:

C++
int CfgObj, OcrObj, ImgObj, res, n;
wchar_t* txt;
Engine_InitializeAdvanced(&CfgObj, &OcrObj, &ImgObj); //initialize OCR engine, create objects and load configuration
res = Img_LoadFile(ImgObj, L"c:\\sample.bmp"); //load some image for OCR
if (res > ERROR_FIRST) {}; //insert error handler here
Cfg_SetOption(CfgObj, BT_DEFAULT, L"Languages/German", L"1"); //select German language
Cfg_SetOption(CfgObj, BT_DEFAULT, L"Languages/English", L"0"); //unselect English language
res = Img_OCR(ImgObj, OCRSTEP_FIRST, OCRSTEP_LAST, OCRFLAG_NONE); //perform OCR
if (res > ERROR_FIRST) {}; //insert error handler here
n = Img_GetImgText(ImgObj, NULL, 0, FMT_EXACTCOPY) + 1; //get buffer size plus terminating null character
txt = (wchar_t*) malloc(n * sizeof(wchar_t)); //allocate memory for n wide characters
Img_GetImgText(ImgObj, txt, n, FMT_EXACTCOPY); //get text
Engine_Uninitialize(); //release all created objects and uninitialize OCR engine
//use "txt" variable now
free(txt); //free memory


C#
//assume reference to NSOCR COM was added
using NSOCR_NameSpace; //Add NSOCR namespace from "NSOCR.cs" file
//...
int CfgObj, OcrObj, ImgObj, res;
string txt;
NSOCRLib.NSOCRClass NsOCR = new NSOCRLib.NSOCRClass(); //create NSOCR COM object instance
NsOCR.Engine_InitializeAdvanced(out CfgObj, out OcrObj, out ImgObj); //initialize OCR engine, create objects and load configuration
res = NsOCR.Img_LoadFile(ImgObj, "c:\\sample.bmp"); //load some image for OCR
if (res > TNSOCR.ERROR_FIRST) {}; //insert error handler here
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/German", "1"); //select German language
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/English", "0"); //unselect English language
res = NsOCR.Img_OCR(ImgObj, TNSOCR.OCRSTEP_FIRST, TNSOCR.OCRSTEP_LAST, TNSOCR.OCRFLAG_NONE); //perform OCR
if (res > TNSOCR.ERROR_FIRST) {}; //insert error handler here
NsOCR.Img_GetImgText(ImgObj, out txt, TNSOCR.FMT_EXACTCOPY); //get text
NsOCR.Engine_Uninitialize(); //release all created objects and uninitialize OCR engine


VB.NET
'assume reference to NSOCR COM was added
'assume "NSOCR.vb" file was added to project
Dim CfgObj, OcrObj, ImgObj, res As Integer
Dim txt As String = ""
Dim NsOCR As New NSOCRLib.NSOCRClass 'create NSOCR COM object instance
NsOCR.Engine_InitializeAdvanced(CfgObj, OcrObj, ImgObj) 'initialize OCR engine, create objects and load configuration
res = NsOCR.Img_LoadFile(ImgObj, "c:\sample.bmp") 'load some image for OCR
If res > TNSOCR.ERROR_FIRST Then 'insert error handler here
End If
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/German", "1") 'select German language
NsOCR.Cfg_SetOption(CfgObj, TNSOCR.BT_DEFAULT, "Languages/English", "0") 'unselect English language
res = NsOCR.Img_OCR(ImgObj, TNSOCR.OCRSTEP_FIRST, TNSOCR.OCRSTEP_LAST, TNSOCR.OCRFLAG_NONE) 'perform OCR
If res > TNSOCR.ERROR_FIRST Then 'insert error handler here
End If
NsOCR.Img_GetImgText(ImgObj, txt, TNSOCR.FMT_EXACTCOPY) 'get text
NsOCR.Engine_Uninitialize() 'release all created objects and uninitialize OCR engine


Java
//assume NSOCR package was included
//Java VM option "-Djava.library.path" must point to "Bin" folder (x86 platform) or to "Bin_64" folder (x64 platform)
//...
NSOCR.HCFG CfgObj = new NSOCR.HCFG();
NSOCR.HOCR OcrObj = new NSOCR.HOCR();
NSOCR.HIMG ImgObj = new NSOCR.HIMG();
int res;
StringBuffer txt = new StringBuffer();
NSOCR.Engine.Engine_InitializeAdvanced(CfgObj, OcrObj, ImgObj); //initialize OCR engine, create objects and load configuration
res = NSOCR.Engine.Img_LoadFile(ImgObj, "c:\\sample.bmp"); //load some image for OCR
if (res > NSOCR.Error.ERROR_FIRST) {}; //insert error handler here
NSOCR.Engine.Cfg_SetOption(CfgObj, NSOCR.Constant.BT_DEFAULT, "Languages/German", "1"); //select German language
NSOCR.Engine.Cfg_SetOption(CfgObj, NSOCR.Constant.BT_DEFAULT, "Languages/English", "0"); //unselect English language
res = NSOCR.Engine.Img_OCR(ImgObj, NSOCR.Constant.OCRSTEP_FIRST, NSOCR.Constant.OCRSTEP_LAST, NSOCR.Constant.OCRFLAG_NONE); //perform OCR
if (res > NSOCR.Error.ERROR_FIRST) {}; //insert error handler here
NSOCR.Engine.Img_GetImgText(ImgObj, txt, NSOCR.Constant.FMT_EXACTCOPY); //get text
System.out.println(txt.toString());
NSOCR.Engine.Engine_Uninitialize(); //release all created objects and uninitialize OCR engine