Nicomsoft OCR: Developer's Guide


Dictionaries

NSOCR uses dictionaries to improve recognition quality. There are three types of dictionaries: the main built-in dictionary, the user dictionary, and the external dictionary.

The main built-in dictionary is highly recommended for most cases. This is the only dictionary that is enabled by default. You can disable it by setting the "Dictionaries/UseDictionary" option to "0". Please see the Configuration help section for details.

The user dictionary allows you to specify additional words to improve the recognition quality. This dictionary is disabled by default. You can enable it by setting the "Dictionaries/UseUserDictionary" option to "1". Additional words must be stored in the "UserDictionary.txt" files in the "Bin" and "Bin_64" directories. If you use both x86 and x64 binaries, these files must be identical. You can specify a different file for the user dictionary via the "Dictionaries/UserDictionary" option.
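
The requirement that the x86 and x64 copies be identical can be checked with a short script. The following Python sketch is not part of the NSOCR SDK. It assumes that the "Bin" and "Bin_64" directories sit side by side under one installation directory, and the function name "user_dictionaries_match" is a hypothetical helper chosen for illustration.

```python
from pathlib import Path


def user_dictionaries_match(install_dir, filename="UserDictionary.txt"):
    """Return True if the Bin and Bin_64 copies are byte-for-byte identical.

    Assumption: both directories are located directly under install_dir.
    Raises FileNotFoundError if either copy is missing.
    """
    x86_copy = Path(install_dir) / "Bin" / filename
    x64_copy = Path(install_dir) / "Bin_64" / filename
    # User dictionaries are small word lists, so reading both files
    # fully into memory is acceptable here.
    return x86_copy.read_bytes() == x64_copy.read_bytes()
```

A build or deployment step could call this function and stop with an error when it returns False.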

Note:
           "UserDictionary.txt" is a Unicode text file, not an ANSI one. It starts with the bytes 0xFF 0xFE (the UTF-16 little-endian byte order mark) and contains Unicode characters only. In recent Windows versions, you can edit the file with Notepad. The format is simple: one word per line, with no spaces allowed; any line that begins with "//" is treated as a comment and ignored. The user dictionary is case-insensitive.

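If the word list is generated by a script and not edited in Notepad, the file format described in the note can be produced as in the following Python sketch. It is an illustration, not part of the NSOCR SDK: the function name "write_user_dictionary" is hypothetical, the bytes 0xFF 0xFE are treated as the UTF-16 little-endian byte order mark, and the Windows-style CR LF line endings are an assumption, since the note does not specify the line terminator.

```python
import codecs
from pathlib import Path


def write_user_dictionary(path, words):
    """Write words to a user dictionary file and return the number written.

    Output format: UTF-16 LE with the 0xFF 0xFE byte order mark,
    one word per line. CR LF line endings are an assumption.
    """
    cleaned = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip empty entries
        if any(ch.isspace() for ch in word):
            raise ValueError(f"spaces are not allowed in a word: {word!r}")
        if word.startswith("//"):
            raise ValueError(f"word would be read as a comment: {word!r}")
        cleaned.append(word)

    # The first line is a comment; lines that begin with "//" are ignored.
    lines = ["// Custom words for NSOCR"] + cleaned
    text = "".join(line + "\r\n" for line in lines)

    # The "utf-16-le" codec does not emit a BOM, so it is added explicitly.
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    Path(path).write_bytes(data)
    return len(cleaned)
```

For example, calling write_user_dictionary("UserDictionary.txt", ["Nicomsoft", "NSOCR"]) writes a comment line followed by two words. When both x86 and x64 binaries are used, the resulting file would then be copied unchanged to both the "Bin" and "Bin_64" directories.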

The external dictionary is useful for custom handling of words. It is an external DLL that is registered in NSOCR via the "Dictionaries/ExternalDLL" option. A user-defined function in that DLL is called each time NSOCR checks whether a word exists in the dictionary. For more details and a sample implementation of the external dictionary, see the "External Dictionary" sample project in the NSOCR SDK.