Tuesday, 3 December 2013

Use HunSpell in .NET and speed it up

To use HunSpell for Russian:

1) Download the demo archive from NHunSpell.
2) Take the AOT-based dictionary from LibreOffice: open the .oxt file with 7-Zip and extract the .dic and .aff files.
3) Load them in the CSharpConsoleSample program and use the demo to test.
4) Update to the latest DLLs.
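
If you prefer not to open the archive by hand, step 2 can be scripted, since an .oxt file is an ordinary ZIP archive. Below is a minimal Python sketch; the function name extract_dictionary and the paths in the usage note are made up for illustration and are not part of NHunSpell or LibreOffice.

```python
import zipfile
from pathlib import Path


def extract_dictionary(oxt_path, out_dir):
    """Copy every .dic and .aff file from an .oxt archive into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(oxt_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".dic", ".aff")):
                # Write the bytes unchanged: the Russian files are KOI8-R encoded.
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

Calling extract_dictionary("dict-ru.oxt", "ru") (both names are placeholders) returns the list of files it wrote. The archive may contain more than one .dic file, so check which .dic/.aff pair you pass to the demo.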

The speed is acceptable for in-context suggestions, but not for search applications. HunSpell produces rich lists of variants, while we need short, precise ones. To get them, deactivate ngram-based suggestions by editing the .aff file: add
MAXNGRAMSUGS 0
after
SET KOI8-R
This increases the speed dramatically (from 2 seconds to several milliseconds for a list of 5 misspelled words), but you may get empty suggestion lists for some nontrivial typos.
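
The same edit can be applied with a small script, which helps if you re-download the dictionary from time to time. This is only a sketch: the function name disable_ngram_suggestions is hypothetical, and skipping a file that already has a MAXNGRAMSUGS line is my assumption about the desired behaviour. It works on raw bytes, so the KOI8-R content is never re-encoded.

```python
from pathlib import Path

DIRECTIVE = b"MAXNGRAMSUGS 0"


def disable_ngram_suggestions(aff_path):
    """Insert MAXNGRAMSUGS 0 after the SET line; return True if the file changed."""
    path = Path(aff_path)
    # Split on line endings only and keep them, so the rest of the file is untouched.
    lines = path.read_bytes().splitlines(keepends=True)
    if any(line.split()[:1] == [b"MAXNGRAMSUGS"] for line in lines):
        return False  # assumption: leave an existing setting alone
    for i, line in enumerate(lines):
        if line.split()[:1] == [b"SET"]:
            # Reuse the file's own line ending (\n or \r\n).
            eol = b"\r\n" if line.endswith(b"\r\n") else b"\n"
            if not line.endswith((b"\n", b"\r")):
                lines[i] = line + eol  # SET was the last line, without a newline
            lines.insert(i + 1, DIRECTIVE + eol)
            path.write_bytes(b"".join(lines))
            return True
    raise ValueError("no SET line found in " + str(path))
```

Keep a copy of the original .aff file, so you can switch back if the empty suggestion lists become a problem.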
