Monday 7 October 2013

Building Russian, English and German AOT morphology for .NET

1) Download the trunk from http://seman.sourceforge.net/ (I checked it out to p:\SEMAN\). Build the Debug configuration. Don't try to build the other configurations; they fail even worse than Debug does.
2) Download RusLemmatizer.zip and MorphWizard.zip from http://aot.ru/download.php and install them to the default paths.
3) Delete everything from c:\Rml\Dicts\.
4) Copy p:\SEMAN\Dicts\Morph\ to c:\Rml\Dicts\.
5) Copy p:\SEMAN\Dicts\SrcMorph\ to c:\Rml\Dicts\.
6) Take MorphGen.exe from p:\SEMAN\Source\MorphGen\Debug\ and copy it into c:\Rml\Bin\, replacing the installed one.
7) Now run eng_gen.bat, ger_gen.bat and rus_gen.bat. This is slow; allow 2-4 hours.
8) Use the resulting binary dictionaries with Lemmatizer.NET, which can be built from p:\SEMAN\Source\LemmatizerNET.sln.
9) First, though, apply a small workaround to Lemmatizer.NET so that statistics are loaded only for Russian. In Lemmatizer.cs, in the LoadDictionariesRegistry function, replace the following:

_useStatistic = true;
_statistic.Load(this, "l", manager);

to:

if (Language == InternalMorphLanguage.morphRussian)
{
   _useStatistic = true;
   _statistic.Load(this, "l", manager);
}
else
{
   _useStatistic = false;
}
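Steps 3 to 6 are plain file operations that are easy to get wrong when you repeat the build. A minimal C# sketch that scripts them, assuming steps 4 and 5 mean copying the Morph and SrcMorph folders into c:\Rml\Dicts\ (so you end up with Dicts\Morph and Dicts\SrcMorph). RmlSetup, CopyDirectory and PrepareDictionaries are hypothetical names, not part of SEMAN or Lemmatizer.NET:

```csharp
using System.IO;

// Hypothetical helper, not part of SEMAN or Lemmatizer.NET: scripts the
// manual copy steps 3-6. Assumes steps 4 and 5 copy the Morph and SrcMorph
// folders themselves into Dicts, giving Dicts\Morph and Dicts\SrcMorph.
public static class RmlSetup
{
    // Recursively copies all files and subfolders of sourceDir into destDir,
    // overwriting files that already exist there.
    public static void CopyDirectory(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir);
        foreach (string file in Directory.GetFiles(sourceDir))
        {
            File.Copy(file, Path.Combine(destDir, Path.GetFileName(file)), true);
        }
        foreach (string dir in Directory.GetDirectories(sourceDir))
        {
            CopyDirectory(dir, Path.Combine(destDir, Path.GetFileName(dir)));
        }
    }

    // sourceRoot: the SEMAN checkout (p:\SEMAN in this post).
    // rmlRoot: the AOT install folder (c:\Rml with default paths).
    public static void PrepareDictionaries(string sourceRoot, string rmlRoot)
    {
        string dicts = Path.Combine(rmlRoot, "Dicts");

        // Step 3: delete everything from Dicts by removing and recreating it.
        if (Directory.Exists(dicts))
            Directory.Delete(dicts, true);
        Directory.CreateDirectory(dicts);

        // Steps 4 and 5: copy the dictionary sources from the checkout.
        string srcDicts = Path.Combine(sourceRoot, "Dicts");
        CopyDirectory(Path.Combine(srcDicts, "Morph"), Path.Combine(dicts, "Morph"));
        CopyDirectory(Path.Combine(srcDicts, "SrcMorph"), Path.Combine(dicts, "SrcMorph"));

        // Step 6: overwrite the installed MorphGen.exe with the Debug build.
        string builtExe = Path.Combine(sourceRoot, "Source", "MorphGen", "Debug", "MorphGen.exe");
        string binDir = Path.Combine(rmlRoot, "Bin");
        Directory.CreateDirectory(binDir);
        File.Copy(builtExe, Path.Combine(binDir, "MorphGen.exe"), true);
    }
}
```

With the paths from this post the call would be RmlSetup.PrepareDictionaries(@"p:\SEMAN", @"c:\Rml"), after which you still run the generator batch files from step 7 by hand. The same CopyDirectory helper can also do the folder copy in step 10.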

10) To work with the dictionaries, create a separate folder such as p:\Lemmatize\Rml and copy c:\Rml\Bin and c:\Rml\Dicts into it.
11) A minimal test program for lemmatizing looks like this:

using System;
using LemmatizerNET; // namespace of the Lemmatizer.NET assembly; add a reference to it

class Program
{
  static void Main(string[] args)
  {
    Console.WriteLine("Enter Russian word (0 to exit)");
    ILemmatizer lem = LemmatizerFactory.Create(MorphLanguage.Russian);
    // Folder prepared in step 10; make it relative!
    var manager = FileManager.GetFileManager(@"p:\Lemmatize\Rml");
    lem.LoadDictionariesRegistry(manager);
    while (true)
    {
      Console.Write("> ");
      string word = Console.ReadLine();
      // Stop on "0" or end of input instead of lemmatizing it
      if (word == null || word == "0")
        break;
      var paradigmList = lem.CreateParadigmCollectionFromForm(word, false, true);
      // Print the normal form (lemma) of every paradigm that matches the word
      for (var i = 0; i < paradigmList.Count; i++)
      {
        var paradigm = paradigmList[i];
        Console.WriteLine("\t" + paradigm.Norm);
      }
    }
  }
}
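The test program hard-codes p:\Lemmatize\Rml and notes that the path should be made relative. One way to do that, sketched under the assumption that the Rml folder from step 10 (with Bin and Dicts inside) is deployed next to the executable; RmlLocator, ResolveNextToExe and HasRmlLayout are hypothetical helpers, not part of Lemmatizer.NET:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Lemmatizer.NET: locates the dictionary
// folder relative to the running program instead of a hard-coded drive path.
public static class RmlLocator
{
    // Returns <folder containing the .exe>\<folderName>, e.g. bin\Debug\Rml.
    public static string ResolveNextToExe(string folderName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, folderName);
    }

    // True if root contains the Bin and Dicts subfolders copied in step 10.
    public static bool HasRmlLayout(string root)
    {
        return Directory.Exists(Path.Combine(root, "Bin"))
            && Directory.Exists(Path.Combine(root, "Dicts"));
    }
}
```

In the test program the manager line would then become var manager = FileManager.GetFileManager(RmlLocator.ResolveNextToExe("Rml"));. Resolving against AppDomain.CurrentDomain.BaseDirectory rather than the current working directory keeps the lookup stable when the program is started from another folder, and checking HasRmlLayout first gives a clearer error than a failed dictionary load.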
