How to set up RussianAnalyzer?

How to set up RussianAnalyzer?

Stephanie Belton
Hello,

 

I am running a Perl site and evaluating Solr as a gateway to Lucene. So far I am very pleased with its functionality and performance. My Java is quite basic, so I am sorry if my questions sound trivial. I am trying to configure Solr to index a Russian site and I am hitting some hurdles. I have a basic install using the Jetty bundle from the tutorial.

 

Firstly I have added the following to schema.xml:

 

    <fieldtype name="textRussian" class="solr.TextField" positionIncrementGap="100">
      <analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>
    </fieldtype>

 

 And then added a Russian field:

<field name="name" type="textRussian" indexed="true" stored="true"/>

 

I downloaded lucene-analyzers-2.0.0.jar and placed it in the lib directory (tried both solr/lib and solr/example/lib) but I keep getting the same error (ClassNotFoundException, see stack trace below) when starting the server. Have I missed a step here??

 

I have been through the mailing list archive and haven’t found a definite answer to my question above. I have also investigated the next step which is the ability to add stop words, synonyms etc. It seems that I will have to create my own Factory(ies?) but didn’t find a detailed explanation on how to do this. I found this useful: http://www.mail-archive.com/solr-user@.../msg00198.html  but a bit too high-level for me! Should this be included in the Wiki?

 

Many thanks

Stephanie

 

11:19:05.958 WARN!! [main] org.mortbay.jetty.Server.main(Server.java:465) >08> EXCEPTION
org.mortbay.util.MultiException[org.apache.solr.core.SolrException: Error loading class 'org.apache.lucene.analysis.ru.RussianAnalyzer']
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:686)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
org.apache.solr.core.SolrException: Error loading class 'org.apache.lucene.analysis.ru.RussianAnalyzer'
        at org.apache.solr.core.Config.findClass(Config.java:204)
        at org.apache.solr.core.Config.newInstance(Config.java:209)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:466)
        at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:294)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:67)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:189)
        at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:170)
        at org.apache.solr.servlet.SolrServlet.init(SolrServlet.java:71)
        at javax.servlet.GenericServlet.init(GenericServlet.java:168)
        at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:383)
        at org.mortbay.jetty.servlet.ServletHolder.start(ServletHolder.java:243)
        at org.mortbay.jetty.servlet.ServletHandler.initializeServlets(ServletHandler.java:446)
        at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebApplicationHandler.java:321)
        at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:509)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.ru.RussianAnalyzer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:233)
        at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:187)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at org.apache.solr.core.Config.findClass(Config.java:188)
        ... 24 more
[0]=org.apache.solr.core.SolrException: Error loading class 'org.apache.lucene.analysis.ru.RussianAnalyzer'
        at org.apache.solr.core.Config.findClass(Config.java:204)
        at org.apache.solr.core.Config.newInstance(Config.java:209)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:466)
        at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:294)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:67)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:189)
        at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:170)
        at org.apache.solr.servlet.SolrServlet.init(SolrServlet.java:71)
        at javax.servlet.GenericServlet.init(GenericServlet.java:168)
        at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:383)
        at org.mortbay.jetty.servlet.ServletHolder.start(ServletHolder.java:243)
        at org.mortbay.jetty.servlet.ServletHandler.initializeServlets(ServletHandler.java:446)
        at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebApplicationHandler.java:321)
        at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:509)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.ru.RussianAnalyzer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:233)
        at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:187)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at org.apache.solr.core.Config.findClass(Config.java:188)
        ... 24 more

 


Re: How to set up RussianAnalyzer?

Chris Hostetter-3

: I downloaded lucene-analyzers-2.0.0.jar and placed it in the lib
: directory (tried both solr/lib and solr/example/lib) but I keep getting
: the same error (ClassNotFoundException, see stack trace below) when
: starting the server. Have I missed a step here??

"solr/lib" is the lib dir used when compiling Solr, "solr/example/lib" is
the lib dir used by Jetty ... what you want is "${solr.home}/lib" ...
which in the example install of Jetty would be "solr/example/solr/lib"

...it's a bit confusing, I know.
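
One way to confirm which lib directory actually holds the class is to scan the jars in it. The following is only a rough sketch and not part of Solr: the class name FindClassInLibDir and the method jarsContaining are made up for illustration, and the default path assumes the example Jetty layout described above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class FindClassInLibDir {

    /** Returns the jars directly under libDir that contain the given fully qualified class. */
    static List<File> jarsContaining(File libDir, String className) throws IOException {
        // A class a.b.C is stored inside a jar as the entry a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        List<File> hits = new ArrayList<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // directory is missing or unreadable
        }
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entryName) != null) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: the example install's ${solr.home}/lib and the analyzer from this thread
        File libDir = new File(args.length > 0 ? args[0] : "solr/example/solr/lib");
        String cls = args.length > 1 ? args[1] : "org.apache.lucene.analysis.ru.RussianAnalyzer";
        List<File> hits = jarsContaining(libDir, cls);
        System.out.println(hits.isEmpty()
                ? cls + " not found in any jar under " + libDir
                : cls + " found in " + hits);
    }
}
```

Run it from the directory that contains "solr", or pass the lib directory as the first argument. If the class is reported as missing, the jar is probably sitting in one of the other two lib directories.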

: definite answer to my question above. I have also investigated the next
: step which is the ability to add stop words, synonyms etc. It seems that
: I will have to create my own Factory(ies?) but didn’t find a detailed
: explanation on how to do this. I found this useful:

Adding new Factories requires some knowledge of writing/compiling Java
classes ... if you have a basic understanding of Java development, there
isn't really a lot of specific Solr/Lucene information you need, you just
subclass one of the existing Base Factory classes and define your own
"create" method ... things only get tricky if you need to do anything
complicated with configuration arguments.

If you wanted to write new factories for the RussianLetterTokenizer and
Russian*Filter classes, they would probably look very similar to the
example in the URL you mentioned, but with some simple argument processing
to decide which charset to use, maybe something like this...

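        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.ru.RussianCharsets;
        import org.apache.lucene.analysis.ru.RussianStemFilter;
        import org.apache.solr.analysis.BaseTokenFilterFactory;

        // A filter factory wraps a TokenStream, so it extends BaseTokenFilterFactory
        public class RussianStemFilterFactory extends BaseTokenFilterFactory {
          public TokenStream create(TokenStream input) {
            // "charset" is optional; fall back to Unicode when it is absent or unrecognised
            String charsetName = getArgs().get("charset");
            char[] charset = RussianCharsets.UnicodeRussian;
            if ("KOI8".equals(charsetName)) charset = RussianCharsets.KOI8;
            if ("CP1251".equals(charsetName)) charset = RussianCharsets.CP1251;
            return new RussianStemFilter(input, charset);
          }
        }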
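
The argument handling is the part most likely to trip people up, because the map returns null when the charset attribute is left out of schema.xml. Here is a small standalone sketch of that lookup, written without any Solr or Lucene classes so it can be run on its own. The enum RussianCharset is a stand-in for the tables in Lucene's RussianCharsets, and the names CharsetArgDemo and resolve are invented for this example. Rejecting an unknown name instead of silently using Unicode is a choice made here, not something Solr requires.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CharsetArgDemo {

    /** Stand-in for the charset tables in Lucene's RussianCharsets; not the real class. */
    enum RussianCharset { UNICODE, KOI8, CP1251 }

    /** Maps the optional "charset" init argument to a charset, defaulting to Unicode. */
    static RussianCharset resolve(Map<String, String> args) {
        String name = args.get("charset");
        if (name == null) {
            return RussianCharset.UNICODE; // attribute omitted in schema.xml
        }
        switch (name.trim().toUpperCase(Locale.ROOT)) {
            case "KOI8":
                return RussianCharset.KOI8;
            case "CP1251":
                return RussianCharset.CP1251;
            default:
                // Assumption: a typo in the config should fail at startup, not at query time
                throw new IllegalArgumentException("Unknown charset: " + name);
        }
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        System.out.println(resolve(args));   // no charset attribute given
        args.put("charset", "CP1251");
        System.out.println(resolve(args));   // explicit charset
    }
}
```

In a real factory the same lookup would sit inside "create" (or be done once when the arguments are handed to the factory) and return the matching char[] table instead of an enum value.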

Then you could use them like this...

    <fieldtype name="textRU" class="solr.TextField">
      <analyzer>
        <tokenizer class="yourpackage.RussianLetterTokenizerFactory"
                   charset="CP1251"/>
        <filter class="yourpackage.RussianLowerCaseFilterFactory"
                charset="CP1251"/>
        <filter class="solr.StopFilterFactory"
                words="yourwords.txt"/>
        <filter class="yourpackage.RussianStemFilterFactory"
                charset="CP1251"/>
      </analyzer>
    </fieldtype>
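
The order of the elements in that chain matters: the stop filter sits after the lowercase filter, so the entries in the stop word file need to be lowercase, and it sits before the stemmer, so they need to be unstemmed forms. The plain Java sketch below imitates only the first three steps (letter tokenizing, lowercasing, stop word removal) to show the effect. It uses no Lucene classes, does no stemming, and the names ChainOrderDemo and analyze are invented for this example. The sample text is "Я иду в Москву" with the stop words "я" and "в", written as Unicode escapes so the source compiles whatever the file encoding is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ChainOrderDemo {

    /** Splits on non-letters, lowercases each token, then drops stop words (no stemming). */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        // A trailing space is appended so the final token gets flushed as well
        for (char c : (text + " ").toCharArray()) {
            if (Character.isLetter(c)) {
                token.append(c);                                           // tokenizer step
            } else if (token.length() > 0) {
                String lower = token.toString().toLowerCase(Locale.ROOT);  // lowercase step
                if (!stopWords.contains(lower)) {                          // stop filter step
                    out.add(lower);
                }
                token.setLength(0);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Lowercase stop words, as they would appear in the stop word file
        Set<String> stop = new HashSet<>(Arrays.asList("\u044F", "\u0432"));
        String text = "\u042F \u0438\u0434\u0443 \u0432 \u041C\u043E\u0441\u043A\u0432\u0443";
        System.out.println(analyze(text, stop));
    }
}
```

Because the capitalised "Я" is lowercased before the stop list is consulted, it is removed along with "в", leaving "иду" and "москву". If the stop filter ran first, the lowercase entry would not match the capitalised token.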




-Hoss


RE: How to set up RussianAnalyzer?

Stephanie Belton-2
Thank you Chris for your response. I got the RussianAnalyzer working
following your advice (once I downloaded the latest version of Solr! Mine was
a couple of weeks old and didn't support the plugin feature). I also came
across this new page: http://wiki.apache.org/solr/SolrPlugins

Many thanks, I shall plough on with this, so far so good!

-----Original Message-----
From: Chris Hostetter [mailto:[hidden email]]
Sent: 27 November 2006 19:29
To: [hidden email]
Subject: Re: How to set up RussianAnalyzer?

