Using web2/NGramSpeller

Tkach
I've got the contrib/web2 app all set up from what I can tell, but when
I go to try to build the spelling index (step 8 of
http://wiki.apache.org/nutch/InstallingWeb2) I get an exception about
"maxBufferedDocs".  Can anyone at least point me toward where you're
supposed to set this?  I can certainly hard-code NGramSpeller to just
use something like 100, but I'm sure there must be a better way.

Opening crawl/index
Docs: 3,242
Using field: content
Exception in thread "main" java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.nutch.plugin.PluginRepository.main(PluginRepository.java:417)
Caused by: java.lang.IllegalArgumentException: maxBufferedDocs must at least be 2 when enabled
         at org.apache.lucene.index.IndexWriter.setMaxBufferedDocs(IndexWriter.java:883)
         at org.apache.nutch.spell.NGramSpeller.main(NGramSpeller.java:273)
         ... 5 more
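
For reference, here is a minimal sketch of the hard-coding workaround mentioned above, made slightly less brittle by using the fallback only when the configured value would be rejected. It is an illustration, not the actual NGramSpeller code. The class name, the sanitize helper, and the fallback of 100 are all hypothetical. The only thing taken from the trace is that setMaxBufferedDocs refuses values below 2. The sketch treats every value below 2 as invalid and does not try to pass through any "disabled" setting.

```java
public class MaxBufferedDocsGuard {

    // Hypothetical fallback; 100 is the value suggested in the post, not a Nutch default.
    static final int FALLBACK_MAX_BUFFERED_DOCS = 100;

    // Smallest value the exception message says is accepted.
    static final int MIN_ALLOWED = 2;

    /**
     * Returns the configured value if it is at least 2,
     * otherwise the fallback. Hypothetical helper, not part of Nutch or Lucene.
     */
    static int sanitize(int configured) {
        if (configured < MIN_ALLOWED) {
            return FALLBACK_MAX_BUFFERED_DOCS;
        }
        return configured;
    }

    public static void main(String[] args) {
        // Simulated configuration values, e.g. a missing property read as 0.
        int[] samples = {0, 1, 2, 50};
        for (int configured : samples) {
            System.out.println(configured + " -> " + sanitize(configured));
        }
    }
}
```

The result of sanitize(...) would then be what gets passed to writer.setMaxBufferedDocs(...) around NGramSpeller.java:273, assuming the value reaching that call is coming through as 0 or 1 from a missing or misnamed configuration property.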