Full fledged Lucene Query Syntax support in Nutch

ravi chintakunta
Lucene supports fuzzy, wildcard, range, and proximity searches, as listed
here: http://lucene.apache.org/java/docs/queryparsersyntax.html

But Nutch does not use all these capabilities. It is limited by query
parsing in org.apache.nutch.analysis.NutchAnalysis and the query
filters hosted in plugins.

To use Lucene's query syntax, we would have to either modify the analyzer
and add more plugins to Nutch, or use Lucene's QueryParser directly. I
tried the second approach by modifying
org.apache.nutch.searcher.IndexSearcher, and that seems to work.
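To make the gap concrete, here is a small sketch that flags which of the Lucene-only operators listed above (wildcard, fuzzy, proximity, range) a raw query string uses. It is a hypothetical helper: `QuerySyntaxCheck`, `detect` and the `Feature` enum are names made up for illustration, and the regular expressions are a rough lexical approximation, not Lucene's actual QueryParser grammar. A check like this could be one way to send only "advanced" queries to Lucene's QueryParser while plain queries keep going through Nutch's own parser.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: reports which Lucene-only operators a raw query string uses.
// The patterns are rough lexical approximations, not Lucene's QueryParser grammar.
public class QuerySyntaxCheck {

    enum Feature { WILDCARD, FUZZY, PROXIMITY, RANGE }

    // A word character next to '*' or '?', e.g. test* or te?t.
    private static final Pattern WILDCARD = Pattern.compile("\\w[*?]|[*?]\\w");
    // A word directly followed by '~' and an optional similarity, e.g. roam~ or roam~0.8.
    private static final Pattern FUZZY = Pattern.compile("\\w~[0-9.]*");
    // A quoted phrase followed by '~' and a slop value, e.g. "jakarta apache"~10.
    private static final Pattern PROXIMITY = Pattern.compile("\"[^\"]+\"~\\d+");
    // Inclusive [a TO b] or exclusive {a TO b} ranges.
    private static final Pattern RANGE = Pattern.compile("[\\[{][^\\]}]+\\sTO\\s[^\\]}]+[\\]}]");

    static Set<Feature> detect(String query) {
        Set<Feature> found = EnumSet.noneOf(Feature.class);
        if (WILDCARD.matcher(query).find()) found.add(Feature.WILDCARD);
        if (FUZZY.matcher(query).find()) found.add(Feature.FUZZY);
        if (PROXIMITY.matcher(query).find()) found.add(Feature.PROXIMITY);
        if (RANGE.matcher(query).find()) found.add(Feature.RANGE);
        return found;
    }

    public static void main(String[] args) {
        String[] queries = {
            "nutch crawler", "test*", "te?t", "roam~0.8",
            "\"jakarta apache\"~10", "title:{Aida TO Carmen}"
        };
        for (String q : queries) {
            // An empty set means the query uses none of these operators,
            // so (under this sketch's assumption) it could stay on Nutch's parser.
            System.out.println(q + " -> " + detect(q));
        }
    }
}
```

The example queries are modeled on the ones on the Lucene query syntax page linked above; a plain query such as "nutch crawler" comes back with an empty set.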

Is there a reason that Nutch does not support the entire Lucene query
syntax by default?

Thanks in advance,
Ravi Chintakunta

Re: Full fledged Lucene Query Syntax support in Nutch

Ravish Bhagdev
In reply to this post by ravi chintakunta
The reason is performance. Allowing the above means more complex queries,
which cause more delay in getting results. If you need these features, you
know how to get them, but it's a tradeoff with performance. It may not
matter if the number of pages is small, but it will at large scale.

-- Ravish.


On 5/2/06, Ravi Chintakunta <[hidden email]> wrote:

>
> Lucene supports fuzzy, wildcard, range, and proximity searches, as listed
> here: http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> But Nutch does not use all these capabilities. It is limited by query
> parsing in org.apache.nutch.analysis.NutchAnalysis and the query
> filters hosted in plugins.
>
> To use Lucene's query syntax, we would have to either modify the analyzer
> and add more plugins to Nutch, or use Lucene's QueryParser directly. I
> tried the second approach by modifying
> org.apache.nutch.searcher.IndexSearcher, and that seems to work.
>
> Is there a reason that Nutch does not support the entire Lucene query
> syntax by default?
>
> Thanks in advance,
> Ravi Chintakunta
>

Re: Full fledged Lucene Query Syntax support in Nutch

ravi chintakunta
Performance might be a reason, but only queries that include wildcard
or fuzzy operators would be slowed down, not all queries, right? The
performance of regular plain-text searches shouldn't be affected.
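Both points (plain queries unaffected, wildcard cost growing with index size) show up in a toy model of how a prefix wildcard is evaluated. As far as I know, Lucene versions of this period rewrite a prefix or wildcard query into a BooleanQuery with one clause per matching indexed term, and that BooleanQuery is capped at a maximum clause count (1024 by default), beyond which a TooManyClauses exception is thrown. The sketch assumes a synthetic, sorted term dictionary; `PrefixExpansionDemo`, `buildTerms` and `expandPrefix` are hypothetical names, and this is not Nutch or Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansionDemo {

    // Lucene's default BooleanQuery clause limit, used here only for comparison.
    static final int MAX_CLAUSES = 1024;

    // Hypothetical sorted term dictionary holding the terms "w0" .. "w<n-1>".
    static SortedSet<String> buildTerms(int n) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            terms.add("w" + i);
        }
        return terms;
    }

    // Collects every term that starts with the prefix. Because the set is sorted,
    // those terms form one contiguous range from prefix up to prefix + Character.MAX_VALUE.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        return new ArrayList<>(terms.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1_000, 10_000}) {
            SortedSet<String> terms = buildTerms(n);
            // A plain term query needs a single dictionary lookup...
            boolean plainHit = terms.contains("w1");
            // ...whereas the prefix query "w1*" becomes one clause per matching term.
            int clauses = expandPrefix(terms, "w1").size();
            System.out.printf("vocabulary=%d plain lookup=%b prefix clauses=%d over limit=%b%n",
                    n, plainHit, clauses, clauses > MAX_CLAUSES);
        }
    }
}
```

With these synthetic terms, "w1*" expands to 11, 111 and 1111 terms as the vocabulary grows from 100 to 1,000 to 10,000, so the last case exceeds the 1024-clause default, while the plain lookup for "w1" stays a single check at every size.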

Any thoughts?

Thanks,
Ravi Chintakunta

On 5/3/06, Ravish Bhagdev <[hidden email]> wrote:

> The reason is performance. Allowing the above means more complex queries,
> which cause more delay in getting results. If you need these features, you
> know how to get them, but it's a tradeoff with performance. It may not
> matter if the number of pages is small, but it will at large scale.
>
> -- Ravish.