Where to find drill-down examples (source code)


Martin Braun-2
Hello all,

I want to implement a drill-down function, a.k.a. "narrow search" or "refine
search".

I want to have something like:

Refine by Date:
* 1990-2000 (30 Docs)
* 2001-2003 (200 Docs)
* 2004-2006 (10 Docs)

And not only for date ranges, but also for other categories.

From what I have found in the list archives so far, I have to use
Filters for my search.

Does anybody know where to find some source code that would give me an idea
of how to implement this?
I think this is a useful feature for a search engine, so are there any
contributions to Lucene for it?

thanks,
martin



--
Universitaetsbibliothek Heidelberg   Tel: +49 6221 54-2580
Ploeck 107-109, D-69117 Heidelberg   Fax: +49 6221 54-2623

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Where to find drill-down examples (source code)

Miles Barr-3
If you want to do a refined search, I'd put the original query in a
QueryFilter, which then filters the new search.

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/QueryFilter.html

e.g.

Query original = ...; // saved from the last time the search was executed
QueryFilter filter = new QueryFilter(original);

QueryParser parser = ...;
Searcher searcher = ...;

String userQuery = ...; // the refinement entered by the user
Query query = parser.parse(userQuery);

// only documents that matched the original query can show up in the hits
Hits hits = searcher.search(query, filter);


Fill in the blanks with however you normally get your QueryParser and
IndexSearcher. You could store the old query on the session, or
somewhere else.

Then the QueryFilter will ensure you're doing a refinement, but won't
affect the scoring in the new search.


Alternatively, since you appear to only want to refine on dates and
categories, you might want to put them in filters so they don't affect
the score, and leave the query as is. In that case you can use a
RangeQuery for the dates, and wrap a TermQuery in a QueryFilter to
handle the categories.

If you need multiple filters you can use the ChainedFilter class.
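
As a rough illustration of what chaining filters with AND amounts to: in Lucene of this era a Filter produces a java.util.BitSet with one bit per document number, so combining a date filter with a category filter comes down to intersecting bit sets. The sketch below uses plain java.util.BitSet and no Lucene classes. The class name FilterIntersection, the method intersect and the sample document numbers are made up for illustration; this is not the ChainedFilter source.

```java
import java.util.BitSet;

/** Toy model of AND-ing filters: each filter is a bit set over document numbers. */
class FilterIntersection {

    /** Returns a new bit set containing only the documents accepted by every filter. */
    static BitSet intersect(BitSet... filters) {
        if (filters.length == 0) {
            return new BitSet(); // no filters given: nothing is selected in this sketch
        }
        BitSet result = (BitSet) filters[0].clone(); // copy so the inputs stay untouched
        for (int i = 1; i < filters.length; i++) {
            result.and(filters[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet dateFilter = new BitSet();      // hypothetical: docs dated 2001-2003
        dateFilter.set(1);
        dateFilter.set(3);
        dateFilter.set(4);
        BitSet categoryFilter = new BitSet();  // hypothetical: docs in one category
        categoryFilter.set(3);
        categoryFilter.set(4);
        categoryFilter.set(7);

        System.out.println(intersect(dateFilter, categoryFilter)); // prints {3, 4}
    }
}
```

Because the inputs are copied rather than modified, the individual bit sets could be cached and reused across searches, as long as the document numbers have not changed.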




Miles





Re: Where to find drill-down examples (source code)

kkrugler
In reply to this post by Martin Braun-2
One example (I think) is:

http://www.krugle.com/files/cvs/savannah.nongnu.org/sdx/sdx_v2/src/java/fr/gouv/culture/sdx/search/lucene/query/DateIntervalQuery.java

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"



Re: Where to find drill-down examples (source code)

Mark Miller-3
from the lucene wiki:

Can Lucene do a "search within search", so that the second search is
constrained by the results of the first query?

Yes. There are two primary options:

  * Use QueryFilter with the previous query as the filter. (You can
    search the mailing list archives for QueryFilter and Doug Cutting's
    recommendations against using it for this purpose.)
  * Combine the previous query with the current query using
    BooleanQuery, using the previous query as required.

The BooleanQuery approach is the recommended one.

End of quote.
It seems to me that simply adding your drill-down query part to the
previous query with a BooleanQuery should do the trick.


- mark



drill-down heuristics WAS: Where to find drill-down examples (source code)

Martin Braun-2
In reply to this post by Miles Barr-3
hi miles,

thanks for the response.
I don't think I explained my problem well enough.

The harder problem for me is how to come up with the proposals for the
refinement. I have a date range from 16xx to now, for about 4 bn. docs,
so the number of documents found could be quite large, and the
distribution of the dates could differ a lot from one query to another.
I hope there is a better way than collecting all the dates with HitIterator
and doing statistics on the data?

Is there something that could be done at indexing time?
What would be a high-performance heuristic?

The same problem applies to other categories like the author: how do I find
good proposals for a given result set?




Re: drill-down heuristics WAS: Where to find drill-down examples (source code)

Miles Barr-3
On Monday 24 July 2006 08:17, Martin Braun wrote:

> I don't think I explained my problem well enough.
>
> The harder problem for me is how to come up with the proposals for the
> refinement. I have a date range from 16xx to now, for about 4 bn. docs,
> so the number of documents found could be quite large, and the
> distribution of the dates could differ a lot from one query to another.
> I hope there is a better way than collecting all the dates with
> HitIterator and doing statistics on the data?
>
> Is there something that could be done at indexing time?
> What would be a high-performance heuristic?
>
> The same problem applies to other categories like the author: how do I find
> good proposals for a given result set?

That's a lot trickier and there might be others on the list who can give a
better answer. I think what you need to do is to extend HitCollector:

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/HitCollector.html

Your implementation would keep a count of the dates and categories that the
results fall into. Then you can use this information to build up your
refinements. The problem with this approach is that HitCollector#collect only
provides you with the document number, not the document itself. You still have
to load the document to find out its date and category, which will be slow for
large result sets.

You might want to keep this metadata in another store so you can quickly look
up the date and category based on the document number. This avoids having to
load all the documents from Lucene, which is an expensive operation. The
downside with this approach is that the document number changes when you
optimise your index, hence you'll have to rebuild your metadata store each
time you optimise.
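
A minimal sketch of that idea, in plain Java rather than against the Lucene API: the class below is not a real HitCollector subclass, it only mirrors the shape of the collect(int doc, float score) callback. The yearByDoc and categoryByDoc arrays stand in for the external metadata store indexed by document number. Those names, and the class name FacetCountingCollector, are assumptions for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

/** Counts hits per year and per category using per-document lookup arrays. */
class FacetCountingCollector {
    private final int[] yearByDoc;         // hypothetical metadata store: doc number -> year
    private final String[] categoryByDoc;  // hypothetical metadata store: doc number -> category
    private final Map<Integer, Integer> yearCounts = new TreeMap<>();
    private final Map<String, Integer> categoryCounts = new TreeMap<>();

    FacetCountingCollector(int[] yearByDoc, String[] categoryByDoc) {
        this.yearByDoc = yearByDoc;
        this.categoryByDoc = categoryByDoc;
    }

    /** Called once per matching document; the score is ignored for counting. */
    void collect(int doc, float score) {
        yearCounts.merge(yearByDoc[doc], 1, Integer::sum);
        categoryCounts.merge(categoryByDoc[doc], 1, Integer::sum);
    }

    Map<Integer, Integer> yearCounts() {
        return yearCounts;
    }

    Map<String, Integer> categoryCounts() {
        return categoryCounts;
    }

    public static void main(String[] args) {
        int[] years = {1995, 2002, 2002, 2005};
        String[] categories = {"history", "law", "history", "law"};
        FacetCountingCollector collector = new FacetCountingCollector(years, categories);

        // Pretend the search matched documents 0, 1 and 2.
        for (int doc : new int[] {0, 1, 2}) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.yearCounts());     // prints {1995=1, 2002=2}
        System.out.println(collector.categoryCounts()); // prints {history=2, law=1}
    }
}
```

The lookup per hit is a plain array access, so no document has to be loaded during collection. The arrays would still need rebuilding whenever the document numbers change.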



Miles



Re: drill-down heuristics WAS: Where to find drill-down examples (source code)

Chris Hostetter-3
In reply to this post by Martin Braun-2

This is generally referred to as "faceted" searching ... you might find
descriptions of how to generate the "counts" per facet by searching for
that keyword in the archive .. it also comes up now and then under the
subject of "category counts"

There is however a separate issue that it sounds like you need help with
first: picking the facets, ie deciding that for one user's search it makes
sense to use decade ranges, but for another person's search it makes sense
to use individual years.

This is really a problem of personal preference, and not something that can
be solved purely with technology.  Each field might need separate rules to
determine what facets to provide (ie: dates might make sense to narrow by
decade until only one decade is matched by the search, then narrowed by
years until only one year is valid, then by month etc...  but meanwhile
"author name" faceting might make sense to use initial first, and then
switch to two letter name prefixes, etc...)
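
One way to express a rule like the date example above in code, as a sketch only: count hits per decade, and once every hit falls into a single decade, switch to per-year buckets. The class and method names (DateFacetGranularity, bucketWidth, bucketCounts) are invented for illustration, and the input is assumed to be the years of the matching documents. The rule itself is just one possible preference, not something Lucene provides.

```java
import java.util.Map;
import java.util.TreeMap;

/** Picks decade or year buckets depending on how widely the matched dates spread. */
class DateFacetGranularity {

    /** Returns 10 while the hits span several decades, otherwise 1 (year buckets). */
    static int bucketWidth(int[] years) {
        if (years.length == 0) {
            return 10; // nothing matched: default to decades
        }
        int firstDecade = Math.floorDiv(years[0], 10);
        for (int year : years) {
            if (Math.floorDiv(year, 10) != firstDecade) {
                return 10;
            }
        }
        return 1;
    }

    /** Counts hits per bucket; each key is the first year of its bucket. */
    static Map<Integer, Integer> bucketCounts(int[] years) {
        int width = bucketWidth(years);
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int year : years) {
            int bucketStart = Math.floorDiv(year, width) * width;
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hits across two decades are grouped by decade.
        System.out.println(bucketCounts(new int[] {1994, 2001, 2003, 2003})); // prints {1990=1, 2000=3}
        // Hits within one decade are grouped by year.
        System.out.println(bucketCounts(new int[] {2001, 2003, 2003}));       // prints {2001=1, 2003=2}
    }
}
```

All buckets in one result have the same width, and a month level could be added in the same way if the dates carried that precision.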

one thing to keep in mind when picking facets for dates or numeric values,
is that even though you might think you are helping your users by giving
them facets with an "even distribution" of documents, you may actually
confuse them if they are trying to get a sense of how the data is
distributed ... like showing someone a line plot graph where the axes
change half way down the line.

if i give you these facets/counts for your search results...

  1901-1910  23
  1911-1920  25
  1921-1930  21
  1931-1940  26
  1941-1950  19
  1951-1960  22
  1961-1970  23
  1971-1990  20
  1991-2000  22
  2000-2006  7

...you might not notice that one of those ranges is 20 years.






-Hoss




Re: Where to find drill-down examples (source code)

ddigmann
In reply to this post by Miles Barr-3
Is there a link to a zip file where I can get the entire package of source files (version 2, please)?  I know I can view them in the source repository (http://svn.apache.org/viewvc/lucene/java/trunk/), but I do not really feel like going through each of them to download them all.  I am looking for a one-stop shop here.



Re: Where to find drill-down examples (source code)

Simon Willnauer
Either grab any svn client and check out the 2.0 branch, or just download
the source distribution from a mirror. Use this one:
http://mirrorspace.org/apache/lucene/java/

best regards simon

On 9/26/06, djd0383 <[hidden email]> wrote:

>
> Is there a link to a zip file where I can get the entire package of source
> files (version 2, please).  I know I am able to view them in the Source
> Repository (http://svn.apache.org/viewvc/lucene/java/trunk/), but I do not
> really feel like going through each of those to download them all.  I am
> looking for a one stop shop here.
>