Filtering on a 'unique key' set

7 messages

Henrib-2
Merely an efficiency related question: is there any other way to filter on a uniqueKey set than using the 'fq' parameter & building a list of the uniqueKeys?
In 'raw' Lucene, you could use filters directly in search; is this (close to) equivalent efficiency wise?
Thanks
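For context, the string-building approach being asked about can be sketched as follows (a minimal sketch in plain Java; the field name "id" is a stand-in for whatever <uniqueKey> the schema declares):

```java
public class FqBuilder {
    // Sketch: build an 'fq' parameter value that ORs the unique keys together.
    static String buildKeyFilterQuery(String keyField, Iterable<String> keys) {
        StringBuilder sb = new StringBuilder(keyField).append(":(");
        boolean first = true;
        for (String k : keys) {
            if (!first) sb.append(" OR ");
            sb.append('"').append(k).append('"');
            first = false;
        }
        return sb.append(')').toString();
    }
}
```

The resulting string then still has to be parsed server-side, which is the overhead in question.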

Re: Filtering on a 'unique key' set

Yonik Seeley-2
On 6/17/07, Henrib <[hidden email]> wrote:
> Merely an efficiency related question: is there any other way to filter on a
> uniqueKey set than using the 'fq' parameter & building a list of the
> uniqueKeys?

I don't think so...

> In 'raw' Lucene, you could use filters directly in search; is this (close
> to) equivalent efficiency wise?

Yes, any fq params are turned into filters.

-Yonik

Re: Filtering on a 'unique key' set

Henrib-2
Thanks Yonik;
Let me twist the same question another way: I'm running Solr embedded; the pre-existing uniqueKey set may be large, is per-query (most likely not useful to cache), and is iterable. I'd rather avoid building a string for the 'fq' parameter, getting it parsed, etc.
Would it be as safe and more efficient in a (custom) request handler to create a DocSet by fetching termDocs for each key used as a Term, and use it as a filter? Or is this just a bad idea?

Pseudo code being:
    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.solr.search.BitDocSet;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.util.OpenBitSet;

    DocSet keyFilter(IndexReader reader, String keyField, Iterator<String> ikeys)
            throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        // One unpositioned TermDocs, re-seeked for each key.
        TermDocs termDocs = reader.termDocs();
        try {
            while (ikeys.hasNext()) {
                termDocs.seek(new Term(keyField, ikeys.next()));
                // keyField is the uniqueKey, so at most one doc can match
                if (termDocs.next())
                    bits.fastSet(termDocs.doc());
            }
        } finally {
            termDocs.close();
        }
        return new BitDocSet(bits);
    }

Thanks again

Re: Filtering on a 'unique key' set

Yonik Seeley-2
On 6/18/07, Henrib <[hidden email]> wrote:
> Thanks Yonik;
> Let me twist the same question another way; I'm running Solr embedded, the
> uniqueKey set that pre-exists  may be large, is per-query (most likely not
> useful to cache it) and is iterable. I'd rather avoid making a string to
> build the 'fq', get it parsed, etc.
> Would it be as safe & more efficient in a (custom) request handler to create
> a DocSet by fetching termDocs for each key used as a Term & use is as a
> filter?

Yes, that should work fine.
Most of the savings will be avoiding the query parsing.

-Yonik

Re: Filtering on a 'unique key' set

Henrib-2

Is it reasonable to implement a RequestHandler that systematically uses a DocSet as a filter for the restriction queries? I'm under the impression that SolrIndexSearcher.getDocSet(Query, DocSet) would use the cache properly & that calling it in a loop would perform the 'and' between the filters...

pseudo code (refactored from Standard & Dismax):
      /* * * Restrict Results * * */
      // AND the pre-built unique-key DocSet with each parsed filter query
      List<Query> restrictions = U.parseFilterQueries(req);
      DocSet rdocs = myUniqueKeySetThatMayBeNull();
      if (restrictions != null) {
          for (Query r : restrictions) {
              rdocs = s.getDocSet(r, rdocs);
          }
      }
      /* * * Generate Main Results * * */
      flags |= U.setReturnFields(req,rsp);
      DocListAndSet results = null;
      NamedList facetInfo = null;
      if (params.getBool(FACET,false)) {
        results = s.getDocListAndSet(query, rdocs,
                                     SolrPluginUtils.getSort(req),
                                     params.getInt(START,0), params.getInt(ROWS,10),
                                     flags);
        facetInfo = getFacetInfo(req, rsp, results.docSet);
      } else {
        results = new DocListAndSet();
        results.docList = s.getDocList(query, rdocs,
                                       SolrPluginUtils.getSort(req),
                                       params.getInt(START,0), params.getInt(ROWS,10),
                                       flags);
      }




Re: Filtering on a 'unique key' set

Yonik Seeley-2
On 6/19/07, Henrib <[hidden email]> wrote:
> Is it reasonable to implement a RequestHandler that systematically uses a
> DocSet as a filter for the restriction queries?

How many unique keys would typically be used to construct the filter?

> I'm under the impression
> that SolrIndexSearcher.getDocSet(Query, DocSet) would use the cache properly
> & that calling it in a loop would perform the 'and' between the filters...

Yes, but I wouldn't do that for each query unless each query was
likely to have a different id list.

There's also a getDocListAndSet that takes a List<Query> as a filter.

-Yonik
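The List<Query> variant mentioned here would look roughly like the following pseudo code (method names per the Solr 1.x SolrIndexSearcher API; not a drop-in snippet, since it assumes the same req/params/flags context as the handler code above):

```
// Each Query in the filter list is looked up in (or added to) the
// filterCache, then intersected with the main query's results.
List<Query> filters = U.parseFilterQueries(req);
DocListAndSet results = s.getDocListAndSet(query, filters,
                                           SolrPluginUtils.getSort(req),
                                           params.getInt(START,0),
                                           params.getInt(ROWS,10),
                                           flags);
```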

Re: Filtering on a 'unique key' set

Henrib-2

What I'm after is to restrict the 'whole' index through a set of unique keys.
Each unique key set is likely to have between 100 & 10000 keys and these sets are expected to be different for most of the queries. I'm trying to see if I can achieve a generic 'fk' (for filter key) kind of parameter so this could be applied to 'any' RequestHandler.

To keep the filter-queries functionality (as a List<Query>), I compute a DocSet by using my 'unique key filter docset' as a base and iteratively 'and' it with the filter queries executed through SolrIndexSearcher.getDocSet(Query, DocSet).

The other way would be to create a BooleanQuery that 'or's together TermQueries built from the unique key set; I might still revert to that since my current code needs a small patch in SolrIndexSearcher (flags in getDocList).

In SolrIndexSearcher, if getDocListAndSet & getDocList were to accept a DocSet filter in addition to the List<Query>, I'd use those; as it stands I had to choose between a List<Query> and a DocSet as the filter. I might have missed something, the code being quite dense; the equivalent signatures I could manage to get are:
public DocList getDocList(Query query, DocSet filter, Sort lsort, int offset, int len, int flags) throws IOException;
and
public DocListAndSet getDocListAndSet(Query query, DocSet filter, Sort lsort, int offset, int len, int flags) throws IOException;

It seems to work; I don't know whether it is efficient cache-wise, etc.
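The BooleanQuery alternative can be sketched as pseudo code against the Lucene 2.x API (restricting to a key set means SHOULD, i.e. OR, clauses; "id" is a stand-in for the uniqueKey field):

```
// Restricting to a key set = OR of TermQueries on the uniqueKey field.
BooleanQuery keyQuery = new BooleanQuery();
for (String key : keys) {
    keyQuery.add(new TermQuery(new Term("id", key)), BooleanClause.Occur.SHOULD);
}
// keyQuery can then be passed as one of the filter queries.
```

One caveat: with up to 10000 keys this would exceed BooleanQuery's default maxClauseCount of 1024, which would have to be raised via BooleanQuery.setMaxClauseCount.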

