A concise way to get just IDs


A concise way to get just IDs

rm_solr

Is there a concise way to query just a single field
from a Solr query?

I was trying to use Solr in a BI application which
will allow the dynamic creation of OLAP cubes based
on the results of keyword searches. In this case
I'm not really interested in just the top N results,
nor in the documents themselves, but rather in the
complete list of IDs that match.

With the queries I know how to write, I get responses
like this:
=========================================================
<?xml version='1.0' encoding='UTF-8'?><response>
<responseHeader><status>0</status><QTime>0</QTime></responseHeader>
<result numFound='64737' start='0'>
  <doc>
   <str name='id'>644960</str>
  </doc>
  <doc>
   <str name='id'>13</str>
  </doc>
  .............. and 200,000 or so more lines..........
</result>
</response>
=========================================================


I was extremely pleased to see that Solr itself seems
fast enough to be useful, but found that I'm spending
a disappointing amount of time sending the results through
an XML parser that's surely overkill for this task.


Are there any options where I could get a result
that looks something like
=========================================================
<?xml version='1.0' encoding='UTF-8'?><response>
<responseHeader><status>0</status><QTime>0</QTime></responseHeader>
<result numFound='64737' start='0'>
  <doc_ids>
   644960 13 8357 66772 193162 ....
   ..... and 60,000 or so more numbers separated by whitespace.....
  </doc_ids>
</result>
</response>
=========================================================

Or should I be looking for an altogether different way of
approaching things?

    Thanks,
    Ron M

Re: A concise way to get just IDs

Chris Hostetter-3

: Is there a concise way to query just a single field
: from a solr query?

At the moment, there isn't anything more concise than what you are already
getting.
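
If you stay with the current response format, one thing that may help on
the client is a streaming (pull) parser rather than one that builds a full
document tree.  A minimal sketch using the JDK's StAX API, assuming the
response looks like the one you posted; the class and method names
(IdStream, readIds) are made up for illustration:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr.
class IdStream {
    // Collects the text of every <str name="id"> element, one event at a
    // time, without building an in-memory tree of the whole response.
    static List<String> readIds(Reader in) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> ids = new ArrayList<>();
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "str".equals(xml.getLocalName())
                    && "id".equals(xml.getAttributeValue(null, "name"))) {
                // getElementText() consumes up to the matching end tag.
                ids.add(xml.getElementText());
            }
        }
        xml.close();
        return ids;
    }
}
```

Whether this is fast enough for a couple hundred thousand lines is something
you would have to measure.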

What you can do, though, is write a custom request handler that meets your
needs.  You wouldn't even need to loop over the Documents; you could use
the FieldCache, something like this...

  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {

    ...
    Query q = // build your query however you want
    // every matching doc, unscored and unsorted
    DocSet docs = req.getSearcher().getDocSet(q);
    StringBuilder buf = new StringBuilder();
    // one cached value per internal Lucene doc number
    int[] ids = FieldCache.DEFAULT.getInts
        (req.getSearcher().getReader(), "id");
    for (DocIterator i = docs.iterator(); i.hasNext(); ) {
       // advance the iterator and look up the id for that doc number
       buf.append(ids[i.nextDoc()]);
       buf.append(" ");
    }
    rsp.add("ids", buf.toString());
  }

(If id isn't a simple int or string, you'll need to convert it; take a
look at SolrIndexSearcher.getSchema(), IndexSchema.getFieldType(), and
FieldType.indexedToReadable.)
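
On the client side, the "ids" value produced by a handler like the one above
is just numbers separated by single spaces, with a trailing space, so no
XML-level work is needed to split it.  A rough sketch, assuming the ids fit
in an int; the class and method names (IdList, parseIds) are invented for
illustration:

```java
import java.util.Arrays;

// Hypothetical client-side helper, not part of Solr.
class IdList {
    // Splits a whitespace-separated id string into an int array,
    // tolerating leading and trailing whitespace.
    static int[] parseIds(String body) {
        String trimmed = body.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                     .mapToInt(Integer::parseInt)
                     .toArray();
    }
}
```

For example, parseIds("644960 13 8357 ") gives an array holding 644960, 13
and 8357.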

BUT! ... if you're going to write a custom handler, you might consider
what you're doing with those ids in your client, and whether it would make
sense to put that logic in the plugin -- and really cut down on the data
sent over the wire.  (Forgive me, I don't really get what an "olap cube"
is)



-Hoss