SolrJ and Unique Doc ID


SolrJ and Unique Doc ID

Grant Ingersoll-2
What's the best way to retrieve the unique key field from SolrJ?  From  
what I can tell, it seems like I would need to retrieve the schema and  
then parse it and get it from there, or am I missing something?

Thanks,
Grant

Re: SolrJ and Unique Doc ID

Ryan McKinley
right now you need to know the unique key name to get it...
I don't think we have any easy way to get that besides parsing the
schema....

With debugQuery=true, the uniqueKey is added to the 'explain' info:
<lst name="explain">
   <str name="id=YOURID,internal_docid=0">
    ...

this gets parsed into the QueryResults _explainMap and _docIdMap but i'm
not sure that is useful in the general sense...
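As a rough illustration of the explain-based route, here is a minimal sketch that pulls the unique key field name out of one explain entry name. It assumes the entry name always has the form `field=value,internal_docid=N`, as in the snippet above; the class and method names are hypothetical and are not part of SolrJ.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of SolrJ. */
final class ExplainKeyParser {
    private ExplainKeyParser() {}

    /**
     * Extracts the field name from an explain entry name such as
     * "id=YOURID,internal_docid=0". Returns empty if the name does not
     * match the assumed "field=value,internal_docid=N" shape.
     */
    static Optional<String> uniqueKeyField(String explainKey) {
        if (explainKey == null) {
            return Optional.empty();
        }
        // Cut off the trailing ",internal_docid=N" part first, so commas
        // inside the id value itself do not confuse the split.
        int marker = explainKey.lastIndexOf(",internal_docid=");
        if (marker < 0) {
            return Optional.empty();
        }
        String pair = explainKey.substring(0, marker);
        // The field name is everything before the first '='.
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return Optional.empty();
        }
        return Optional.of(pair.substring(0, eq));
    }
}
```

This only works when the query was sent with debugQuery=true and returned at least one document, which is part of why it may not be useful in the general sense.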

ryan


Grant Ingersoll wrote:
> What's the best way to retrieve the unique key field from SolrJ?  From
> what I can tell, it seems like I would need to retrieve the schema and
> then parse it and get it from there, or am I missing something?
>
> Thanks,
> Grant
>


Re: SolrJ and Unique Doc ID

Yonik Seeley-2
In reply to this post by Grant Ingersoll-2
Hmmm, I should have just mandated that the id field be called "id"
from the start :-)

On Feb 11, 2008 5:51 PM, Grant Ingersoll <[hidden email]> wrote:
> What's the best way to retrieve the unique key field from SolrJ?  From
> what I can tell, it seems like I would need to retrieve the schema and
> then parse it and get it from there, or am I missing something?
>
> Thanks,
> Grant
>

Re: SolrJ and Unique Doc ID

Ryan McKinley
thoughts on requiring that for solrj?  perhaps in 2.0?  Not suggesting
it is a good idea (yet)... but we may want to consider it.


Yonik Seeley wrote:

> Hmmm, I should have just mandated that the id field be called "id"
> from the start :-)
>
> On Feb 11, 2008 5:51 PM, Grant Ingersoll <[hidden email]> wrote:
>> What's the best way to retrieve the unique key field from SolrJ?  From
>> what I can tell, it seems like I would need to retrieve the schema and
>> then parse it and get it from there, or am I missing something?
>>
>> Thanks,
>> Grant
>>
>


Re: SolrJ and Unique Doc ID

Grant Ingersoll-2
Another option is to add it to the responseHeader????  Or it could be  
a quick add to the LukeRH.  The former has the advantage that we  
wouldn't have to make extra calls at the cost of sending an extra  
string w/ every message.  The latter would work by asking for it up  
front and then saving it aside.  Any preference?  Or, we could add it  
to both, making the responseHeader one optional.

Of course, it probably would be useful to be able to request the  
schema from the server and build an IndexSchema object on the client  
side.  This could be added to the LukeRH as well.

Hindsight is 20/20...

On Feb 11, 2008, at 6:51 PM, Ryan McKinley wrote:

> thoughts on requiring that for solrj?  perhaps in 2.0?  Not  
> suggesting it is a good idea (yet)... but we may want to consider it.
>
>
> Yonik Seeley wrote:
>> Hmmm, I should have just mandated that the id field be called "id"
>> from the start :-)
>> On Feb 11, 2008 5:51 PM, Grant Ingersoll <[hidden email]> wrote:
>>> What's the best way to retrieve the unique key field from SolrJ?  
>>> From
>>> what I can tell, it seems like I would need to retrieve the schema  
>>> and
>>> then parse it and get it from there, or am I missing something?
>>>
>>> Thanks,
>>> Grant
>>>
>



Re: SolrJ and Unique Doc ID

hossman
: Another option is to add it to the responseHeader????  Or it could be a quick
: add to the LukeRH.  The former has the advantage that we wouldn't have to make

adding the info to LukeRequestHandler makes sense.

Honestly: i can't think of a single use case where client code would care
about what the uniqueKey field is, unless it already *knew* what the
uniqueKey field is.

: Of course, it probably would be useful to be able to request the schema from
: the server and build an IndexSchema object on the client side.  This could be
: added to the LukeRH as well.

somebody was working on that at some point ... but i may be thinking of
the Ruby client ... no.... i'm pretty sure i remember it coming up in the
context of Java because i remember discussion that a full "IndexSchema"
was too much because it required the client to have the class files for
all of the analysis chain and fieldtype classes.



-Hoss


Re: SolrJ and Unique Doc ID

Grant Ingersoll-2

On Feb 11, 2008, at 11:24 PM, Chris Hostetter wrote:

> : Another option is to add it to the responseHeader????  Or it could  
> be a quick
> : add to the LukeRH.  The former has the advantage that we wouldn't  
> have to make
>
> adding the info to LukeRequestHandler makes sense.
>
> Honestly: i can't think of a single use case where client code would  
> care
> about what the uniqueKey field is, unless it already *knew* what the
> uniqueKey field is.

:-)  Abstractions allow one to use different implementations.  My  
client/display doesn't know about Solr, it just knows it can search  
and the Solr implementation part of it can be pointed at any Solr  
instance (or other search engines as well), thus it needs to be able  
to "reflect" on Solr.  The unique key is a pretty generally useful  
thing across implementations.

In fact, I wish all the ReqHandlers had an introspection option, where  
one could see what params are supported as well.

>
>
> : Of course, it probably would be useful to be able to request the  
> schema from
> : the server and build an IndexSchema object on the client side.  
> This could be
> : added to the LukeRH as well.
>
> somebody was working on that at some point ... but i may be thinking  
> of
> the Ruby client ... no.... i'm pretty sure i remember it coming up  
> in the
> context of Java because i remember discussion that a full  
> "IndexSchema"
> was too much because it required the client to have the class files  
> for
> all of the analysis chain and fieldtype classes.

It may be reasonable, as a compromise, to just have metadata about  
these things.  Sort of like BeanInfo provides.

-Grant

Re: SolrJ and Unique Doc ID

hossman
: > Honestly: i can't think of a single use case where client code would care
: > about what the uniqueKey field is, unless it already *knew* what the
: > uniqueKey field is.
:
: :-)  Abstractions allow one to use different implementations.  My
: client/display doesn't know about Solr, it just knows it can search and the
: Solr implementation part of it can be pointed at any Solr instance (or other
: search engines as well), thus it needs to be able to "reflect" on Solr.  The
: unique key is a pretty generally useful thing across implementations.

but why does your client/display care which field is the uniqueKey field?
knowing which fields it might query or ask for in the fl list sure -- but
why need to know about the uniqueKey field specifically?

I could have an index of people where i document that the SSN field is
unique, and never even tell you that it's not the 'uniqueKey' Field --
that could be some completely unrelated field i don't want you to know
about called "customerId" -- but that doesn't affect you as a client, you
can still query on whatever you want, get back whatever docs you want,
etc...  the only thing you can't do is "delete by id" (since you can't be
sure which field is the uniqueKey) but you can always delete by query.
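
To make the delete-by-id versus delete-by-query distinction concrete, here is a minimal sketch that builds the two XML update messages as plain strings. The `<delete><id>` and `<delete><query>` message shapes are Solr's XML update format; the class and method names are hypothetical, and the `ssn` field in the usage note is only the example from this mail.

```java
/** Hypothetical helper for illustration only; not part of SolrJ. */
final class DeleteMessages {
    private DeleteMessages() {}

    /** Escapes the characters that are unsafe in XML element content. */
    static String escapeXml(String s) {
        // '&' must be replaced first so the other entities are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Delete by id: Solr matches the value against the uniqueKey field. */
    static String deleteById(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    /** Delete by query: works on any field the client already knows about. */
    static String deleteByQuery(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }
}
```

A client that only knows the SSN field is unique could send `DeleteMessages.deleteByQuery("ssn:123456789")` without ever learning the name of the real uniqueKey field.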

: In fact, I wish all the ReqHandlers had an introspection option, where one
: could see what params are supported as well.

you and me both -- but the introspection shouldn't be intrinsic to the
RequestHandler - as the Solr admin i may not want to expose all of those
options to my clients...

        http://wiki.apache.org/solr/MakeSolrMoreSelfService


-Hoss


Re: SolrJ and Unique Doc ID

Grant Ingersoll-2

On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote:

> : > Honestly: i can't think of a single use case where client code  
> would care
> : > about what the uniqueKey field is, unless it already *knew* what  
> the
> : > uniqueKey field is.
> :
> : :-)  Abstractions allow one to use different implementations.  My
> : client/display doesn't know about Solr, it just knows it can  
> search and the
> : Solr implementation part of it can be pointed at any Solr instance  
> (or other
> : search engines as well), thus it needs to be able to "reflect" on  
> Solr.  The
> : unique key is a pretty generally useful thing across  
> implementations.
>
> but why does your client/display care which field is the uniqueKey  
> field?
> knowing which fields it might query or ask for in the fl list sure  
> -- but
> why need to know about the uniqueKey field specifically?

How do I generate URLs to retrieve a document against any given Solr  
instance that I happen to be pointing at without knowing which field  
is the document id?   At any rate, the problem is solved in SOLR-478  
in less than 10 lines of code and doesn't introduce back-compat  
issues.  I invoke this on instantiation of my client, get the field  
and then keep it around for use later.
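
A minimal sketch of that pattern: look the field name up once when the client is constructed, keep it, and use it to build document URLs. The lookup is passed in as a `Supplier` so the sketch stays independent of how the name is actually fetched (for example from the Luke handler after SOLR-478). The class name, the base URL in the usage note, and the choice of a quoted `field:"value"` query against `/select` are assumptions for illustration, not the SOLR-478 code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical client-side helper for illustration only; not part of SolrJ. */
final class DocumentUrlBuilder {
    private final String baseUrl;
    private final String uniqueKeyField;

    /**
     * @param baseUrl         base URL of the Solr instance (assumed, no trailing slash)
     * @param uniqueKeyLookup fetches the uniqueKey field name from the server
     */
    DocumentUrlBuilder(String baseUrl, Supplier<String> uniqueKeyLookup) {
        this.baseUrl = Objects.requireNonNull(baseUrl, "baseUrl");
        // The lookup runs exactly once, here; the result is kept for later use.
        this.uniqueKeyField =
                Objects.requireNonNull(uniqueKeyLookup.get(), "uniqueKey field name");
    }

    String uniqueKeyField() {
        return uniqueKeyField;
    }

    /** Builds a /select URL whose query matches the document with the given id. */
    String documentUrl(String id) {
        // Quote the value as a phrase; escape backslashes and embedded quotes.
        String quoted = "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        String query = uniqueKeyField + ":" + quoted;
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

For example, `new DocumentUrlBuilder("http://localhost:8983/solr", () -> "id")` stands in for a real server lookup and can then generate URLs for any id without asking the server again.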

>
>
> I could have an index of people where i document that the SSN field is
> unique, and never even tell you that it's not the 'uniqueKey' Field --
> that could be some completely unrelated field i don't want you to know
> about called "customerId" -- but that doesn't affect you as a  
> client, you
> can still query on whatever you want, get back whatever docs you want,
> etc...  the only thing you can't do is "delete by id" (since you  
> can't be
> sure which field is the uniqueKey) but you can always delete by query.
>
> : In fact, I wish all the ReqHandlers had an introspection option,  
> where one
> : could see what params are supported as well.
>
> you and me both -- but the introspection shouldn't be intrinsic to the
> RequestHandler - as the Solr admin i may not want to expose all of  
> those
> options to my clients...
>
> http://wiki.apache.org/solr/MakeSolrMoreSelfService

+1

Re: SolrJ and Unique Doc ID

Erik Hatcher

On Feb 12, 2008, at 3:44 PM, Grant Ingersoll wrote:

> On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote:
>
>> : > Honestly: i can't think of a single use case where client code  
>> would care
>> : > about what the uniqueKey field is, unless it already *knew*  
>> what the
>> : > uniqueKey field is.
>> :
>> : :-)  Abstractions allow one to use different implementations.  My
>> : client/display doesn't know about Solr, it just knows it can  
>> search and the
>> : Solr implementation part of it can be pointed at any Solr  
>> instance (or other
>> : search engines as well), thus it needs to be able to "reflect"  
>> on Solr.  The
>> : unique key is a pretty generally useful thing across  
>> implementations.
>>
>> but why does your client/display care which field is the uniqueKey  
>> field?
>> knowing which fields it might query or ask for in the fl list sure  
>> -- but
>> why need to know about the uniqueKey field specifically?
>
> How do I generate URLs to retrieve a document against any given  
> Solr instance that I happen to be pointing at without knowing which  
> field is the document id?

One cool technique, not instead of your change to Luke RH (a needed  
change IMO) but another way to go about it - we have a  
DocumentRequestHandler that takes a uniqueKey parameter that would  
retrieve and return that single document without having to specify  
the field name explicitly.

        Erik


Re: SolrJ and Unique Doc ID

hossman

: > How do I generate URLs to retrieve a document against any given Solr
: > instance that I happen to be pointing at without knowing which field is the
: > document id?
:
: One cool technique, not instead of your change to Luke RH (a needed change
: IMO) but another way to go about it - we have a DocumentRequestHandler that
: takes a uniqueKey parameter that would retrieve and return that single
: document without having to specify the field name explicitly.

Erik's idea eliminates the need to know what the "name" of the uniqueKey
field is when formulating the query to "fetch one", but it doesn't solve
the crux of Grant's question: when looking at a list of results (with a
partial "fl" for example) how can you know which value to use to later
query on and get back just that document (with the full "fl" for example)?

My point was that while knowing the uniqueKey field solves the problem,
the person setting up the index may not want clients to know this ... the
client has to have *some* pre-existing knowledge about the structure of
the index ... Grant's Luke patch solves this by letting the client get
this information from Luke, but in a general case a Solr Admin may not
want to expose that info to his clients (ie: the customerId vs SSN example
from my previous mail) ... so a general purpose client should probably
have a more general way to configure the "what field do i treat as unique"
info without requiring that the LukeHandler be available.
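
One way such a client could be configured, as a minimal sketch: an explicitly configured field name wins, and a server lookup (for example via Luke) is consulted only when nothing was configured. The class name, method name, and error message are hypothetical; the server lookup is passed in as a `Supplier` because how it is performed is outside this sketch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of SolrJ. */
final class IdFieldResolver {
    private IdFieldResolver() {}

    /**
     * Decides which field the client treats as unique.
     *
     * @param configured   field name chosen by whoever set up the client, if any
     * @param serverLookup asks the server for its uniqueKey; only called when
     *                     nothing was configured, and may return empty if the
     *                     admin has not exposed that information
     */
    static String resolve(Optional<String> configured,
                          Supplier<Optional<String>> serverLookup) {
        if (configured.isPresent()) {
            // Explicit configuration wins; the server is never contacted.
            return configured.get();
        }
        return serverLookup.get().orElseThrow(() -> new IllegalStateException(
                "no id field configured and the server did not report one"));
    }
}
```

With this shape, the customerId vs SSN case is handled by configuring "ssn" on the client, and the Luke handler can stay disabled.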




-Hoss