newbie Q regarding schema configuration

Ian Holsman (Lists)
hi.

so I finally managed to find a bit of time to get a Solr instance
going, and now have some questions about it ;-)

first, the application is tagging, i.e. associating some keywords
with a given item and showing them on a particular object (you can
see this in action here: http://economy-chat.com/aggy/detail/andrew-leigh/ )

It is user-based (i.e. individuals can tag a particular object themselves,
and that gets merged into a global summary for that object)
and it is also hierarchical, i.e. tagging a child implies you have also
tagged the parent.

so.. my first question: in schema.xml, can you have a composite key as
the 'uniqueKey' field, or do i need to do this on the client side?

2nd question.

can you have complex types which are multivalued?
I'd like to store something like
a tag-name with a corresponding tag-weighting.

can you do sum(*) type queries in lucene/solr? is it efficient? or
are you better off having a 2nd index which has these sum(*) values in it
and keeping it up to date instead?



Thanks

Re: newbie Q regarding schema configuration

Chris Hostetter-3

: so.. my first question in schema.xml, can you have a composite key as
: the 'uniquekey' field, or do i need to do this on the client side?

at the moment this would need to be done client side, but you're not the
first person to ask so i've added it to the TaskList ... it doesn't seem
like it would be too hard.
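
Until Solr supports this itself, the client-side approach could look roughly like the sketch below. The field names (`id`, `user`, `item`, `tag`), the `|` separator and both helper functions are assumptions made for illustration, not anything Solr defines; the only real requirement is that the combined string ends up in whichever field schema.xml declares as the unique key.

```python
from urllib.parse import quote

SEP = "|"  # assumed separator; percent-encoding keeps it out of the parts


def composite_key(*parts):
    """Join key parts into one string usable as a single unique-key value."""
    # quote(..., safe="") escapes the separator inside each part, so
    # ("a|b", "c") and ("a", "b|c") can never produce the same key.
    return SEP.join(quote(part, safe="") for part in parts)


def make_doc(user, item, tags):
    """Build a hypothetical document dict keyed on the (user, item) pair."""
    return {
        "id": composite_key(user, item),
        "user": user,
        "item": item,
        "tag": list(tags),
    }


doc = make_doc("ian", "andrew-leigh", ["economics"])
```

Keeping `user` and `item` as separate fields alongside the combined `id` means they stay individually searchable; the composite value is only there to make re-adding the same (user, item) pair overwrite the earlier document.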

: can you have complex types which are multivalued?
: I'd like to store something like
: a tag-name with a corresponding tag-weighting.

There's nothing like that built into Solr - the best way to model that
would probably be to use the term frequency to represent the weight - you
could have an analyzer that parsed input like...

   "blue state"^2 "democrat"^1 "john kerry"^5

...and converted it into a stream of tokens like...

   [blue state] [blue state] [democrat] [john kerry] [john kerry]...

...kind of kludgy, but that's the best mechanism Lucene has at the moment
(there are plans to add more generic term attributes, but that's still
at the design stage)
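
The expansion step described above can be sketched in a few lines. In a real setup it would live in a custom Lucene analyzer written in Java; the Python below only illustrates the parsing and repetition logic, the function name is made up, and it assumes weights are non-negative integers written exactly as `"phrase"^N`.

```python
import re

# Matches "some phrase"^N, the weighted-tag syntax from the example above.
WEIGHTED_TAG = re.compile(r'"([^"]+)"\^(\d+)')


def expand_weighted_tags(text):
    """Emit each phrase once per unit of weight, so term frequency == weight."""
    tokens = []
    for phrase, weight in WEIGHTED_TAG.findall(text):
        tokens.extend([phrase] * int(weight))
    return tokens


tokens = expand_weighted_tags('"blue state"^2 "democrat"^1 "john kerry"^5')
```

Each phrase is kept as a single token rather than being split on whitespace, which matches the bracketed token stream shown above.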

: can you do sum(*) type queries in lucene/solr? it is efficient ? or
: are you better having a 2nd index which has these sum(*) values in it
: and keep it up to date instead.

sums across multiple documents, or sums of values in a single document?
in the latter case, you don't need a separate index, just another field.
in the former case it's really a question of which sets of documents you
want sums across ... if it's all of them then you could just store that
info in a flat file, or a special metadata document in your index.

if what you want is more of a run-time calculation then you can certainly
do it in a custom request handler (and you can use a SolrCache and a
custom CacheRegenerator to make sure the values are cached for as long as
the searcher is open, and autowarmed when a new one is opened).  Generally
the best way to do math operations on sets of documents in Lucene is using
the FieldCache, and this is certainly available to Solr request
handlers.
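
To make the idea concrete: the point of a FieldCache-style structure is to hold one value per document in a flat array, built once per open searcher, so that summing over any set of matching documents is just a series of array lookups. The sketch below is plain Python and does not use the Lucene or Solr APIs; `SearcherSnapshot`, `column` and `sum_field` are hypothetical names, and list positions stand in for document ids.

```python
class SearcherSnapshot:
    """Stand-in for one open searcher: a fixed view of the index."""

    def __init__(self, docs):
        # docs: list of dicts; the list position plays the role of the doc id.
        self.docs = docs
        self._column_cache = {}  # field name -> per-document list of values

    def column(self, field, default=0.0):
        """Return the per-document array for `field`, building it only once."""
        if field not in self._column_cache:
            self._column_cache[field] = [
                float(doc.get(field, default)) for doc in self.docs
            ]
        return self._column_cache[field]


def sum_field(snapshot, doc_ids, field):
    """Sum `field` over the given doc ids using the cached column."""
    values = snapshot.column(field)
    return sum(values[i] for i in doc_ids)


snapshot = SearcherSnapshot([
    {"tag": "economics", "weight": 2},
    {"tag": "politics", "weight": 5},
    {"tag": "economics", "weight": 1},
])
matching = [i for i, doc in enumerate(snapshot.docs) if doc["tag"] == "economics"]
total = sum_field(snapshot, matching, "weight")
```

The cache lives on the snapshot object, so it is discarded together with the searcher it belongs to; rebuilding the column when a new snapshot is created corresponds to the autowarming step mentioned above.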



-Hoss


Re: newbie Q regarding schema configuration

Yonik Seeley
In reply to this post by Ian Holsman (Lists)
> can you have complex types which are multivalued?
> I'd like to store something like
> a tag-name with a corresponding tag-weighting.

How much work it is might depend on how static or dynamic the tag-weighting is.
If it's very static, you could simply use index-time boosts.
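
As a rough sketch of what that could look like on the client, the snippet below builds an XML add message with a `boost` attribute on each tag field, as accepted by the Solr XML update format of this era. The field names and the `add_message` helper are assumptions for illustration. Two caveats worth checking against the documentation for your version: as far as I know, boosts on several values of the same field are combined into a single per-field norm rather than kept per value, and index-time boosts are not available in every Solr release.

```python
import xml.etree.ElementTree as ET


def add_message(doc_id, weighted_tags):
    """Build an <add> update message with one boosted field per tag."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for tag, weight in weighted_tags:
        # The tag weight is passed through as the index-time boost.
        field = ET.SubElement(doc, "field", name="tag", boost=str(float(weight)))
        field.text = tag
    return ET.tostring(add, encoding="unicode")


msg = add_message("andrew-leigh", [("blue state", 2), ("john kerry", 5)])
```

Because the boost is baked in at index time, changing a weight means re-indexing the document, which is why this only suits weights that rarely change.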

> can you do sum(*) type queries in lucene/solr? it is efficient ?

If all tag weights were the same, you would get summing for "free" via
lucene scoring I think...

It all depends on the exact details of what you are trying to do, how
many tags, how are the weights calculated, are the sums across all
tags or dynamically determined by some query, etc...

-Yonik

Re: newbie Q regarding schema configuration

Ian Holsman (Lists)
In reply to this post by Chris Hostetter-3
thanks for the input Chris (and Yonik)

i'm not sure lucene is the best answer for what I want to do ;(

regards
Ian
On 20/06/2006, at 5:58 PM, Chris Hostetter wrote:

> [quoted message trimmed]