Item Search Database

Maarten.De.Vilder
hi,

i have a performance question...

we need to implement a feature called 'Item Search Database', which
basically means we have to limit the documents a user can search ...

example :
Item1 is in database1
item2 is in database2
item3 is in database1 and database2
and the client can only see the items in database1

we currently solve this by making a new solrcolumn for each
searchdatabase... so it looks like this :
ITEMNAME    DB1     DB2
--------    -----   -----
Item1       true    false
Item2       false   true
Item3       true    true

and we limit the result of a search by putting "db1:true" in the
querystring

but i have been reading about another method :
we could also use just one solrcolumn and put the names of the databases in
it...
like so :
ITEMNAME    DB
--------    -------
Item1       DB1
Item2       DB2
Item3       DB1 DB2

and limit the results by putting 'db:db1' in the querystring

and now for my question :
which of these options will be more performant ?

my guess is the first option will be the most performant since the indexes
will be better constructed
but i would really like a professional opinion on this ...

as i said, we are currently using the first option on 300,000 test records
and it is really performant.
some SearchDatabases have only 12 records in them and it takes less than 1ms
to get those 12 records back... so i'm guessing Solr is not searching the
full 300,000 records and i am kind of afraid that with the second option
Solr will have to search more records/indexes to get the same result...

well, hope you understand my question and thanks in advance !
- Maarten

PS: thank you to everybody on this list for the help and thank you to all
of the Solr/Lucene developers, great stuff !!
Re: Item Search Database

Yonik Seeley-2
On 3/28/07, [hidden email] <[hidden email]> wrote:

> we need to implement a feature called 'Item Search Database', which
> basically means we have to limit the documents a user can search ...
>
> example :
> Item1 is in database1
> item2 is in database2
> item3 is in database1 and database2
> and the client can only see the items in database1
>
> we currently solve this by making a new solrcolumn for each
> searchdatabase... so it looks like this :
> ITEMNAME    DB1     DB2
> --------    -----   -----
> Item1       true    false
> Item2       false   true
> Item3       true    true
>
> and we limit the result of a search by putting "db1:true" in the
> querystring
>
> but i have been reading about another method :
> we could also use just one solrcolumn and put the names of the databases in
> it...
> like so :
> ITEMNAME    DB
> --------    -------
> Item1       DB1
> Item2       DB2
> Item3       DB1 DB2
>
> and limit the results by putting 'db:db1' in the querystring
>
> and now for my question :
> which of these options will be more performant ?

They should both be roughly equal.
Lucene maintains an inverted index... a term points to all the
document IDs containing that term... so it doesn't really matter if
the term is "db:db1" or "db1:true".
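
To illustrate the point about the inverted index, here is a toy sketch in Python. It is not Lucene's actual data structure, and build_inverted_index and the lowercase field values are made up for the example. It only shows that both layouts end up with one term whose postings list holds the same document IDs:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each (field, term) pair to the list of doc ids containing it."""
    index = defaultdict(list)
    for doc_id, fields in enumerate(docs):
        for field, values in fields.items():
            for value in values:
                index[(field, value)].append(doc_id)
    return index


# Option 1: one boolean field per search database.
per_db_fields = [
    {"name": ["Item1"], "db1": ["true"], "db2": ["false"]},
    {"name": ["Item2"], "db1": ["false"], "db2": ["true"]},
    {"name": ["Item3"], "db1": ["true"], "db2": ["true"]},
]

# Option 2: a single multi-valued field holding the database names.
single_field = [
    {"name": ["Item1"], "db": ["db1"]},
    {"name": ["Item2"], "db": ["db2"]},
    {"name": ["Item3"], "db": ["db1", "db2"]},
]

index_a = build_inverted_index(per_db_fields)
index_b = build_inverted_index(single_field)

# Either way the restriction is a single term lookup.
postings_a = index_a[("db1", "true")]  # term "db1:true"
postings_b = index_b[("db", "db1")]    # term "db:db1"
print(postings_a, postings_b)
```

In both cases the lookup goes straight to the documents in database1 without visiting the others, which is why a database with 12 records stays fast in a 300,000 record index under either layout.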

The 2nd way with a single field seems more extensible and future-proof though.

If you really want a speedup, pull out the restriction into a filter:

q=foo&fq=db:db1

The filter will be cached independently of the query, resulting in
much less work for every subsequent query that reuses that filter.
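
A rough sketch of that idea in Python, under some assumptions: the host, port and /solr/select path are placeholders for your own setup, and ToyFilterCache is a made-up stand-in that only mimics the effect of caching a filter's document set (it is not how Solr implements its filter cache):

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to your own Solr installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query, database):
    """Keep the user's query in q and the database restriction in fq."""
    params = [("q", user_query), ("fq", f"db:{database}")]
    return SOLR_SELECT_URL + "?" + urlencode(params, safe=":")


class ToyFilterCache:
    """Toy model: compute each filter's doc set once, then reuse it."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0  # how many times a filter had to be computed

    def doc_set(self, field, value):
        key = (field, value)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = frozenset(
                i for i, d in enumerate(self.docs) if value in d.get(field, ())
            )
        return self.cache[key]

    def search(self, text, field, value):
        # Only the main query is evaluated per request; the filter is reused.
        allowed = self.doc_set(field, value)
        return [i for i in sorted(allowed) if text in self.docs[i]["name"].lower()]


docs = [
    {"name": "Item1", "db": ["db1"]},
    {"name": "Item2", "db": ["db2"]},
    {"name": "Item3", "db": ["db1", "db2"]},
]
searcher = ToyFilterCache(docs)
first = searcher.search("item", "db", "db1")
second = searcher.search("item3", "db", "db1")
url = build_search_url("foo", "db1")
print(url, first, second, searcher.misses)
```

The two searches use different query text but the same restriction, so the filter's document set is computed only once and the second search just intersects against it.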

-Yonik