Searcher Performance

Chitra R
Hi,

While working with searcher.search, I have noticed some differences in
search time that I would like to understand. My index has 10 lakh
(1,000,000) documents and 30 fields. I ran three searches with different
queries, one after another. At search time I use MMapDirectory, and the
index is already open.

*Case 1:*

   - For the first search, I ran new TermQuery(new Term("name","Chitra")),
   which yields 1 lakh (100,000) documents. Time taken for the first
   search: roughly 50-60 ms.
   - For the second search, I ran new TermQuery(new Term("animal","lion")),
   which also yields 1 lakh documents. Time taken for the second search:
   roughly 50-60 ms.
   - For the third search, I ran new TermQuery(new Term("bird","peacock")),
   which also yields 1 lakh documents. Time taken for the third search:
   roughly 50-60 ms.

In this case, why does searcher.search take the same time for different
queries?

*Case 2:*

When I ran the same query twice, the second search took less time than the
first, presumably because of the OS cache.

*Based on the above observations:*

During the initial search, only the required portion of the index files is
loaded into the OS cache. For the next search, if the required portion is
not present in the OS cache, will it take time to read those files from
disk? If so, is this the reason why searcher.search takes nearly the same
time for different queries?


Regards,
Chitra

Re: Searcher Performance

Adrien Grand
Regarding whether the filesystem cache helps, you could look at whether
there is some disk activity while your queries are running.

When everything is in the filesystem cache, the latency of search requests
for simple queries (term queries and combinations through boolean queries)
usually depends mostly on the total number of matches, since Lucene needs to
call the collector on every match.
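
To illustrate the point about per-match cost, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class `MatchCostSketch`, the `fakePostings` helper, and the evenly spaced doc IDs are all assumptions made up for illustration. It only shows that the number of collector calls equals the number of matches, regardless of which term produced them.

```java
import java.util.function.IntConsumer;

public class MatchCostSketch {
    /** Visits every matching doc ID once, similar in spirit to a search loop handing hits to a collector. */
    public static int collectAll(int[] postings, IntConsumer collector) {
        int visited = 0;
        for (int docId : postings) {
            collector.accept(docId); // one call per match
            visited++;
        }
        return visited;
    }

    /** Builds a hypothetical postings list: `matches` doc IDs spread evenly over `maxDoc` documents. */
    public static int[] fakePostings(int matches, int maxDoc) {
        int[] postings = new int[matches];
        int step = maxDoc / matches;
        for (int i = 0; i < matches; i++) {
            postings[i] = i * step;
        }
        return postings;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000; // 10 lakh documents, as in the question
        for (int matches : new int[] {1_000, 100_000}) {
            int[] postings = fakePostings(matches, maxDoc);
            long[] sum = {0}; // stand-in for whatever work the collector does per hit
            long start = System.nanoTime();
            int visited = collectAll(postings, docId -> sum[0] += docId);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(matches + " matches -> " + visited
                    + " collector calls, " + micros + " us");
        }
    }
}
```

In this model, three different terms that each match 1 lakh documents trigger the same number of collector calls, which is consistent with the three queries in the question taking similar time once the data is cached.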

On Fri, Feb 17, 2017 at 10:09, Chitra R <[hidden email]> wrote:

> In this case, why does searcher.search take the same search time for
> different queries?

Re: Searcher Performance

Chitra R
Hey, thank you so much. I got it.

My setup:

   - 10 lakh (1,000,000) docs and 30 fields in my index
   - a new searcher is opened at the initial search
   - nothing from my current index is in the filesystem cache yet

At the initial search, I search across only one of the 30 fields in my
index.

My question is:

*At the initial search, will only the required pages (OS pages of the Lucene
index files) for that single field be loaded into the filesystem cache, or
will the data for all fields be loaded into the filesystem cache from disk?*


Regards,
Chitra

On Fri, Feb 17, 2017 at 7:05 PM, Adrien Grand <[hidden email]> wrote:

> When everything is in the filesystem cache, the latency of search requests
> for simple queries (term queries and combinations through boolean queries)
> usually depends mostly on the total number of matches, since Lucene needs to
> call the collector on every match.

Re: Searcher Performance

Adrien Grand
Some minimal information about the fields is loaded into memory when you
open the index reader: things like the list of fields and how they are
indexed.

However, the vast majority of the data is read from disk lazily. We do not
warm the filesystem cache or anything like that by default. We do not use
direct I/O either. So if you run a term query, only the pages that contain
information about that particular field and value will be loaded into the
cache.
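
The lazy behaviour described above can be observed with a plain memory-mapped file. The following is a minimal sketch using only the JDK, not Lucene's own code; the class `LazyMmapSketch` and the sample file standing in for an index file are assumptions for illustration. Mapping the file does not read it; the OS typically faults in only the pages around the offset that is actually touched (plus whatever read-ahead it decides to do).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LazyMmapSketch {
    /** Maps the whole file read-only, then touches a single byte at `offset`. */
    public static byte readOneByte(Path file, int offset) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Mapping only reserves address space; no file data is read yet.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The first access to this offset triggers a page fault if the page is not cached.
            return buf.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a hypothetical stand-in for an index file: `size` bytes where byte i == i % 128. */
    public static Path writeSampleFile(int size) {
        try {
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) (i % 128);
            }
            Path file = Files.createTempFile("lazy-mmap", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile(8 * 1024 * 1024);
        // Only a small part of the 8 MB mapping needs to be resident to answer this.
        System.out.println(readOneByte(file, 5_000_000));
    }
}
```

Note that a file written just before being mapped is usually still in the filesystem cache, so this sketch shows the access pattern rather than a measurable cold read.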

If you want to warm the filesystem cache explicitly, which could be a
good idea if you have plenty of filesystem cache for your index (i.e. the
unused memory of the system is larger than the index), you can look into
using MMapDirectory.setPreload.
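
As a rough illustration of what warming a mapping up front means, here is a JDK-only sketch that uses `MappedByteBuffer.load()`, a best-effort request to bring the mapped pages into physical memory. This is an analogy under stated assumptions, not a description of how MMapDirectory.setPreload is implemented; the class `PreloadSketch` and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PreloadSketch {
    /** Maps a file read-only; when `preload` is true, asks the OS to page it in right away. */
    public static MappedByteBuffer map(Path file, boolean preload) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                buf.load(); // best effort: pages may still be evicted later
            }
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sums every byte, which forces each page of the mapping to be touched. */
    public static long checksum(MappedByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.limit(); i++) {
            sum += buf.get(i);
        }
        return sum;
    }

    /** Writes a hypothetical sample file of `size` bytes, each with value 1. */
    public static Path writeOnes(int size) {
        try {
            byte[] data = new byte[size];
            Arrays.fill(data, (byte) 1);
            Path file = Files.createTempFile("preload", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeOnes(16 * 1024 * 1024);
        for (boolean preload : new boolean[] {false, true}) {
            MappedByteBuffer buf = map(file, preload);
            long start = System.nanoTime();
            long sum = checksum(buf);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("preload=" + preload + " sum=" + sum + " scan=" + micros + " us");
        }
    }
}
```

The trade-off is the one stated in the reply: preloading moves the disk reads to open time, which only pays off when the machine has enough free memory to keep the whole index cached.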

On Fri, Feb 17, 2017 at 15:13, Chitra R <[hidden email]> wrote:

> *At the initial search, will only the required pages (OS pages of the Lucene
> index files) for that single field be loaded into the filesystem cache, or
> will the data for all fields be loaded into the filesystem cache from disk?*

Re: Searcher Performance

Chitra R
Thanks a lot, Adrien.

On Fri, Feb 17, 2017 at 10:07 PM, Adrien Grand <[hidden email]> wrote:

> If you want to warm the filesystem cache explicitly, which could be a
> good idea if you have plenty of filesystem cache for your index (i.e. the
> unused memory of the system is larger than the index), you can look into
> using MMapDirectory.setPreload.