Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

Justin Grunau
We have some code that uses lucene which has been working perfectly well for several months.

Recently, a QA team in our organization has set up a server with a much larger data set than we have ever tested with in the past:  the resulting lucene index is about 3G in size.

On this particular server, the same lucene code which has been reliable in the past is now exhibiting erratic behavior.  The first time you do a search, it returns the correct number of hits.  The second time you do a search, it may or may not return the correct set.  By the third time you do a search, it will return 0 hits even for a search that was returning hundreds of hits only a few seconds earlier.  All subsequent searches will return 0 hits until you stop and restart the java process.

A snippet of the relevant code follows:

                    // getReader() returns the singleton IndexReader object
                final IndexReader reader = getReader();

                    // ANALYZER is another singleton
                final QueryParser queryParser = new QueryParser("text", ANALYZER);
                queryParser.setDefaultOperator(spec.getDefaultOp());
                final Query query = queryParser.parse(spec.getSearchText()).rewrite(
                        reader);
                final IndexSearcher searcher = new IndexSearcher(reader);

                final Hits hits = searcher.search(query, new CachingWrapperFilter(
                        new QueryWrapperFilter(visibilityFilter)));
                total = hits.length();



I understand that Lucene should be able to handle very large datasets, so I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just that I'm doing something "wrong" which has gone unnoticed so far for several months because we've never had an index this large.

We're using lucene verison 2.2.0.

Thanks!

Justin Grunau



     


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

leonardinius
* And what's about visibility filter? * Are you sure no one else accesses
IndexReader and modifies index? See reader.maxDocs() to be confident.

On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[hidden email]> wrote:

> We have some code that uses lucene which has been working perfectly well
> for several months.
>
> Recently, a QA team in our organization has set up a server with a much
> larger data set than we have ever tested with in the past:  the resulting
> lucene index is about 3G in size.
>
> On this particular server, the same lucene code which has been reliable in
> the past is now exhibiting erratic behavior.  The first time you do a
> search, it returns the correct number of hits.  The second time you do a
> search, it may or may not return the correct set.  By the third time you do
> a search, it will return 0 hits even for a search that was returning
> hundreds of hits only a few seconds earlier.  All subsequent searches will
> return 0 hits until you stop and restart the java process.
>
> A snippet of the relevant code follows:
>
>                    // getReader() returns the singleton IndexReader object
>                final IndexReader reader = getReader();
>
>                    // ANALYZER is another singleton
>                final QueryParser queryParser = new QueryParser("text",
> ANALYZER);
>                queryParser.setDefaultOperator(spec.getDefaultOp());
>                final Query query =
> queryParser.parse(spec.getSearchText()).rewrite(
>                        reader);
>                final IndexSearcher searcher = new IndexSearcher(reader);
>
>                final Hits hits = searcher.search(query, new
> CachingWrapperFilter(
>                        new QueryWrapperFilter(visibilityFilter)));
>                total = hits.length();
>
>
>
> I understand that Lucene should be able to handle very large datasets, so
> I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just
> that I'm doing something "wrong" which has gone unnoticed so far for several
> months because we've never had an index this large.
>
> We're using lucene verison 2.2.0.
>
> Thanks!
>
> Justin Grunau
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
 Bests regards,
 Leonid Maslov!
 Personal blog: http://leonardinius.blogspot.com/

Random thought:
Princess Margaret  - "I have as much privacy as a goldfish in a bowl."
Reply | Threaded
Open this post in threaded view
|

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

Justin Grunau
In reply to this post by Justin Grunau
Sorry, I forgot to include the visibility filters:

                final BooleanQuery visibilityFilter = new BooleanQuery();
                visibilityFilter.add(new TermQuery(new Term("isPublic", "true")),
                        Occur.SHOULD);
                visibilityFilter.add(new TermQuery(new Term("reader", user.getId())),
                        Occur.SHOULD);


These visibility filters ensure that a user only sees files which he or she has access to see.

I am pretty certain nobody else has modified the index in the meantime, but why is that important?  We have several other servers -- whose only difference is a smaller data set -- with dozens of concurrent users, and the index on those servers gets modified and read concurrently all the time, but none of these other servers have ever exhibited this bug.



----- Original Message ----
From: Leonid M. <[hidden email]>
To: [hidden email]
Sent: Thursday, September 4, 2008 5:35:47 PM
Subject: Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

* And what's about visibility filter? * Are you sure no one else accesses
IndexReader and modifies index? See reader.maxDocs() to be confident.

On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[hidden email]> wrote:

> We have some code that uses lucene which has been working perfectly well
> for several months.
>
> Recently, a QA team in our organization has set up a server with a much
> larger data set than we have ever tested with in the past:  the resulting
> lucene index is about 3G in size.
>
> On this particular server, the same lucene code which has been reliable in
> the past is now exhibiting erratic behavior.  The first time you do a
> search, it returns the correct number of hits.  The second time you do a
> search, it may or may not return the correct set.  By the third time you do
> a search, it will return 0 hits even for a search that was returning
> hundreds of hits only a few seconds earlier.  All subsequent searches will
> return 0 hits until you stop and restart the java process.
>
> A snippet of the relevant code follows:
>
>                    // getReader() returns the singleton IndexReader object
>                final IndexReader reader = getReader();
>
>                    // ANALYZER is another singleton
>                final QueryParser queryParser = new QueryParser("text",
> ANALYZER);
>                queryParser.setDefaultOperator(spec.getDefaultOp());
>                final Query query =
> queryParser.parse(spec.getSearchText()).rewrite(
>                        reader);
>                final IndexSearcher searcher = new IndexSearcher(reader);
>
>                final Hits hits = searcher.search(query, new
> CachingWrapperFilter(
>                        new QueryWrapperFilter(visibilityFilter)));
>                total = hits.length();
>
>
>
> I understand that Lucene should be able to handle very large datasets, so
> I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just
> that I'm doing something "wrong" which has gone unnoticed so far for several
> months because we've never had an index this large.
>
> We're using lucene verison 2.2.0.
>
> Thanks!
>
> Justin Grunau
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
Bests regards,
Leonid Maslov!
Personal blog: http://leonardinius.blogspot.com/

Random thought:
Princess Margaret  - "I have as much privacy as a goldfish in a bowl."



     


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

leonardinius
Anyway it is worth trying (to ensure docs aren't removed between searches).What
if running MatchAllDocsQuery or smth similar? Still getting different hits
count on query rerun?

PS. I'm kinda newbie with Lucene and Lucene API. So don't take my notes too
seriously :)

On Fri, Sep 5, 2008 at 12:46 AM, Justin Grunau <[hidden email]> wrote:

> Sorry, I forgot to include the visibility filters:
>
>                final BooleanQuery visibilityFilter = new BooleanQuery();
>                visibilityFilter.add(new TermQuery(new Term("isPublic",
> "true")),
>                        Occur.SHOULD);
>                visibilityFilter.add(new TermQuery(new Term("reader",
> user.getId())),
>                        Occur.SHOULD);
>
>
> These visibility filters ensure that a user only sees files which he or she
> has access to see.
>
> I am pretty certain nobody else has modified the index in the meantime, but
> why is that important?  We have several other servers -- whose only
> difference is a smaller data set -- with dozens of concurrent users, and the
> index on those servers gets modified and read concurrently all the time, but
> none of these other servers have ever exhibited this bug.
>
>
>
> ----- Original Message ----
> From: Leonid M. <[hidden email]>
> To: [hidden email]
> Sent: Thursday, September 4, 2008 5:35:47 PM
> Subject: Re: Problem with lucene search starting to return 0 hits when a
> few seconds earlier it was returning hundreds
>
> * And what's about visibility filter? * Are you sure no one else accesses
> IndexReader and modifies index? See reader.maxDocs() to be confident.
>
> On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[hidden email]> wrote:
>
> > We have some code that uses lucene which has been working perfectly well
> > for several months.
> >
> > Recently, a QA team in our organization has set up a server with a much
> > larger data set than we have ever tested with in the past:  the resulting
> > lucene index is about 3G in size.
> >
> > On this particular server, the same lucene code which has been reliable
> in
> > the past is now exhibiting erratic behavior.  The first time you do a
> > search, it returns the correct number of hits.  The second time you do a
> > search, it may or may not return the correct set.  By the third time you
> do
> > a search, it will return 0 hits even for a search that was returning
> > hundreds of hits only a few seconds earlier.  All subsequent searches
> will
> > return 0 hits until you stop and restart the java process.
> >
> > A snippet of the relevant code follows:
> >
> >                    // getReader() returns the singleton IndexReader
> object
> >                final IndexReader reader = getReader();
> >
> >                    // ANALYZER is another singleton
> >                final QueryParser queryParser = new QueryParser("text",
> > ANALYZER);
> >                queryParser.setDefaultOperator(spec.getDefaultOp());
> >                final Query query =
> > queryParser.parse(spec.getSearchText()).rewrite(
> >                        reader);
> >                final IndexSearcher searcher = new IndexSearcher(reader);
> >
> >                final Hits hits = searcher.search(query, new
> > CachingWrapperFilter(
> >                        new QueryWrapperFilter(visibilityFilter)));
> >                total = hits.length();
> >
> >
> >
> > I understand that Lucene should be able to handle very large datasets, so
> > I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just
> > that I'm doing something "wrong" which has gone unnoticed so far for
> several
> > months because we've never had an index this large.
> >
> > We're using lucene verison 2.2.0.
> >
> > Thanks!
> >
> > Justin Grunau
> >
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
>
> --
> Bests regards,
> Leonid Maslov!
> Personal blog: http://leonardinius.blogspot.com/
>
> Random thought:
> Princess Margaret  - "I have as much privacy as a goldfish in a bowl."
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
Bests regards,
Leonid Maslov!
Personal blog: http://leonardinius.blogspot.com/

Random thought:
John Belushi  - "I owe it all to little chocolate donuts."
Reply | Threaded
Open this post in threaded view
|

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

Erick Erickson
In reply to this post by Justin Grunau
I've been tracking this list for a year or more, and this is the
first I've ever heard of such a thing. Which leads me to wonder
what *else* changed besides your index size. Classpath?
jar files? Some sysadmin modified your search box? Is the
program throwing an exception that you're masking somewhere
in the code? Is it possible that you're getting an Out Of
Memory exception?

Folks have routinely used MUCH larger indexes than 3G without
anything like this happening.

So, here's what I'd do.
1> verify your index. Look at it in, say Luke.
2> Log your queries. It's possible you're not
     querying what you think you are.
3> You should very easily be able to create a small,
     stupid program on your personal machine that will
     open this index and fire off the queries in question
     and see if your problem is environmental or programmatic.
     If you run it in an IDE, you should be able to break
     on exceptions if there are any.
4> Assuming that <2> exhibits the problem, start paring
     back your code. Take out one thing at a time until
     you don't see the problem.
5> really take a look at any code changes that are
     coincident with this anomaly. Are you totally sure
     that the only thing that's changed is the index?
6> why are you bothering to make everything final? Are
    your code snippets part of a class that's instantiated
    for each query? Note that this is more curiosity than
    thinking that it's the source of your problem <G>.


Best
Erick



On Thu, Sep 4, 2008 at 5:46 PM, Justin Grunau <[hidden email]> wrote:

> Sorry, I forgot to include the visibility filters:
>
>                final BooleanQuery visibilityFilter = new BooleanQuery();
>                visibilityFilter.add(new TermQuery(new Term("isPublic",
> "true")),
>                        Occur.SHOULD);
>                visibilityFilter.add(new TermQuery(new Term("reader",
> user.getId())),
>                        Occur.SHOULD);
>
>
> These visibility filters ensure that a user only sees files which he or she
> has access to see.
>
> I am pretty certain nobody else has modified the index in the meantime, but
> why is that important?  We have several other servers -- whose only
> difference is a smaller data set -- with dozens of concurrent users, and the
> index on those servers gets modified and read concurrently all the time, but
> none of these other servers have ever exhibited this bug.
>
>
>
> ----- Original Message ----
> From: Leonid M. <[hidden email]>
> To: [hidden email]
> Sent: Thursday, September 4, 2008 5:35:47 PM
> Subject: Re: Problem with lucene search starting to return 0 hits when a
> few seconds earlier it was returning hundreds
>
> * And what's about visibility filter? * Are you sure no one else accesses
> IndexReader and modifies index? See reader.maxDocs() to be confident.
>
> On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[hidden email]> wrote:
>
> > We have some code that uses lucene which has been working perfectly well
> > for several months.
> >
> > Recently, a QA team in our organization has set up a server with a much
> > larger data set than we have ever tested with in the past:  the resulting
> > lucene index is about 3G in size.
> >
> > On this particular server, the same lucene code which has been reliable
> in
> > the past is now exhibiting erratic behavior.  The first time you do a
> > search, it returns the correct number of hits.  The second time you do a
> > search, it may or may not return the correct set.  By the third time you
> do
> > a search, it will return 0 hits even for a search that was returning
> > hundreds of hits only a few seconds earlier.  All subsequent searches
> will
> > return 0 hits until you stop and restart the java process.
> >
> > A snippet of the relevant code follows:
> >
> >                    // getReader() returns the singleton IndexReader
> object
> >                final IndexReader reader = getReader();
> >
> >                    // ANALYZER is another singleton
> >                final QueryParser queryParser = new QueryParser("text",
> > ANALYZER);
> >                queryParser.setDefaultOperator(spec.getDefaultOp());
> >                final Query query =
> > queryParser.parse(spec.getSearchText()).rewrite(
> >                        reader);
> >                final IndexSearcher searcher = new IndexSearcher(reader);
> >
> >                final Hits hits = searcher.search(query, new
> > CachingWrapperFilter(
> >                        new QueryWrapperFilter(visibilityFilter)));
> >                total = hits.length();
> >
> >
> >
> > I understand that Lucene should be able to handle very large datasets, so
> > I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just
> > that I'm doing something "wrong" which has gone unnoticed so far for
> several
> > months because we've never had an index this large.
> >
> > We're using lucene verison 2.2.0.
> >
> > Thanks!
> >
> > Justin Grunau
> >
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
>
> --
> Bests regards,
> Leonid Maslov!
> Personal blog: http://leonardinius.blogspot.com/
>
> Random thought:
> Princess Margaret  - "I have as much privacy as a goldfish in a bowl."
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

叶双明
 Have you ignore any Exception during query?  make sure of it.

2008/9/5, Erick Erickson <[hidden email]>:

>
> I've been tracking this list for a year or more, and this is the
> first I've ever heard of such a thing. Which leads me to wonder
> what *else* changed besides your index size. Classpath?
> jar files? Some sysadmin modified your search box? Is the
> program throwing an exception that you're masking somewhere
> in the code? Is it possible that you're getting an Out Of
> Memory exception?
>
> Folks have routinely used MUCH larger indexes than 3G without
> anything like this happening.
>
> So, here's what I'd do.
> 1> verify your index. Look at it in, say Luke.
> 2> Log your queries. It's possible you're not
>     querying what you think you are.
> 3> You should very easily be able to create a small,
>     stupid program on your personal machine that will
>     open this index and fire off the queries in question
>     and see if your problem is environmental or programmatic.
>     If you run it in an IDE, you should be able to break
>     on exceptions if there are any.
> 4> Assuming that <2> exhibits the problem, start paring
>     back your code. Take out one thing at a time until
>     you don't see the problem.
> 5> really take a look at any code changes that are
>     coincident with this anomaly. Are you totally sure
>     that the only thing that's changed is the index?
> 6> why are you bothering to make everything final? Are
>    your code snippets part of a class that's instantiated
>    for each query? Note that this is more curiosity than
>    thinking that it's the source of your problem <G>.
>
>
> Best
> Erick
>
>
>
> On Thu, Sep 4, 2008 at 5:46 PM, Justin Grunau <[hidden email]> wrote:
>
> > Sorry, I forgot to include the visibility filters:
> >
> >                final BooleanQuery visibilityFilter = new BooleanQuery();
> >                visibilityFilter.add(new TermQuery(new Term("isPublic",
> > "true")),
> >                        Occur.SHOULD);
> >                visibilityFilter.add(new TermQuery(new Term("reader",
> > user.getId())),
> >                        Occur.SHOULD);
> >
> >
> > These visibility filters ensure that a user only sees files which he or
> she
> > has access to see.
> >
> > I am pretty certain nobody else has modified the index in the meantime,
> but
> > why is that important?  We have several other servers -- whose only
> > difference is a smaller data set -- with dozens of concurrent users, and
> the
> > index on those servers gets modified and read concurrently all the time,
> but
> > none of these other servers have ever exhibited this bug.
> >
> >
> >
> > ----- Original Message ----
> > From: Leonid M. <[hidden email]>
> > To: [hidden email]
> > Sent: Thursday, September 4, 2008 5:35:47 PM
> > Subject: Re: Problem with lucene search starting to return 0 hits when a
> > few seconds earlier it was returning hundreds
> >
> > * And what's about visibility filter? * Are you sure no one else accesses
> > IndexReader and modifies index? See reader.maxDocs() to be confident.
> >
> > On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[hidden email]> wrote:
> >
> > > We have some code that uses lucene which has been working perfectly
> well
> > > for several months.
> > >
> > > Recently, a QA team in our organization has set up a server with a much
> > > larger data set than we have ever tested with in the past:  the
> resulting
> > > lucene index is about 3G in size.
> > >
> > > On this particular server, the same lucene code which has been reliable
> > in
> > > the past is now exhibiting erratic behavior.  The first time you do a
> > > search, it returns the correct number of hits.  The second time you do
> a
> > > search, it may or may not return the correct set.  By the third time
> you
> > do
> > > a search, it will return 0 hits even for a search that was returning
> > > hundreds of hits only a few seconds earlier.  All subsequent searches
> > will
> > > return 0 hits until you stop and restart the java process.
> > >
> > > A snippet of the relevant code follows:
> > >
> > >                    // getReader() returns the singleton IndexReader
> > object
> > >                final IndexReader reader = getReader();
> > >
> > >                    // ANALYZER is another singleton
> > >                final QueryParser queryParser = new QueryParser("text",
> > > ANALYZER);
> > >                queryParser.setDefaultOperator(spec.getDefaultOp());
> > >                final Query query =
> > > queryParser.parse(spec.getSearchText()).rewrite(
> > >                        reader);
> > >                final IndexSearcher searcher = new
> IndexSearcher(reader);
> > >
> > >                final Hits hits = searcher.search(query, new
> > > CachingWrapperFilter(
> > >                        new QueryWrapperFilter(visibilityFilter)));
> > >                total = hits.length();
> > >
> > >
> > >
> > > I understand that Lucene should be able to handle very large datasets,
> so
> > > I'd be surprised if this were an actual Lucene bug.  I'm hoping it's
> just
> > > that I'm doing something "wrong" which has gone unnoticed so far for
> > several
> > > months because we've never had an index this large.
> > >
> > > We're using lucene verison 2.2.0.
> > >
> > > Thanks!
> > >
> > > Justin Grunau
> > >
> > >
> > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> > >
> >
> >
> > --
> > Bests regards,
> > Leonid Maslov!
> > Personal blog: http://leonardinius.blogspot.com/
> >
> > Random thought:
> > Princess Margaret  - "I have as much privacy as a goldfish in a bowl."
> >
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>



--
Sorry for my englist!! 明