How to find first document for the ALL search

How to find first document for the ALL search

Ian Connor
I have found that this search crashes:

/solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

SEVERE: java.lang.IndexOutOfBoundsException: Index: 114, Size: 90
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:217)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
    at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
    at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:259)

but this one works:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

It looks like just that first document is bad. I am happy to delete it - but
not sure how to get to it. Does anyone know how to find it?

- Ian

range faceting with integers

Jonathan Rochkind
So I want to provide some range facets on an integer Solr field (probably
tint, that is, a trie field with a non-zero precision step).

It's clear enough how to do this, along the lines of facet.query=[1 TO
100]&facet.query=[101 TO 200]&facet.query=[201 TO 300]

etc.

The issue is that I'd like to calculate N equal ranges based on the min
and max value found in the field.

I can't think of any way to do this that doesn't require two queries --
one to get the min and max (within the current search set), then
calculate the ranges client-side (possibly making the boundaries 'nice'
numbers instead of strictly equal ranges), then do another query with
the calculated facet.queries set.
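
The client-side step of the two-query approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not a Solr feature: `equal_ranges` and `facet_query_params` are hypothetical helper names, `price_i` is a made-up field name, and the min and max are assumed to have been fetched already by the first query (for example with the StatsComponent, `stats=true&stats.field=...`, if your Solr version includes it).

```python
import math


def equal_ranges(lo, hi, n):
    """Split the inclusive integer interval [lo, hi] into at most n
    contiguous, non-overlapping ranges of (nearly) equal width."""
    if n < 1 or hi < lo:
        raise ValueError("need n >= 1 and lo <= hi")
    # Width of each bucket, rounded up so n buckets always cover the interval.
    width = math.ceil((hi - lo + 1) / n)
    ranges = []
    start = lo
    while start <= hi:
        end = min(start + width - 1, hi)
        ranges.append((start, end))
        start = end + 1
    return ranges


def facet_query_params(field, ranges):
    """Turn (start, end) pairs into facet.query parameters for one field.
    The field name is supplied by the caller; nothing here talks to Solr."""
    return [("facet.query", f"{field}:[{a} TO {b}]") for a, b in ranges]


if __name__ == "__main__":
    # Hypothetical values: pretend the first query reported min=1, max=300.
    for name, value in facet_query_params("price_i", equal_ranges(1, 300, 3)):
        print(f"{name}={value}")
```

The resulting pairs still need URL-encoding before being appended to the second request. Rounding the boundaries to 'nice' numbers would be an extra step between the two helpers.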

Is there any other trick I'm missing here?  If these were date values,
you could possibly use facet.date.gap, although I'm not sure that works
without explicitly setting facet.date.start, or whether leaving
facet.date.start unset would mean "the minimum value in the field".
In any case, I'm dealing with integers here, not dates.

So anything I'm missing, or just have the client do two queries?   For
that matter, is there an easy way to ask for minimum and maximum values
in a field, within a result set?

Thanks for any advice,
Jonathan

Re: How to find first document for the ALL search

Chris Hostetter-3
In reply to this post by Ian Connor

: I have found that this search crashes:
:
: /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

Ouch .. that exception is kind of hairy.  It suggests that your index may
have been corrupted in some way -- do you have any idea what happened?
Have you tried using the CheckIndex tool to see what it says?

(I'd hate to help you work around this, only for you to get bitten later
by a time bomb of other bad docs.)

: It looks like just that first document is bad. I am happy to delete it - but
: not sure how to get to it. Does anyone know how to find it?

CheckIndex might help ... if it doesn't, the next thing you might try is
asking for a legitimate field name that you know no document has (e.g. if
you have a dynamicField with the pattern "str_*" because you have fields
like "str_foo" and "str_bar" but you never have fields named
"str____BOGUS", then use fl=str____BOGUS) and then add debugQuery=true to
the URL -- the debug info should contain the id.
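
As a rough sketch of the request described above, the URL could be assembled like this. The base URL, the helper name `debug_probe_url`, and the field name `str____BOGUS` are assumptions for illustration; the field only makes sense if your schema really has a matching dynamicField that no document uses.

```python
from urllib.parse import urlencode


def debug_probe_url(base, bogus_field, start=0, rows=1):
    """Build a select URL that asks only for a field no document has and
    turns on debugQuery, so the debug section can be inspected for the id."""
    params = [
        ("q", "*:*"),            # match-all query, as in the original request
        ("start", start),
        ("rows", rows),
        ("fl", bogus_field),     # legitimate field name that is never populated
        ("debugQuery", "true"),
    ]
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port; adjust to your own installation.
    print(debug_probe_url("http://localhost:8983/solr", "str____BOGUS"))
```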

I'll be honest though: I'm guessing that if your example query doesn't
work, my suggestion won't either -- because if you get that error just
trying to access the "id" field, the same thing will probably happen when
the debugComponent tries to look it up as well.



-Hoss


Re: range faceting with integers

Chris Hostetter-3
In reply to this post by Jonathan Rochkind

: Subject: range faceting with integers
: References: <[hidden email]>
: In-Reply-To: <[hidden email]>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss


Re: How to find first document for the ALL search

Ian Connor
In reply to this post by Chris Hostetter-3
Hi,

The good news is that:

/solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id

did work (kind of odd really), so I am using Ruby to read all the documents
after the bad one into a new Solr instance with the same configuration
(a complete rebuild).

So far so good - it has gone through 500k out of 1.7M, and this seems to be
the best approach I could think of.

Running the Luke tool and trying to check the index on a copy ended up
destroying the index, leaving only about 5k documents. Reading them
out via Ruby seemed better in this case (and less work than restoring from
backup and re-running a few days' transactions to catch it up).
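
A minimal sketch of the paging arithmetic behind such a rebuild, written in Python rather than the Ruby actually used (which is not shown in this thread). The helper name `page_windows` and the batch size are assumptions. The approach can only recover stored fields, and it assumes the source index is not changing while it is read.

```python
def page_windows(first, total, batch):
    """Yield (start, rows) pairs that cover result offsets first..total-1.
    Starting at first=1 skips the unreadable document at offset 0."""
    start = first
    while start < total:
        rows = min(batch, total - start)
        yield start, rows
        start += rows


if __name__ == "__main__":
    # Hypothetical numbers: 1.7M documents, read 1000 at a time from offset 1.
    for start, rows in page_windows(1, 1_700_000, 1000):
        # Each pair would become one request such as
        # /solr/select?q=*%3A*&start=<start>&rows=<rows>
        # whose documents are then posted to the new instance.
        if start > 3000:
            break
        print(start, rows)
```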

Ian.


On Wed, Jul 14, 2010 at 9:22 PM, Chris Hostetter
<[hidden email]> wrote:

>
> : I have found that this search crashes:
> :
> : /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id
>
> Ouch .. that exception is kind of hairy.  It suggests that your index may
> have been corrupted in some way -- do you have any idea what happened?
> Have you tried using the CheckIndex tool to see what it says?
>
> (I'd hate to help you work around this, only for you to get bitten later
> by a time bomb of other bad docs.)
>
> : It looks like just that first document is bad. I am happy to delete it - but
> : not sure how to get to it. Does anyone know how to find it?
>
> CheckIndex might help ... if it doesn't, the next thing you might try is
> asking for a legitimate field name that you know no document has (e.g. if
> you have a dynamicField with the pattern "str_*" because you have fields
> like "str_foo" and "str_bar" but you never have fields named
> "str____BOGUS", then use fl=str____BOGUS) and then add debugQuery=true to
> the URL -- the debug info should contain the id.
>
> I'll be honest though: I'm guessing that if your example query doesn't
> work, my suggestion won't either -- because if you get that error just
> trying to access the "id" field, the same thing will probably happen when
> the debugComponent tries to look it up as well.
>
>
>
> -Hoss
>
>


--
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor