Sorting based on a selling rate

Re: SpanRegex speed

Mark Miller-3
Erick Erickson wrote:

> OK, a not very helpful answer, but "of course they're slower, they do
> more work" (the span versions). But that's fairly useless, since the
> question is really "is it enough slower in my situation that I need to
> find an alternative?". And the only way I know of to answer that
> question is to make some tests with the data representing my
> particular problem......
> Sorry I can't be more help....
> Erick
> On 9/1/06, Mark Miller <[hidden email]> wrote:
>> Erick Erickson wrote:
>> > Let me chime in here on a different note.... before you get happy with
>> > wildcard queries, take a look at the thread "I just don't get
>> > wildcards at all". There is lots of good info that Erik, Chris and
>> > Otis provided me.
>> >
>> > The danger with PrefixQuery and WildcardQuery is that they will throw
>> > TooManyClauses exceptions when you start matching a large number of
>> > terms (the default is 1024, although you can make this much bigger if
>> > memory allows). If you're aware of this and it is and will be OK in
>> > your app, ignore this. But if your index is going to grow
>> > significantly, this is a real problem. I went with implementing
>> > filters with WildcardTermEnum (you could also use RegexTermEnum) for
>> > the wildcard portions of my query. This has interesting implications
>> > for spans; we elected to say spans didn't work with wildcards.
>> >
>> > Anyway, as I said, if you're aware of the TooManyClauses issue and are
>> > sure it doesn't matter, ignore me. After all, everybody else does
>> > <G>.....
>> >
>> >
>> > Best
>> > Erick
>> >
>> >
>> >
>> > On 8/30/06, Mark Miller <[hidden email]> wrote:
>> >>
>> >> Ignore that last question. I see that you said prefix wildcard query
>> >> and not wildcard query. A quick look at the code seems to show it
>> >> grabbing a prefix as well.
>> >>
>> >> Do you think one would be any faster than the other? Should I use
>> >> wildcard queries outside of span queries and the regex query inside
>> >> span queries, or use regex in both places?
>> >>
>> >> - Mark
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [hidden email]
>> >> For additional commands, e-mail: [hidden email]
>> >>
>> >>
>> >
>> Thanks a lot for the info, Erick. Good stuff to know for sure.
>> I guess the real question I have been trying to spit out is this:
>> Is a span version of any of these searches--fuzzy, wildcard,
>> etc--inherently slower than their non-span brothers? If they have the
>> same limitations and speeds then that is all I am looking for.
>> P.S.
>> I realize I have been screwing up the threading by replying when
>> starting a new topic. I have been alerted and will stop this pernicious
>> activity.
>> - Mark
Thanks Erick. You're always more than helpful. The reason I only care that
they are as good as they can be is that I am looking for a general
solution and not one tailored to a particular dataset. This is for a
general query parser. I want to be able to search for wildcards, fuzzies,
etc. in a proximity search, e.g. mark*off NEAR Bork?on. This may just be a
slow query in general, but other search engines appear to offer this, and
they must face similar limitations. So if a fuzzy search is slow in a
proximity search just because it is slow...I don't mind. If it is slow
because Lucene implements spans in a way that makes wildcards and fuzzies
particularly slow in them...that's what I would like to know. And if that
is the case...someone should make a fuzzy and wildcard that is fast in a
span :)

- Mark
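
To make the "mark*off NEAR bork?on" idea concrete, here is a minimal
sketch in plain Java. It is not Lucene code and does not claim to show how
Lucene implements spans; the class and method names (WildcardNearSketch,
positions, near) are hypothetical, and a document is assumed to be just an
array of lowercased tokens. It only illustrates why a proximity version
does more work: it has to keep the position of every matching term, not
just whether the document matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from Lucene.
class WildcardNearSketch {

    // Translate a simple wildcard into a regex:
    // '*' = any run of characters, '?' = exactly one character.
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Collect the position of every token that matches the wildcard.
    static List<Integer> positions(String[] tokens, String wildcard) {
        Pattern p = toPattern(wildcard);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (p.matcher(tokens[i]).matches()) out.add(i);
        }
        return out;
    }

    // True if a match of 'a' and a match of 'b' sit at different positions
    // that are at most maxDistance tokens apart (order does not matter).
    // Nested loops keep the sketch obvious; they are not meant to be fast.
    static boolean near(String[] tokens, String a, String b, int maxDistance) {
        for (int pa : positions(tokens, a)) {
            for (int pb : positions(tokens, b)) {
                if (pa != pb && Math.abs(pa - pb) <= maxDistance) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = "markoff met borkson today".split(" ");
        // "markoff" is at position 0 and "borkson" at position 2.
        System.out.println(near(tokens, "mark*off", "bork?on", 2)); // true
        System.out.println(near(tokens, "mark*off", "bork?on", 1)); // false
    }
}
```

Whether a real engine pays this cost per matching term or finds something
smarter is exactly the open question in the message above; the sketch only
shows the bookkeeping involved.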

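
The TooManyClauses concern in Erick's quoted message can also be sketched
without any Lucene dependency. The code below is an assumption-laden toy:
the "index" is a plain sorted map from term to document ids, the limit of
1024 mirrors the default quoted in the thread, and all names
(ClauseLimitSketch, expandToClauses, filterDocs, sampleIndex) are
hypothetical. It contrasts expanding a wildcard into one clause per
matching term, which fails past the limit, with the filter-style approach
of walking the matching terms and setting one bit per matching document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Lucene's API.
class ClauseLimitSketch {

    // Mirrors the default clause limit mentioned in the thread.
    static final int MAX_CLAUSES = 1024;

    // '*' = any run of characters, '?' = exactly one character.
    static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Approach 1: one clause per matching term; throws once the limit is passed.
    static List<String> expandToClauses(Map<String, int[]> postings, String wildcard) {
        Pattern p = toPattern(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : postings.keySet()) {
            if (!p.matcher(term).matches()) continue;
            if (clauses.size() == MAX_CLAUSES) {
                throw new IllegalStateException("too many clauses (limit " + MAX_CLAUSES + ")");
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Approach 2: filter style; no clause list, just one bit per matching document.
    static BitSet filterDocs(Map<String, int[]> postings, String wildcard) {
        Pattern p = toPattern(wildcard);
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int doc : e.getValue()) docs.set(doc);
            }
        }
        return docs;
    }

    // Toy index: terms "mark0" .. "mark(n-1)", each appearing in one document.
    static Map<String, int[]> sampleIndex(int n) {
        Map<String, int[]> postings = new TreeMap<>();
        for (int i = 0; i < n; i++) postings.put("mark" + i, new int[] { i });
        return postings;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = sampleIndex(2000);
        System.out.println(filterDocs(index, "mark*").cardinality()); // 2000
        try {
            expandToClauses(index, "mark*");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the filter only records which documents matched, not where, which
is presumably why Erick mentions that this route has implications for spans.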