Solr is very slow with term vectors

Vignan Malyala
Hi

I made my custom qparser plugin in Solr for scoring. The plugin only computes
the cosine similarity of vectors for each record. Results are fine!

*BUT, the Solr response is very slow. It takes around 55 seconds for each
request.*
*How do I make it faster, so that I get my results in milliseconds?*
*Please respond soon, as it's a bit urgent.*

Note: All my values are stored and indexed. I am not using Solr Cloud.

Regards,
Sai Vignan Malyala

Solr is very slow with term vectors

Vignan Malyala
Hi guys,

I made my custom qparser plugin in Solr for scoring. The plugin only computes
the cosine similarity of vectors for each record. I use term vectors here.
Results are fine!

BUT, the Solr response is very slow with term vectors. It takes around 55
seconds for each request for 1,000,000 records.
How do I make it faster, so that I get my results in milliseconds?
Please respond soon, as it's a bit urgent.

Note: All my values are stored and indexed. I am not using Solr Cloud.

Re: Solr is very slow with term vectors

Doug Turnbull
Hi Vignan,

We need to see more details, or the code, of what exactly your query parser
plugin does with term vectors; we can't really help you without that.
Is it open source? Can you share a minimal example that recreates the
problem?

On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <[hidden email]> wrote:

> Hi guys,
>
> I made my custom qparser plugin in Solr for scoring. The plugin only does
> cosine similarity of vectors for each record. I use term vectors here.
> Results are fine!
>
> BUT, Solr response is very slow with term vectors. It takes around 55
> seconds for each request for 1000000 records.
> How do I make it faster to get my results in ms ?
> Please respond soon as its lil urgent.
>
> Note: All my values are stored and indexed. I am not using Solr Cloud.
>


--
*Doug Turnbull **| CTO* | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.

Re: Solr is very slow with term vectors

Walter Underwood
tf.idf was invented because cosine similarity is too much computation. tf.idf gives similar results much, much faster than cosine distance.

I would expect cosine similarity to be slow. I would also expect retrieving 1 million records to be slow. Doing both of those in one minute is pretty good.

As Kernighan and Plauger said in 1978, "Don't diddle code to make it faster--find a better algorithm."

https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <[hidden email]> wrote:
>
> Hi Vignan,
>
> We need to see more details / code of what your query parser plugin does
> exactly with term vectors, we can't really help you without more details.
> Is it open source? Can you share a minimal example that recreates the
> problem?

Re: Solr is very slow with term vectors

Vignan Malyala
Hi Doug / Walter,

I'm just using this methodology.
Please find below the link to my sample code:
https://github.com/saaay71/solr-vector-scoring

The only issue is the speed of the response for 1M records.

On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <[hidden email]>
wrote:

> tf.idf was invented because cosine similarity is too much computation.
> tf.idf gives similar results much, much faster than cosine distance.
>
> I would expect cosine similarity to be slow. I would also expect
> retrieving 1 million records to be slow. Doing both of those in one minute
> is pretty good.
>
> As Kernighan and Plauger said in 1978, "Don't diddle code to make it
> faster--find a better algorithm."
>
> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)

Re: Solr is very slow with term vectors

Vignan Malyala
Hi,
Is there any solution for this? It is taking around 50 seconds to get a response.

On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, <[hidden email]> wrote:

> Hi Doug / Walter,
>
> I'm just using this methodology.
> PFB link of my sample code.
> https://github.com/saaay71/solr-vector-scoring
>
> The only issue is speed of response for 1M records.

Re: Solr is very slow with term vectors

Jörn Franke
What response time do you require?
I think you have to solve the issue in your code by introducing more parallelism in the calculation, and potentially by adding more cores.

Maybe you can also precalculate part of the work, cache it, and use the precalculated values at request time.
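
One way to read the precalculation suggestion: cosine similarity divides a dot product by both vector lengths, and the document-side length does not change between requests. The sketch below is standalone Java, not code from the linked plugin. It assumes dense float[] vectors held in memory, and the names PrecomputedCosine, normalize and dot are made up for illustration. Each vector is normalized once, so the per-request work shrinks to a plain dot product.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Solr or of the linked plugin.
final class PrecomputedCosine {

    // Returns a copy of v scaled to unit length; a zero vector is returned unchanged.
    static float[] normalize(float[] v) {
        double sumSq = 0.0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        float[] out = Arrays.copyOf(v, v.length);
        if (sumSq == 0.0) {
            return out;
        }
        float inv = (float) (1.0 / Math.sqrt(sumSq));
        for (int i = 0; i < out.length; i++) {
            out[i] *= inv;
        }
        return out;
    }

    // Dot product; this equals the cosine similarity when both inputs have unit length.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] doc = normalize(new float[] {3f, 4f});   // assumed to happen once, at index time
        float[] query = normalize(new float[] {4f, 3f}); // once per request
        System.out.println(dot(query, doc));             // cosine of the two original vectors
    }
}
```

This removes the square roots and divisions from the per-document loop. It does not remove the cost of reading a million vectors, so it may solve only part of the problem.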

> On 16.08.2019 at 05:08, Vignan Malyala <[hidden email]> wrote:
>
> Hi
> Any solution for this? Taking around 50 seconds to get response.

Re: Solr is very slow with term vectors

Vignan Malyala
I want a response time below 3 seconds.
And FYI, I'm already using 32 cores.
My cache is already full too, and obviously the same requests don't recur in my
case.


On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, <[hidden email]> wrote:

> How much response time do you require?
> I think you have to solve the issue in your code by introducing higher
> parallelism during calculation and potentially more cores.
>
> Maybe you can also precalculate what you do, cache it and use during
> request the precalculated values.

Re: Solr is very slow with term vectors

Jörn Franke
Is your custom query parser multithreaded, and does it make use of all cores?

> On 16.08.2019 at 13:12, Vignan Malyala <[hidden email]> wrote:
>
> I want response time below 3 seconds.
> And fyi I'm already using 32 cores.
> My cache is already full too and obviously same requests don't occur in my
> case.

Re: Solr is very slow with term vectors

Vignan Malyala
How do I check that in Solr? Can anyone share a link on implementing
threads in Solr?

On Fri 16 Aug, 2019, 4:52 PM Jörn Franke, <[hidden email]> wrote:

> Is your custom query parser multithreaded and leverages all cores?

Re: Solr is very slow with term vectors

Jörn Franke
You would have to implement that yourself. I don't think Solr magically threads the query parser for you, but maybe other people have more insight on this topic.
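
A minimal sketch of what implementing the threading yourself could look like, assuming the vectors are already available as plain in-memory arrays (which may not match how the plugin reads them from term vectors). ParallelScoring and scoreAll are hypothetical names; the only library feature used is the JDK's parallel streams.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical standalone example; it does not hook into Solr's scoring pipeline.
final class ParallelScoring {

    // Plain dot product of two equal-length vectors.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores every document against the query, spreading the loop over worker threads.
    static float[] scoreAll(float[] query, float[][] docs) {
        float[] scores = new float[docs.length];
        IntStream.range(0, docs.length)
                 .parallel()
                 // Each index is written by exactly one task, so no locking is needed.
                 .forEach(i -> scores[i] = dot(query, docs[i]));
        return scores;
    }

    public static void main(String[] args) {
        float[][] docs = { {1f, 0f}, {0f, 1f}, {1f, 1f} };
        float[] scores = scoreAll(new float[] {1f, 0f}, docs);
        System.out.println(Arrays.toString(scores));
    }
}
```

Parallel streams run on the JVM's common fork-join pool, which is sized from the number of available processors. Inside a server such as Solr, where many requests run at once, a dedicated executor with a bounded thread count may be the safer choice.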

> On 16.08.2019 at 15:42, Vignan Malyala <[hidden email]> wrote:
>
> How do I check that in solr? Can anyone share link on implementation of
> threads in solr?

Re: Solr is very slow with term vectors

Jan Høydahl / Cominvent
I bet your main issue is assuming that this particular plugin is the only way to solve your ranking requirements.
I would advise you to start looking into the various built-in Similarities and try to tweak one of those instead, and/or add more ranking signals to your solution. Perhaps see whether ReRanking the top 1000 hits is good enough, etc. Not knowing anything about what led you to that custom, badly performing third-party plugin in the first place, it is hard to guess, but take 10 steps back and reconsider that choice.
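
The re-ranking idea can be sketched as two passes: a cheap score picks a short list, and the expensive score (here it would be the cosine) runs only on that short list. The Java below is a standalone illustration with hypothetical names (TwoPhaseRanking, rerankTop), not Solr's own implementation. Solr ships a ReRank query parser, driven by the rq request parameter, for this purpose; its exact options should be checked in the reference guide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration of "rank cheaply first, re-rank only the top n".
final class TwoPhaseRanking {

    // Keeps the n candidates with the highest cheap score,
    // then orders only those by the expensive score (highest first).
    static <T> List<T> rerankTop(List<T> candidates, int n,
                                 ToDoubleFunction<T> cheapScore,
                                 ToDoubleFunction<T> expensiveScore) {
        List<T> firstPass = new ArrayList<>(candidates);
        firstPass.sort(Comparator.comparingDouble(cheapScore).reversed());
        int keep = Math.min(n, firstPass.size());
        List<T> top = new ArrayList<>(firstPass.subList(0, keep));
        top.sort(Comparator.comparingDouble(expensiveScore).reversed());
        return top;
    }

    public static void main(String[] args) {
        // Each candidate is {cheapScore, expensiveScore}; the values are made up.
        List<double[]> candidates = Arrays.asList(
                new double[] {0.9, 0.1},
                new double[] {0.8, 0.7},
                new double[] {0.1, 0.99});
        List<double[]> top = rerankTop(candidates, 2, d -> d[0], d -> d[1]);
        for (double[] d : top) {
            System.out.println(Arrays.toString(d));
        }
    }
}
```

The trade-off is recall: a document that the cheap score ranks below the cut-off is never seen by the expensive score, as happens to the third candidate in the example.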

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 16 Aug 2019 at 15:50, Jörn Franke <[hidden email]> wrote:
>
> You would have to implement that I don’t think that Solr is threading the query parser magically for you, but maybe some people have more insight on this topic.

Re: Solr is very slow with term vectors

Walter Underwood
First, measure how long it takes just to fetch one million records with all the fields you need, both for display and for re-ranking. If that is slow, then no amount of cosine code tweaking will make it fast.
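
A small sketch of such a baseline measurement, assuming the fetch can be wrapped in a function. FetchTimer and timeMillis are hypothetical names, and the workload in main is only a placeholder for the real "fetch all records without custom scoring" request.

```java
import java.util.function.Supplier;

// Hypothetical timing helper; the measured task is supplied by the caller.
final class FetchTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static <T> long timeMillis(Supplier<T> task) {
        long start = System.nanoTime();
        task.get();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for the plain fetch of one million records.
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        });
        System.out.println("baseline took " + ms + " ms");
    }
}
```

Comparing this baseline with the time of the full request shows how much of the 50 to 55 seconds is spent on scoring and how much on retrieval. A single run is noisy, so repeating the measurement a few times is advisable.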

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Aug 16, 2019, at 9:23 AM, Jan Høydahl <[hidden email]> wrote:
>
> I bet your main issue is assuming that this particular plugin is the only way to solve your ranking requirements.
> I would advise you to start looking into the various built-in Similarities and instead try to tweak one of those, and/or adding more ranking signals to your solution, perhaps see if ReRanking on top 1000 hits is good enough etc. Not knowing anything about what lead you to that custom bad-performing 3rd party plugin in the first place, it is hard to guess, but take 10 steps back and re-consider that choice.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> On 16 Aug 2019, at 15:50, Jörn Franke <[hidden email]> wrote:
>>
>> You would have to implement that yourself. I don’t think that Solr threads the query parser magically for you, but maybe some people have more insight on this topic.
>>
>>> On 16 Aug 2019, at 15:42, Vignan Malyala <[hidden email]> wrote:
>>>
>>> How do I check that in Solr? Can anyone share a link on implementing
>>> threads in Solr?
>>>
>>>> On Fri 16 Aug, 2019, 4:52 PM Jörn Franke, <[hidden email]> wrote:
>>>>
>>>> Is your custom query parser multithreaded and leverages all cores?
>>>>
>>>>> On 16 Aug 2019, at 13:12, Vignan Malyala <[hidden email]> wrote:
>>>>>
>>>>> I want response time below 3 seconds.
>>>>> And fyi I'm already using 32 cores.
>>>>> My cache is already full too, and obviously the same requests don't recur
>>>>> in my case.
>>>>>
>>>>>
>>>>>> On Fri 16 Aug, 2019, 11:47 AM Jörn Franke, <[hidden email]> wrote:
>>>>>>
>>>>>> How much response time do you require?
>>>>>> I think you have to solve the issue in your code by introducing higher
>>>>>> parallelism during calculation and potentially more cores.
>>>>>>
>>>>>> Maybe you can also precalculate what you do, cache it and use during
>>>>>> request the precalculated values.
>>>>>>
>>>>>>> On 16 Aug 2019, at 05:08, Vignan Malyala <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hi
>>>>>>> Any solution for this? Taking around 50 seconds to get response.
>>>>>>>
>>>>>>>> On Mon 12 Aug, 2019, 3:28 PM Vignan Malyala, <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Doug / Walter,
>>>>>>>>
>>>>>>>> I'm just using this methodology.
>>>>>>>> Please find below the link to my sample code.
>>>>>>>> https://github.com/saaay71/solr-vector-scoring
>>>>>>>>
>>>>>>>> The only issue is speed of response for 1M records.
>>>>>>>>
>>>>>>>> On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> tf.idf was invented because cosine similarity is too much computation.
>>>>>>>>> tf.idf gives similar results much, much faster than cosine distance.
>>>>>>>>>
>>>>>>>>> I would expect cosine similarity to be slow. I would also expect
>>>>>>>>> retrieving 1 million records to be slow. Doing both of those in one minute
>>>>>>>>> is pretty good.
>>>>>>>>>
>>>>>>>>> As Kernighan and Plauger said in 1978, "Don’t diddle code to make it
>>>>>>>>> faster—find a better algorithm.”
>>>>>>>>>
>>>>>>>>> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
>>>>>>>>>
>>>>>>>>> wunder
>>>>>>>>> Walter Underwood
>>>>>>>>> [hidden email]
>>>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>>>>
>>>>>>>>>> On Aug 11, 2019, at 10:40 AM, Doug Turnbull <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Vignan,
>>>>>>>>>>
>>>>>>>>>> We need to see more details / code of what your query parser plugin does
>>>>>>>>>> exactly with term vectors, we can't really help you without more details.
>>>>>>>>>> Is it open source? Can you share a minimal example that recreates the
>>>>>>>>>> problem?
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> I made my custom qparser plugin in Solr for scoring. The plugin only does
>>>>>>>>>>> cosine similarity of vectors for each record. I use term vectors here.
>>>>>>>>>>> Results are fine!
>>>>>>>>>>>
>>>>>>>>>>> BUT, Solr response is very slow with term vectors. It takes around 55
>>>>>>>>>>> seconds for each request for 1000000 records.
>>>>>>>>>>> How do I make it faster to get my results in ms ?
>>>>>>>>>>> Please respond soon as its lil urgent.
>>>>>>>>>>>
>>>>>>>>>>> Note: All my values are stored and indexed. I am not using Solr Cloud.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Doug Turnbull **| CTO* | OpenSource Connections
>>>>>>>>>> <http://opensourceconnections.com>, LLC | 240.476.9983
>>>>>>>>>> Author: Relevant Search <http://manning.com/turnbull>
>>>>>>>>>> This e-mail and all contents, including attachments, is considered to be
>>>>>>>>>> Company Confidential unless explicitly stated otherwise, regardless
>>>>>>>>>> of whether attachments are marked as such.
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>
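
Two suggestions from the quoted replies, re-ranking only the top hits and precalculating what can be precalculated, can be combined. The sketch below is plain Python and is not the code of the plugin linked in the thread; the in-memory `docs` dictionary, the candidate list, and all function names are illustrative assumptions. Vector norms are computed once up front, and cosine similarity is evaluated only for a small candidate set that a cheap first-pass query is assumed to have returned.

```python
import heapq
import math


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def precompute_norms(docs):
    """Compute each document vector's norm once, e.g. at index time."""
    return {doc_id: norm(vec) for doc_id, vec in docs.items()}


def rerank(query_vec, candidate_ids, docs, norms, top_k=10):
    """Score only the candidates by cosine similarity and return the best top_k."""
    q_norm = norm(query_vec)
    scored = []
    for doc_id in candidate_ids:
        d_norm = norms[doc_id]
        if q_norm == 0.0 or d_norm == 0.0:
            continue  # cosine is undefined for a zero vector; skip it
        dot = sum(q * d for q, d in zip(query_vec, docs[doc_id]))
        scored.append((dot / (q_norm * d_norm), doc_id))
    return heapq.nlargest(top_k, scored)


# Toy data standing in for stored document vectors.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
norms = precompute_norms(docs)
# Pretend a cheap first-pass query returned these candidate ids.
top = rerank([1.0, 0.0], ["a", "b", "c"], docs, norms, top_k=2)
```

With 1000 candidates instead of 1000000 records, the per-request work drops by roughly three orders of magnitude, at the cost of missing any document the first pass did not surface.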