Synonym filters memory usage

Synonym filters memory usage

Dominique Bejean
Hi,

My concern is the memory used by the synonym filter, especially when the
synonym resource files are large.

Suppose my schema has two field types, "TypeSyno1" and "TypeSyno2", both
using a synonym filter with the same synonym files, and each of these two
field types is used by two fields:

Field1 type is TypeSyno1
Field2 type is TypeSyno1
Field3 type is TypeSyno2
Field4 type is TypeSyno2

How many times is the synonym file loaded into memory?
4 times, i.e. once per field?
2 times, i.e. once per instantiated field type?

Regards

Dominique

Re: Synonym filters memory usage

Andrea Gazzarini-6
Hi,
looking at the stateful nature of the SynonymGraphFilter/SynonymGraphFilterFactory
classes, the answer should be 2 times (once per field type instance).
The SynonymMap, which internally holds the synonym table, is a private
member of the filter factory, and it is loaded each time a factory is
created for a field type.
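A minimal model of the behaviour described above, assuming (as suggested) that each field type owns one filter factory and that the factory builds its synonym table once, when it is created. It also assumes each field type declares the synonym filter in only one analyzer. All class names are hypothetical, not Solr or Lucene classes, and a HashMap stands in for Lucene's SynonymMap; the point is only that fields share their type's factory, so four fields over two types give two loads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymLoadModel {
    // Number of times a synonym file has been parsed into an in-memory table.
    static int loads = 0;

    // Hypothetical stand-in for a synonym filter factory: it keeps a private
    // copy of the synonym table, built once when the factory is created.
    static final class SynonymFactory {
        private final Map<String, List<String>> table;

        SynonymFactory(String synonymsFile) {
            this.table = load(synonymsFile);
        }

        private static Map<String, List<String>> load(String synonymsFile) {
            loads++;
            Map<String, List<String>> t = new HashMap<>();
            // Placeholder entry; a real loader would parse synonymsFile here.
            t.put("tv", List.of("television"));
            return t;
        }

        int size() {
            return table.size();
        }
    }

    // A field type builds its analysis chain, and thus its factory, once.
    static final class FieldType {
        final SynonymFactory factory;

        FieldType(String synonymsFile) {
            this.factory = new SynonymFactory(synonymsFile);
        }
    }

    // A field only points at its field type; it creates no factory of its own.
    static final class Field {
        final String name;
        final FieldType type;

        Field(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    // Builds the schema from the question and returns how many loads happened.
    static int simulate() {
        loads = 0;
        FieldType typeSyno1 = new FieldType("synonyms.txt");
        FieldType typeSyno2 = new FieldType("synonyms.txt");
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("Field1", typeSyno1));
        fields.add(new Field("Field2", typeSyno1));
        fields.add(new Field("Field3", typeSyno2));
        fields.add(new Field("Field4", typeSyno2));
        return loads;
    }

    public static void main(String[] args) {
        System.out.println("synonym loads: " + simulate()); // prints "synonym loads: 2"
    }
}
```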

Best,
Andrea

--
Andrea Gazzarini
*Search Consultant, R&D Software Engineer*


www.sease.io

email: [hidden email]
cell: +39 349 513 86 25

On 29/09/2019 23:49, Dominique Bejean wrote:


Re: Synonym filters memory usage

Bernd Fehling
In reply to this post by Dominique Bejean
And I think this is per core and per index segment.

With 2 cores per instance, each core with 3 index segments, that makes 6 times
the 2 SynonymMaps, i.e. 12 SynonymMaps in total.
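The arithmetic in this example, sketched so the two hypotheses can be compared side by side. The per-core count is an assumption (one SynonymMap per field type in each core's schema); the per-segment count is the hypothesis stated above. Method names are illustrative and the inputs are the numbers from the message (2 cores, 3 segments each, 2 field types).

```java
public class SynonymMapCopies {
    // Assumption: one SynonymMap per field type in each core's schema.
    static int perCore(int cores, int fieldTypes) {
        return cores * fieldTypes;
    }

    // Per-segment hypothesis: one SynonymMap per field type per index segment of each core.
    static int perCorePerSegment(int cores, int segmentsPerCore, int fieldTypes) {
        return cores * segmentsPerCore * fieldTypes;
    }

    public static void main(String[] args) {
        int cores = 2;
        int segmentsPerCore = 3;
        int fieldTypes = 2;
        System.out.println("per core: " + perCore(cores, fieldTypes)); // 4
        System.out.println("per core per segment: "
                + perCorePerSegment(cores, segmentsPerCore, fieldTypes)); // 12
    }
}
```

Under the per-segment hypothesis the count grows linearly with the number of segments, which is why the two models diverge quickly on indexes with many segments.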

Regards
Bernd


On 30.09.19 at 08:41, Andrea Gazzarini wrote:



Re: Synonym filters memory usage

Andrea Gazzarini-6
In reply to this post by Bernd Fehling
mmm, OK for the core, but are you sure things work per segment in this case? I would expect one FilterFactory instance per index, initialized when the schema is loaded.
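A sketch of the lifecycle expected here, assuming the synonym factories are created once when the schema is loaded and that writing a new segment never touches the schema. `CoreLifecycleModel`, `Schema`, `Core`, and `flush()` are hypothetical names, not Solr classes, and the counter merely stands in for SynonymMap loads; it models the expectation, it does not verify Solr's actual behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class CoreLifecycleModel {
    // Counts factory initializations (each one stands in for a SynonymMap load).
    static int factoryInits = 0;

    // Hypothetical schema: its synonym factories are created once, at schema load time.
    static final class Schema {
        Schema(int synonymFieldTypes) {
            for (int i = 0; i < synonymFieldTypes; i++) {
                factoryInits++;
            }
        }
    }

    // Hypothetical core: owns one schema; segments appear as documents are flushed.
    static final class Core {
        final Schema schema;
        final List<String> segments = new ArrayList<>();

        Core(int synonymFieldTypes) {
            this.schema = new Schema(synonymFieldTypes);
        }

        void flush() {
            // Writing a new segment does not touch the schema or its factories.
            segments.add("_" + segments.size());
        }
    }

    // Loads one core with 2 synonym field types, writes N segments,
    // and returns how many factory initializations occurred.
    static int simulate(int segmentsToWrite) {
        factoryInits = 0;
        Core core = new Core(2);
        for (int i = 0; i < segmentsToWrite; i++) {
            core.flush();
        }
        return factoryInits;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3));   // 2
        System.out.println(simulate(300)); // still 2
    }
}
```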

On 30/09/2019 09:04, Bernd Fehling wrote:

Re: Synonym filters memory usage

Bernd Fehling
In reply to this post by Bernd Fehling
Yes, I think so.
While integrating a thesaurus as synonyms.txt, I saw massive memory usage.
A heap dump analyzed with MemoryAnalyzer showed that the SynonymMap
occupied a huge amount of memory three times over, one copy alongside
each opened index segment.
Just try it and check it yourself with a heap dump and MemoryAnalyzer.
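A heap dump can also be produced from Java. This sketch uses the JDK's `com.sun.management.HotSpotDiagnosticMXBean` (available on HotSpot-based JVMs such as OpenJDK) to write an `.hprof` file that Eclipse Memory Analyzer can open; in MAT one could then, for example, filter the class histogram by `SynonymMap` and count instances. It dumps the JVM it runs in, so it only suits local experiments (e.g. an embedded test); for a running Solr node, external tools such as `jcmd <pid> GC.heap_dump <file>` or `jmap -dump:live,format=b,file=<file> <pid>` are the usual route. Class and file names are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    // Writes a heap dump of the current JVM to `file`, which must not already
    // exist and should end in ".hprof". With live = true only reachable
    // objects are dumped, which forces a full GC first.
    static Path dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A fresh temp directory guarantees the target file does not exist yet.
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("synonyms-test.hprof"), true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```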

Regards
Bernd


On 30.09.19 at 09:44, Andrea Gazzarini wrote:


Re: Synonym filters memory usage

Andrea Gazzarini-6
That sounds really strange to me.
Segments are created gradually as changes are applied to the index, while the Schema should have a completely different lifecycle, independent of that.
If that were true, it would mean that each time a new segment is created Solr would instantiate a new Schema instance (or at least, assuming this applies only to synonyms, a new SynonymFilterFactory, SynonymFilter, and SynonymMap), which, again, sounds really strange.

Thanks for pointing this out. I'll check and let you know.

Cheers,
Andrea

On 30/09/2019 09:58, Bernd Fehling wrote:

Re: Synonym filters memory usage

Erick Erickson
Solr/Lucene had _better_ not keep a copy of the synonym map for every segment; if it does, it’s a JIRA for sure. I’ve seen indexes with hundreds of segments. With a large synonym file, that would be terrible.

I would be really, really, really surprised if this were the case. The Lucene people are very careful about memory usage and, I’d guess, would jump on this in an instant if it were true.

Best,
Erick

On Sep 30, 2019, at 5:27 AM, Andrea Gazzarini <[hidden email]> wrote:

Re: Synonym filters memory usage

Dominique Bejean
Thank you for all your responses.
Dominique

On Mon, 30 Sep 2019 at 13:38, Erick Erickson <[hidden email]> wrote:
