Load suggest dictionary from non-Zookeeper file?

Walter Underwood
Our suggest dictionary is too big for ZooKeeper. I’m trying to load it from an absolute path, but Solr 6.6.1 insists on interpreting that as a ZooKeeper path. Is there any way to disable that?

java.lang.IllegalArgumentException: Invalid path string "/configs/questions-suggest//solr/suggest-data/questions-suggest/ngram_counts.tsv"
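
The doubled slash in the rejected path suggests that the absolute file path was simply appended to the configset's ZooKeeper path. The Python sketch below reproduces that string and runs a simplified check for empty path segments. The function names are hypothetical, and the check is only a stand-in for ZooKeeper's own path validation, not Solr or ZooKeeper code.

```python
def join_zk_path(config_dir: str, resource: str) -> str:
    """Naively append a resource name to a ZooKeeper config directory."""
    return config_dir + "/" + resource


def has_empty_node_name(path: str) -> bool:
    """Return True if any segment after the leading slash is empty."""
    segments = path.split("/")[1:]
    return any(segment == "" for segment in segments)


# Values taken from the error message above.
config_dir = "/configs/questions-suggest"
absolute = "/solr/suggest-data/questions-suggest/ngram_counts.tsv"

combined = join_zk_path(config_dir, absolute)
print(combined)
print(has_empty_node_name(combined))
```

Under this assumption, any dictionary name that starts with a slash produces an empty node name, which would explain why an absolute path cannot be used as-is.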

I could bring up a non-cloud cluster just for this suggester, but that seems like an ugly hack.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

Re: Load suggest dictionary from non-Zookeeper file?

Shawn Heisey-2
On 5/8/2019 1:59 PM, Walter Underwood wrote:
> Our suggest dictionary is too big for ZooKeeper. I’m trying to load it from an absolute path, but Solr 6.6.1 insists on interpreting that as a ZooKeeper path. Is there any way to disable that?

I wouldn't be surprised to learn that it's not possible to get it to go
outside ZooKeeper for config files.  I do not know, though.

For right now, your only option will probably be to increase the
jute.maxbuffer system property on all relevant ZK servers and Solr
servers.  Then you will be able to store data larger than 1MB in ZK.
Somebody from the ZK project would probably frown on that solution, and
if I'm honest, I don't like it much myself.

There are use cases like this where a SolrCloud replica (core) needs to
access some large data that would be better kept on the local disk
instead of in ZK.  I think it's probably a good idea to open an issue
for allowing access to config data on the filesystem for SolrCloud.  I'd
like some of the other people here to sanity check that idea, though.

Thanks,
Shawn

Re: Load suggest dictionary from non-Zookeeper file?

Walter Underwood
The file is 33 megabytes, so I don’t think increasing jute.maxbuffer is a wise idea.
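
For a sense of scale, here is a minimal Python sketch that compares a payload size with the ZooKeeper buffer limit. It assumes the commonly documented jute.maxbuffer default of 0xfffff bytes (just under 1 MB) and ignores request overhead; `fits_in_zookeeper` and `file_fits_in_zookeeper` are hypothetical helpers, not part of Solr or ZooKeeper.

```python
import os

# Assumption: the commonly documented default for jute.maxbuffer.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def fits_in_zookeeper(size_bytes: int, limit: int = DEFAULT_JUTE_MAXBUFFER) -> bool:
    """Return True if a payload of this size is below the buffer limit."""
    return size_bytes < limit


def file_fits_in_zookeeper(path: str, limit: int = DEFAULT_JUTE_MAXBUFFER) -> bool:
    """Same check, using the on-disk size of a local file."""
    return fits_in_zookeeper(os.path.getsize(path), limit)


# The dictionary in this thread is about 33 megabytes.
dictionary_size = 33 * 1024 * 1024
print(fits_in_zookeeper(dictionary_size))
print(dictionary_size / DEFAULT_JUTE_MAXBUFFER)
```

The ratio printed at the end shows roughly how far the limit would have to be raised on every ZK and Solr server to hold this one file.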

The current documentation is not at all clear about how the dictionary file name is interpreted. I could see an absolute path being local and a relative path being relative to the ZK config folder. I wouldn’t mind using a “file:” URL for local stuff.
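
The interpretation proposed above could be sketched in Python as follows. This is not how Solr 6.6.1 behaves; it only illustrates the suggested rule (a "file:" URL or an absolute path is local, a relative path is resolved against the ZK config folder). The function name `resolve_dictionary_source` and the example paths are hypothetical.

```python
import posixpath
from typing import Tuple
from urllib.parse import unquote, urlparse


def resolve_dictionary_source(name: str, zk_config_dir: str) -> Tuple[str, str]:
    """Classify a dictionary file name under the proposed rule.

    Returns ("local", path) for "file:" URLs and absolute paths,
    and ("zookeeper", path) for relative names.
    """
    parsed = urlparse(name)
    if parsed.scheme == "file":
        # Explicit local file URL: use its (percent-decoded) path.
        return ("local", unquote(parsed.path))
    if name.startswith("/"):
        # Absolute path: treat as a file on the local disk.
        return ("local", name)
    # Relative name: resolve against the configset directory in ZK.
    return ("zookeeper", posixpath.join(zk_config_dir, name))


zk_dir = "/configs/questions-suggest"
print(resolve_dictionary_source("ngram_counts.tsv", zk_dir))
print(resolve_dictionary_source("/solr/suggest-data/ngram_counts.tsv", zk_dir))
print(resolve_dictionary_source("file:///solr/suggest-data/ngram_counts.tsv", zk_dir))
```

Keeping relative names in ZK would leave existing configurations unchanged, while the two local forms give an explicit way out for large files.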

None of that is going to get this prototype working today, so I’m back to a non-cloud cluster. That is a real pain in the ass to set up with 6.x and 7.x. I got it working before vacation and now I can’t remember the steps.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

Re: Load suggest dictionary from non-Zookeeper file?

Mikhail Khludnev-2
In reply to this post by Shawn Heisey-2
It reminds me of https://lucene.apache.org/solr/guide/7_6/blob-store-api.html but
I don't think it's integrated with the suggester yet.

--
Sincerely yours
Mikhail Khludnev

Re: Load suggest dictionary from non-Zookeeper file?

Shawn Heisey-2
On 5/8/2019 2:34 PM, Mikhail Khludnev wrote:
> It reminds me of https://lucene.apache.org/solr/guide/7_6/blob-store-api.html but
> I don't think it's integrated with the suggester yet.

I'm having one of those days where I can't seem to recall things easily.

With the blob store, the blobs are in the Lucene index, right?

Thanks,
Shawn

Re: Load suggest dictionary from non-Zookeeper file?

Mikhail Khludnev-2
Right.

--
Sincerely yours
Mikhail Khludnev