>1MB file to Zookeeper


>1MB file to Zookeeper

Markus Jelsma
Hi,

We've increased ZooKeeper's znode size limit to accommodate some larger
dictionaries and other files. Increasing the maximum znode size isn't the
best idea, though. Any plans for splitting up larger files and storing them
with multi? Does anyone have another suggestion?

Thanks,
Markus

Re: >1MB file to Zookeeper

Mark Miller

On May 3, 2012, at 5:15 AM, Markus Jelsma wrote:

> Hi,
>
> We've increased ZooKeeper's znode size limit to accommodate some larger
> dictionaries and other files. Increasing the maximum znode size isn't the
> best idea, though. Any plans for splitting up larger files and storing
> them with multi? Does anyone have another suggestion?
>
> Thanks,
> Markus



Patches welcome :) You can compress, you can break up the files, or you can raise the limit - that's about all the options I know of.

You might start by creating a JIRA issue.

- Mark Miller
lucidimagination.com


Re: >1MB file to Zookeeper

Markus Jelsma
Hi.

Compression is a good suggestion. All our large dictionaries compress to
well below 1MB with GZIP. Where should this be implemented? SolrZkClient or
ZkController? Which good compressor is already in Solr's lib? And what's the
difference between SolrZkClient setData and create? Should it autocompress
files larger than N bytes? And how should we detect if data is compressed
when reading from ZooKeeper?
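
A minimal sketch of that GZIP round-trip, using only java.util.zip from the
JDK (the class name is invented for illustration; 1MB is ZooKeeper's default
jute.maxbuffer limit):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

public class ZnodeSizeCheck {
    // ZooKeeper's default znode payload limit (jute.maxbuffer) is 1MB.
    private static final int ZNODE_LIMIT = 1024 * 1024;

    // GZIP-compress raw bytes in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        byte[] packed = gzip(raw);
        System.out.printf("%d -> %d bytes, fits in a znode: %b%n",
                raw.length, packed.length, packed.length < ZNODE_LIMIT);
    }
}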

On Thursday 03 May 2012 14:04:31 Mark Miller wrote:

> On May 3, 2012, at 5:15 AM, Markus Jelsma wrote:
> > Hi,
> >
> > We've increased ZooKeeper's znode size limit to accommodate some larger
> > dictionaries and other files. Increasing the maximum znode size isn't
> > the best idea, though. Any plans for splitting up larger files and
> > storing them with multi? Does anyone have another suggestion?
> >
> > Thanks,
> > Markus
>
> Patches welcome :) You can compress, you can break up the files, or you can
> raise the limit - that's about all the options I know of.
>
> You might start by creating a JIRA issue.
>
> - Mark Miller
> lucidimagination.com

--
Markus Jelsma - CTO - Openindex

Re: >1MB file to Zookeeper

Mark Miller

On May 3, 2012, at 8:30 AM, Markus Jelsma wrote:

> Hi.
>
> Compression is a good suggestion. All our large dictionaries compress to
> well below 1MB with GZIP. Where should this be implemented? SolrZkClient or
> ZkController?

Hmm...I'm not sure - we want to be careful with this feature. Offhand, I'd guess that if we can get it into SolrZkClient, that's the right level.

The main issue is that we don't want to compress by default - we want to do it based on size or request - because it's much harder to inspect the data in ZK if it's compressed. We will probably want to add auto-decompression support to the Admin ZK view UI.
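
As a rough sketch of that size-triggered approach (helper name and threshold
are invented for illustration, not Solr API):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class AutoCompress {
    // Assumed cutoff: payloads at or below this stay uncompressed so they
    // remain easy to inspect in ZK.
    private static final int THRESHOLD = 512 * 1024;

    static byte[] maybeCompress(byte[] data) throws IOException {
        if (data.length <= THRESHOLD) {
            return data; // small enough: store as-is
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }
}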

> Which good compressor is already in Solr's lib?

I don't know that we have one yet - though the benchmark contrib uses a lib for compression (commons-compress from Apache).

> And what's the
> difference between SolrZkClient setData and create?

setData sets data on an existing node - create creates a new node (with or without data).
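
For illustration, the same difference against the raw ZooKeeper client
(connection string and paths are placeholders):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateVsSetData {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});

        // create: makes a new node; fails with NodeExistsException if the
        // path already exists.
        zk.create("/configs/coll1/synonyms.txt", "a => b".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // setData: overwrites an existing node; fails with NoNodeException
        // if it does not exist. Version -1 skips the optimistic version check.
        zk.setData("/configs/coll1/synonyms.txt", "a => b, c".getBytes(), -1);

        zk.close();
    }
}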

> Should it autocompress
> files larger than N bytes?

This seems like a reasonable approach to me...

> And how should we detect if data is compressed when
> reading from ZooKeeper?

I was thinking we could somehow use file extensions?

e.g. synonyms.txt.gzip - then you can use different compression algorithms depending on the extension, etc.

We would want to try and make it as transparent as possible though...
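
A sketch of how that extension-based dispatch might look on the read side
(class name invented for illustration; only GZIP shown):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class ExtensionCodec {
    // Pick the decoder from the znode name; the .gzip suffix follows the
    // example above.
    static InputStream open(String znodeName, byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        if (znodeName.endsWith(".gzip") || znodeName.endsWith(".gz")) {
            return new GZIPInputStream(in);
        }
        return in; // no recognized suffix: treat as plain bytes
    }
}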

>
> On Thursday 03 May 2012 14:04:31 Mark Miller wrote:
>> On May 3, 2012, at 5:15 AM, Markus Jelsma wrote:
>>> Hi,
>>>
>>> We've increased ZooKeeper's znode size limit to accommodate some larger
>>> dictionaries and other files. Increasing the maximum znode size isn't
>>> the best idea, though. Any plans for splitting up larger files and
>>> storing them with multi? Does anyone have another suggestion?
>>>
>>> Thanks,
>>> Markus
>>
>> Patches welcome :) You can compress, you can break up the files, or you can
>> raise the limit - that's about all the options I know of.
>>
>> You might start by creating a JIRA issue.
>>
>> - Mark Miller
>> lucidimagination.com
>
> --
> Markus Jelsma - CTO - Openindex

- Mark Miller
lucidimagination.com

Re: >1MB file to Zookeeper

Yonik Seeley
On Fri, May 4, 2012 at 12:50 PM, Mark Miller <[hidden email]> wrote:
>> And how should we detect if data is compressed when
>> reading from ZooKeeper?
>
> I was thinking we could somehow use file extensions?
>
> eg synonyms.txt.gzip - then you can use different compression algs depending on the ext, etc.
>
> We would want to try and make it as transparent as possible though...

At first I thought about adding a marker to the beginning of a file, but
file extensions could work too, as long as the resource loader made it
transparent (i.e. code would just need to ask for synonyms.txt, but the
resource loader would search for synonyms.txt.gzip, etc, if the original
name was not found).

Hmmm, but this breaks down for things like watches - I guess that's
where putting the encoding inside the file would be a better option.
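
A sketch of that in-band alternative: GZIP streams always begin with the
magic bytes 0x1f 0x8b, so a reader can detect compression without relying
on the znode name (class name invented for illustration):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class MagicByteCodec {
    static InputStream open(byte[] data) throws IOException {
        // GZIP magic: the first two bytes are always 0x1f 0x8b.
        boolean gzipped = data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
        InputStream in = new ByteArrayInputStream(data);
        return gzipped ? new GZIPInputStream(in) : in;
    }
}

Since the znode name never changes under this scheme, a watch set on the
original path keeps working whether or not the payload is compressed.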

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10

Re: >1MB file to Zookeeper

Jan Høydahl / Cominvent
ZK is not really designed for keeping large data files. From http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
> ZooKeeper was not designed to be a general database or large object store. ... If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.

So perhaps we should think about adding K/V store support to ResourceLoader?
If a file is >1MB, a reference to the file is stored in ZK under the original
resource name, in a way that ResourceLoader can tell that it is a reference,
not the complete file. We then make a simple LargeObjectStoreInterface (with
get/put/del) which ResourceLoader uses to fetch the complete file based on
the reference. To start with we can make a ZkLargeFileStoreImpl where the
put(key,val) method chops up the file and stores it spanning multiple 1MB ZK
nodes, and the get(key) method assembles all the parts and returns the
object. It would be good enough for most, but if you require something
better you could easily implement support for CouchDB, Voldemort or whatever.
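
A rough sketch of that interface and the chunking (all names and the znode
layout are invented for illustration):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

interface LargeObjectStore {
    void put(String key, byte[] value) throws Exception;
    byte[] get(String key) throws Exception;
    void del(String key) throws Exception;
}

class Chunker {
    // Stay safely under ZooKeeper's default 1MB znode limit.
    static final int CHUNK = 1024 * 1024 - 1024;

    // Split a payload into znode-sized slices; a ZkLargeFileStoreImpl.put()
    // could write each slice to its own znode (e.g. /blobs/<key>/part-N)
    // and get() would read the parts back in order and concatenate them.
    static List<byte[]> split(byte[] value) {
        List<byte[]> parts = new ArrayList<>();
        for (int off = 0; off < value.length; off += CHUNK) {
            parts.add(Arrays.copyOfRange(value, off,
                    Math.min(off + CHUNK, value.length)));
        }
        return parts;
    }
}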

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On May 4, 2012, at 19:09, Yonik Seeley wrote:

> On Fri, May 4, 2012 at 12:50 PM, Mark Miller <[hidden email]> wrote:
>>> And how should we detect if data is compressed when
>>> reading from ZooKeeper?
>>
>> I was thinking we could somehow use file extensions?
>>
>> e.g. synonyms.txt.gzip - then you can use different compression algorithms depending on the extension, etc.
>>
>> We would want to try and make it as transparent as possible though...
>
> At first I thought about adding a marker to the beginning of a file, but
> file extensions could work too, as long as the resource loader made it
> transparent (i.e. code would just need to ask for synonyms.txt, but the
> resource loader would search for synonyms.txt.gzip, etc, if the original
> name was not found).
>
> Hmmm, but this breaks down for things like watches - I guess that's
> where putting the encoding inside the file would be a better option.
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10


Re: >1MB file to Zookeeper

Yonik Seeley
On Sat, May 5, 2012 at 8:39 AM, Jan Høydahl <[hidden email]> wrote:
> support for CouchDB, Voldemort or whatever.

Hmmm... Or Solr!

-Yonik

Re: >1MB file to Zookeeper

Mark Miller

On May 5, 2012, at 8:39 AM, Jan Høydahl wrote:

> ZK is not really designed for keeping large data files. From http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
>> ZooKeeper was not designed to be a general database or large object store. ... If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.
>

I don't really think it's that big a deal where Solr is concerned. We don't use ZooKeeper intensively - only for state changes and for initially loading config on SolrCore starts or reloads. ZooKeeper performance should not be a big deal for Solr. I think the main issue is going to be that ZK wants to keep everything in RAM - but again, a few conf files of a couple MB each should be no big deal AFAICT.

Other large-scale applications that constantly and intensively use ZooKeeper are a different story - when Solr is in its running state, it doesn't do anything with ZooKeeper other than maintain its heartbeat.

- Mark Miller
lucidimagination.com