[lucy-dev] Custom Backend for Index : Equivalent of DirectoryFactory.java (Lucene)

bhardwajrajesh1973@gmail.com
Hello,
I want to save the index files in RocksDB. The idea is to use another open-source project, https://github.com/pinterest/rocksplicator (or another WAL-reading mechanism), so that I can replicate the index data to slave systems.

What would be the equivalent of DirectoryFactory.java in Apache Lucy? I see
FSDirHandle.c and its corresponding files (FSFileHandle.c, FSFolder.c, FileHandle.c, Folder.c)
and RAMDirHandle.c and its corresponding files.

Can someone please provide pointers on whether my approach is correct? Will it cause issues in the merging process?
Any input is welcome.

regards
Rajesh

 

Re: [lucy-dev] Custom Backend for Index : Equivalent of DirectoryFactory.java (Lucene)

Nick Wellnhofer
On 15/03/2018 00:14, [hidden email] wrote:
> I want to save the index files in RocksDB. The idea is to use another open-source project, https://github.com/pinterest/rocksplicator (or another WAL-reading mechanism), so that I can replicate the index data to slave systems.

I'm not sure whether such an approach makes sense. Lucy is basically a
specialized database itself, so your question is similar to asking how you
could replicate a MySQL database using a key-value store.

> What would be the equivalent of DirectoryFactory.java in Apache Lucy? I see
> FSDirHandle.c and its corresponding files (FSFileHandle.c, FSFolder.c, FileHandle.c, Folder.c)
> and RAMDirHandle.c and its corresponding files.
>
> Can someone please provide pointers on whether my approach is correct? Will it cause issues in the merging process?

You can implement your own storage backend by subclassing Folder, FileHandle
and DirHandle. Lucy adheres to a write-once philosophy, so theoretically, it
should be possible to use a key-value DB as a backend. But it's probably a lot
easier to move the index files to slave machines in a more direct way.
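
To illustrate why write-once storage maps onto a key-value DB, here is a rough
conceptual sketch in Python. It is not Lucy's Folder/FileHandle/DirHandle API;
the class `WriteOnceStore` and all of its method names are hypothetical, and a
plain dict stands in for the key-value DB.

```python
class WriteOnceStore:
    """Hypothetical write-once file store on top of a key-value mapping."""

    def __init__(self, kv=None):
        # A dict stands in for the key-value DB (assumption for illustration).
        self._kv = {} if kv is None else kv

    def write_file(self, path, data):
        # Write-once: a path is stored exactly once and never modified.
        if path in self._kv:
            raise FileExistsError(path)
        self._kv[path] = bytes(data)

    def read_range(self, path, offset, length):
        # Random-access read, roughly what a file handle has to provide.
        return self._kv[path][offset:offset + length]

    def list(self, prefix=""):
        # Directory listing, roughly what a directory handle has to provide.
        return sorted(k for k in self._kv if k.startswith(prefix))

    def delete(self, path):
        # Files are only ever deleted as a whole, e.g. after a merge.
        del self._kv[path]
```

Because values are never updated in place, replicating such a store only means
shipping new keys and deletions.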

Also, what's your motivation for replicating Lucy indices? Availability?
Performance? Maybe there's a simpler overall approach.

Nick

Re: [lucy-dev] Custom Backend for Index : Equivalent of DirectoryFactory.java (Lucene)

bhardwajrajesh1973@gmail.com


On 2018/03/15 13:11:16, Nick Wellnhofer <[hidden email]> wrote:

> On 15/03/2018 00:14, [hidden email] wrote:
> > I want to save the index files in RocksDB. The idea is to use another open-source project, https://github.com/pinterest/rocksplicator (or another WAL-reading mechanism), so that I can replicate the index data to slave systems.
>
> I'm not sure whether such an approach makes sense. Lucy is basically a
> specialized database itself, so your question is similar to asking how you
> could replicate a MySQL database using a key-value store.
>
> > What would be the equivalent of DirectoryFactory.java in Apache Lucy? I see
> > FSDirHandle.c and its corresponding files (FSFileHandle.c, FSFolder.c, FileHandle.c, Folder.c)
> > and RAMDirHandle.c and its corresponding files.
> >
> > Can someone please provide pointers on whether my approach is correct? Will it cause issues in the merging process?
>
> You can implement your own storage backend by subclassing Folder, FileHandle
> and DirHandle. Lucy adheres to a write-once philosophy, so theoretically, it
> should be possible to use a key-value DB as a backend. But it's probably a lot
> easier to move the index files to slave machines in a more direct way.
>
> Also, what's your motivation for replicating Lucy indices? Availability?
> Performance? Maybe there's a simpler overall approach.
>
> Nick
>

Hello,
Thanks for the reply. It's more about availability.
I went through this link:
http://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html
I was thinking of implementing it. Can you suggest the best way to implement the above methodology?
Regards

Re: [lucy-dev] Custom Backend for Index : Equivalent of DirectoryFactory.java (Lucene)

Nick Wellnhofer
On 15/03/2018 14:51, [hidden email] wrote:
> I went through this link -
> http://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html

Lucy doesn't support any of Lucene's replication features.

> I was thinking of implementing it. Can you suggest the best way to implement the above methodology?

You could start by simply copying the index directory from the master to the
slaves while locking out access to the index on both master and slaves. Lucy's
index files never change, so you can use something equivalent to `rsync
--ignore-existing`.
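
As a rough sketch of what "equivalent to `rsync --ignore-existing`" could look
like for a local or mounted destination, assuming Python is available on the
machine doing the copy: the function name `copy_missing` is made up for
illustration, and the assumption that lock files live in a top-level `locks`
directory should be checked against your index.

```python
import shutil
from pathlib import Path


def copy_missing(src_root, dst_root, skip_dirs=("locks",)):
    """Copy files from src_root that do not exist yet under dst_root.

    skip_dirs names top-level directories to ignore; "locks" is an
    assumption about where lock files are kept.
    Returns the relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        if rel.parts[0] in skip_dirs:
            continue
        dst = dst_root / rel
        if dst.exists():
            # Index files never change, so an existing file is left alone.
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Note that a file left half-copied by an interrupted run would be skipped on the
next run, so this is only safe while access to the index is locked out and the
copy runs to completion.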

Here's an overview of the directory layout:

     http://lucy.apache.org/docs/c/Lucy/Docs/FileFormat.html

Ignoring any lock files, the list of files is:

- snapshot_*.json
- schema_*.json
- seg_*/segmeta.json
- seg_*/cfmeta.json
- seg_*/cf.dat
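
If you want to filter a directory listing down to these files, a small
sketch using Python's standard `fnmatch` module could look like this. The
names `INDEX_PATTERNS` and `is_index_file` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns taken from the file list above; paths are relative to the
# index directory and use "/" as separator.
INDEX_PATTERNS = (
    "snapshot_*.json",
    "schema_*.json",
    "seg_*/segmeta.json",
    "seg_*/cfmeta.json",
    "seg_*/cf.dat",
)


def is_index_file(rel_path):
    """Return True if rel_path matches one of the index file patterns."""
    # Note: fnmatch's "*" also matches "/", so these patterns are
    # slightly more permissive than a shell glob.
    return any(fnmatchcase(rel_path, pattern) for pattern in INDEX_PATTERNS)
```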

If you want to support concurrent searching on the slaves, things get more
complicated. You should:

- Derive the list of segments to be copied from the latest snapshot
   file.
- First copy the new schema and segment files.
- Copy the snapshot file at the end and make sure that it's updated
   atomically.
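
The three steps above can be sketched roughly as follows, for a destination
that is a local or mounted directory. This rests on two assumptions that you
should verify against the FileFormat documentation: that the snapshot file is
JSON with an "entries" list of paths relative to the index directory, and that
the generation in `snapshot_*.json` is a base-36 number. The function names
are made up for illustration.

```python
import json
import os
import re
import shutil
from pathlib import Path

SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def latest_snapshot(index_dir):
    """Return the snapshot file with the highest generation, or None."""
    best = None
    for path in Path(index_dir).iterdir():
        match = SNAPSHOT_RE.match(path.name)
        if match:
            generation = int(match.group(1), 36)  # assumption: base 36
            if best is None or generation > best[0]:
                best = (generation, path)
    return best[1] if best else None


def copy_atomically(src, dst):
    """Copy to a temporary name, then rename into place."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    tmp = dst.with_name(dst.name + ".tmp")
    shutil.copy2(src, tmp)
    os.replace(tmp, dst)  # atomic within one filesystem


def replicate(master_dir, replica_dir):
    """Copy schema and segment files first, the snapshot file last."""
    master, replica = Path(master_dir), Path(replica_dir)
    snapshot = latest_snapshot(master)
    if snapshot is None:
        raise FileNotFoundError("no snapshot file in %s" % master)
    # Assumption: "entries" lists files or directories in the index.
    entries = json.loads(snapshot.read_text())["entries"]
    for entry in entries:
        src = master / entry
        if not src.exists():
            # Can happen if the master deleted files in the meantime.
            raise FileNotFoundError(str(src))
        files = [src] if src.is_file() else [
            f for f in sorted(src.rglob("*")) if f.is_file()
        ]
        for f in files:
            dst = replica / f.relative_to(master)
            if not dst.exists():
                copy_atomically(f, dst)
    copy_atomically(snapshot, replica / snapshot.name)
    return snapshot.name
```

Searchers on the slave that open the index while the copy is running still see
the previous snapshot, because the new one only appears after all files it
refers to are in place.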

If there are concurrent updates on the master, it can happen that files are
deleted after reading the snapshot file. So you should make sure that there
are no indexing sessions running during the file transfer or acquire Lucy's
deletion lock.

Afterwards you can delete old segments, either by consulting the file list or
by periodically creating an Indexer on the slaves and immediately destroying it.
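
A rough sketch of the first option, consulting the file list: everything that
looks like a segment directory, schema file or snapshot file but is not
referenced by the given snapshot is treated as stale. As before, the "entries"
key and the base-36 style names are assumptions, and the function names are
made up for illustration. Only run something like this when no searcher on
the slave still uses an older snapshot.

```python
import json
import re
import shutil
from pathlib import Path

SEGMENT_RE = re.compile(r"seg_[0-9a-z]+")
META_RE = re.compile(r"(snapshot|schema)_[0-9a-z]+\.json")


def stale_entries(index_dir, snapshot_name):
    """List top-level index entries not referenced by the snapshot."""
    index_dir = Path(index_dir)
    # Assumption: the snapshot is JSON with an "entries" list.
    entries = json.loads((index_dir / snapshot_name).read_text())["entries"]
    # Keep the first path component of every entry plus the snapshot itself.
    keep = {entry.split("/")[0] for entry in entries}
    keep.add(snapshot_name)
    stale = []
    for path in index_dir.iterdir():
        if path.name in keep:
            continue
        is_segment = path.is_dir() and SEGMENT_RE.fullmatch(path.name)
        is_meta = path.is_file() and META_RE.fullmatch(path.name)
        if is_segment or is_meta:
            stale.append(path.name)
    return sorted(stale)


def remove_stale(index_dir, snapshot_name):
    """Delete the entries reported by stale_entries and return them."""
    index_dir = Path(index_dir)
    removed = stale_entries(index_dir, snapshot_name)
    for name in removed:
        target = index_dir / name
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return removed
```

Creating and destroying an Indexer, as described above, leaves this cleanup to
Lucy itself and is probably the safer choice.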

Nick