solr cloud - hdfs folder structure best practice

lstusr 5u93n4
Hi All,

Here's a question that I can't find an answer to in the documentation:

When configuring solr cloud with HDFS, is it best to:
  a) provide a unique hdfs folder for each solr cloud instance
or
  b) provide the same hdfs folder to all solr cloud instances.

So for example, if I have two solr cloud nodes, I can configure them either
with:

   node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
   node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2

Or I could configure both nodes with:

    -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr

In the second option, all solr cloud nodes can "see" all index files from
all other solr cloud nodes. Are there pros or cons to allowing all of
the solr nodes to see all files in the collection?

Thanks,

Kyle

Re: solr cloud - hdfs folder structure best practice

Kevin Risden-3
I prefer a single HDFS home since it definitely simplifies things. No need
to create folders for each node or anything like that if you add nodes to
the cluster. The replicas underneath will get their own folders. I don't
know if there are issues with autoAddReplicas or other types of failovers
if there are different home folders.
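
As a rough illustration of the single-home setup described above, here is a minimal sketch that only prints the start command each node would use; it does not start Solr. The HDFS URI is the one from the question. The ZooKeeper address (zk1:2181) and the host names (node1, node2) are placeholders assumed for the example, and the directory factory and lock type properties are the usual HDFS settings rather than anything stated in this thread.

```shell
#!/bin/sh
# Option (b) from the question: every node points at the same HDFS home.
HDFS_HOME="hdfs://my.hfds:9000/solr"   # URI taken from the thread
ZK_HOST="zk1:2181"                     # assumed placeholder, not from the thread

# Print (do not run) the start command for one SolrCloud node.
# $1 = host name of the node (hypothetical: node1, node2)
solr_start_cmd() {
  printf '%s\n' "bin/solr start -c -z $ZK_HOST -h $1 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.home=$HDFS_HOME"
}

# Both nodes get an identical solr.hdfs.home; only the host name differs.
solr_start_cmd node1
solr_start_cmd node2
```

With this arrangement, adding a third node means running the same command with a new host name; no per-node HDFS folder has to be created first.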

I've run Solr on HDFS with the same basic configs as listed here:
https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html

Kevin Risden


On Fri, Nov 2, 2018 at 1:19 PM lstusr 5u93n4 <[hidden email]> wrote:

> Hi All,
>
> Here's a question that I can't find an answer to in the documentation:
>
> When configuring solr cloud with HDFS, is it best to:
>   a) provide a unique hdfs folder for each solr cloud instance
> or
>   b) provide the same hdfs folder to all solr cloud instances.
>
> So for example, if I have two solr cloud nodes, I can configure them either
> with:
>
>    node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
>    node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2
>
> Or I could configure both nodes with:
>
>     -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr
>
> In the second option, all solr cloud nodes can "see" all index files from
> all other solr cloud nodes. Are there pros or cons to allowing all of
> the solr nodes to see all files in the collection?
>
> Thanks,
>
> Kyle
>

Re: solr cloud - hdfs folder structure best practice

lstusr 5u93n4
Great, thanks for the response. This is how we have it configured now, but
we just had the idea the other day that maybe it would be better
otherwise...

And thanks for the blog post! We ended up with basically the same config,
so it's good to see that validated.

Kyle



On Fri, 2 Nov 2018 at 13:42, Kevin Risden <[hidden email]> wrote:

> I prefer a single HDFS home since it definitely simplifies things. No need
> to create folders for each node or anything like that if you add nodes to
> the cluster. The replicas underneath will get their own folders. I don't
> know if there are issues with autoAddReplicas or other types of failovers
> if there are different home folders.
>
> I've run Solr on HDFS with the same basic configs as listed here:
>
> https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html
>
> Kevin Risden