Build Solr index using Hadoop MapReduce


Ning Li-3
Hi,

I wonder if there is interest in a contrib module that builds a Solr
index using Hadoop MapReduce?

It is different from the Solr support in Nutch. The Solr support in
Nutch sends a document to a Solr server in a reduce task. Here, I aim
at building/updating the Solr index within the map/reduce tasks
themselves. Also, this approach
achieves better parallelism when the number of map tasks is greater
than the number of reduce tasks, which is usually the case.
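
As a rough illustration of the idea only (this is not the SOLR-1045 code and it uses neither Hadoop nor Solr), the toy sketch below assumes indexing happens in the map phase: each simulated map task turns its own input split into a separate in-memory index shard, so no central server is involved and the amount of parallel indexing work follows the number of map tasks. The names build_shard, split_input and search, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def build_shard(docs):
    """Simulate one map task: index its input split into a local shard."""
    shard = defaultdict(set)  # term -> set of document ids
    for doc_id, text in docs:
        for term in text.lower().split():
            shard[term].add(doc_id)
    return dict(shard)


def split_input(docs, num_map_tasks):
    """Deal documents round-robin into one input split per map task."""
    return [docs[i::num_map_tasks] for i in range(num_map_tasks)]


def search(shards, term):
    """Query every shard and union the hits."""
    hits = set()
    for shard in shards:
        hits |= shard.get(term.lower(), set())
    return hits


# Hypothetical sample input: (document id, text) pairs.
docs = [
    (1, "Solr index on Hadoop"),
    (2, "Lucene index"),
    (3, "Hadoop MapReduce job"),
    (4, "Solr server"),
]

# Each split is indexed independently, as separate map tasks would do.
shards = [build_shard(split) for split in split_input(docs, num_map_tasks=2)]
```

With more map tasks, more shards are built at the same time; a real job would write each shard to disk as a Lucene/Solr index rather than keep a dictionary in memory.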

I worked out a very simple initial version. But I want to check if
there is any interest before proceeding. If so, I'll open a Jira
issue.

Cheers,
Ning

Re: Build Solr index using Hadoop MapReduce

Shalin Shekhar Mangar
On Mon, Mar 2, 2009 at 11:24 PM, Ning Li <[hidden email]> wrote:

> Hi,
>
> I wonder if there is interest in a contrib module that builds a Solr
> index using Hadoop MapReduce?
>

Absolutely!


> It is different from the Solr support in Nutch. The Solr support in
> Nutch sends a document to a Solr server in a reduce task. Here, I aim
> at building/updating the Solr index within the map/reduce tasks
> themselves. Also, this approach
> achieves better parallelism when the number of map tasks is greater
> than the number of reduce tasks, which is usually the case.
>
> I worked out a very simple initial version. But I want to check if
> there is any interest before proceeding. If so, I'll open a Jira
> issue.
>

+1

Please do. It'd be great to see this in Solr.

--
Regards,
Shalin Shekhar Mangar.

Re: Build Solr index using Hadoop MapReduce

Marc Sturlese
I am doing some research on creating a Lucene/Solr index using Hadoop, but there's not much info around. It would be great to see some code! (I am having problems especially with duplicate detection.)
Thanks

Re: Build Solr index using Hadoop MapReduce

Ning Li-3
SOLR-1045 it is. More details will be available in that issue.

Marc, you can check out Hadoop contrib/index which builds a Lucene
index using Hadoop MapReduce. However, it does not handle duplicate
detection.
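
For what it's worth, one generic way to approach duplicates in a MapReduce job, shown here only as a toy sketch and not as something contrib/index or SOLR-1045 provides, is to emit each document under its unique key so that all copies meet in the same reduce call, and then keep a single copy. The field names "id" and "version" and the keep-the-highest-version rule are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit (unique key, record) pairs; 'id' is an assumed unique-key field."""
    for record in records:
        yield record["id"], record


def reduce_phase(pairs):
    """Group pairs by key (emulating the shuffle) and keep one record per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for _, group in groupby(ordered, key=itemgetter(0)):
        versions = [record for _, record in group]
        # Assumed rule: the copy with the highest 'version' wins.
        yield max(versions, key=lambda r: r["version"])


# Hypothetical input containing two copies of document "a".
records = [
    {"id": "a", "version": 1, "text": "old copy"},
    {"id": "b", "version": 1, "text": "only copy"},
    {"id": "a", "version": 2, "text": "new copy"},
]

deduplicated = list(reduce_phase(map_phase(records)))
```

The surviving records could then be indexed in the reduce task; whether that trade-off is acceptable depends on the job, since it moves the indexing work out of the map phase.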

Cheers,
Ning


On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese <[hidden email]> wrote:

>
> I am doing some research on creating a Lucene/Solr index using Hadoop, but
> there's not much info around. It would be great to see some code! (I am
> having problems especially with duplicate detection.)
> Thanks

Re: Build Solr index using Hadoop MapReduce

JerylCook
 Build Solr index using Hadoop MapReduce
http://issues.apache.org/jira/browse/SOLR-1045
