[jira] Created: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.


Sebastian Nagel (Jira)
Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.
-----------------------------------------------------------------------------------------------------------------

                 Key: NUTCH-638
                 URL: https://issues.apache.org/jira/browse/NUTCH-638
             Project: Nutch
          Issue Type: Improvement
          Components: searcher
    Affects Versions: 1.0.0
            Reporter: Aaron Nall
            Priority: Minor


I wanted to run all index creation operations in HDFS but search from the local file system, using the same cluster of machines. I believe this is a common use case.

Doing so required either a parallel Nutch install or edits (scripted or manual) to hadoop-site.xml to switch the file system from HDFS to local when starting a distributed searcher service. This minor patch makes IndexSearcher and NutchBean honor URIs, as supported by Hadoop 0.17, without altering existing behavior when simple paths are entered.
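
The behavior described above can be sketched with nothing but java.net.URI. This is a rough illustration and not the code from the patch: the class FsSchemeSketch, the method schemeFor, and the example locations are all hypothetical, and the "configured" value merely stands in for whatever default file system hadoop-site.xml names. The idea is that a location carrying a scheme selects its own file system, while a simple path falls back to the configured default.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: hypothetical names, not Nutch or Hadoop code.
final class FsSchemeSketch {

    // Returns the scheme named by the location, or defaultScheme when the
    // location is a simple path that carries no scheme of its own.
    static String schemeFor(String location, String defaultScheme) {
        try {
            String scheme = new URI(location).getScheme();
            return scheme != null ? scheme : defaultScheme;
        } catch (URISyntaxException e) {
            // Assumption: input that does not parse as a URI is a plain path.
            return defaultScheme;
        }
    }

    public static void main(String[] args) {
        // Stands in for the default file system from hadoop-site.xml.
        String configured = "hdfs";

        System.out.println(schemeFor("file:///data/crawl", configured));         // file
        System.out.println(schemeFor("hdfs://namenode:9000/crawl", configured)); // hdfs
        System.out.println(schemeFor("crawl", configured));                      // hdfs
    }
}
```

With this kind of resolution, a searcher started against file:///data/crawl would read from local disk even though the cluster default remains HDFS, and a bare directory name behaves as it did before.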

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Nall updated NUTCH-638:
-----------------------------

    Attachment: distributed-search-uri.patch

This is the patch that I used to address the issue.

> Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-638
>                 URL: https://issues.apache.org/jira/browse/NUTCH-638
>             Project: Nutch
>          Issue Type: Improvement
>          Components: searcher
>    Affects Versions: 1.0.0
>            Reporter: Aaron Nall
>            Priority: Minor
>         Attachments: distributed-search-uri.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> I wanted to run all index creation operations in HDFS but search from the local file system, using the same cluster of machines. I believe this is a common use case.
> Doing so required either a parallel Nutch install or edits (scripted or manual) to hadoop-site.xml to switch the file system from HDFS to local when starting a distributed searcher service. This minor patch makes IndexSearcher and NutchBean honor URIs, as supported by Hadoop 0.17, without altering existing behavior when simple paths are entered.


[jira] Commented: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639058#action_12639058 ]

Andrzej Bialecki  commented on NUTCH-638:
-----------------------------------------

I think in NutchBean.java we can also use dir.getFileSystem(conf) instead of FileSystem.get(dir.toUri(), this.conf). Could you please test whether this works for you? Other than that, the patch looks fine.

