Command line integration question


Command line integration question

Dmitriy Lyubimov
Dear all,

I am testing the command line integration for the SSVD patch in Hadoop mode
and running into some difficulties.
Even though I defined $HADOOP_HOME and $HADOOP_CONF_DIR, the DFS
configuration apparently is not being picked up.

I run on CDH3b3; however, all Hadoop configuration is 100% compatible
with 0.20. I am using AbstractJob.getConf() to acquire the initial properties,
but it looks like fs.default.name is still not being set. I tried to
locate the code that loads the Hadoop conf but wasn't immediately able to
find it. Could you please tell me what I need to do to retrieve the initial
Hadoop configuration correctly? I must be missing something very simple here.

Thank you in advance.
-Dmitriy

bin/mahout ssvd -i /mahout/ssvdtest/A -o /mahout/ssvd-out/1 -k 100 -p 100 -r 200
Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
HADOOP_CONF_DIR=/home/dmitriy/tools/hadoop/conf
10/12/05 15:09:55 INFO common.AbstractJob: Command line arguments:
{--blockHeight=200, --computeU=true, --computeV=true, --endPhase=2147483647,
--input=/mahout/ssvdtest/A, --minSplitSize=-1, --output=/mahout/ssvd-out/1,
--oversampling=100, --rank=100, --reduceTasks=1, --startPhase=0,
--tempDir=temp}
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:118)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:110)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:177)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.run(SSVDCli.java:75)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.main(SSVDCli.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

Re: Command line integration question

Dmitriy Lyubimov
PS. Also, if I needed to play with various MR settings, such as child-process
arguments, could I pass those on to the Configuration object through the
command line? Or would I have to add a definition for a custom job setting
for every instance where I'd want to supply a custom MR setting?

Thank you in advance.
-Dmitriy



On Sun, Dec 5, 2010 at 3:17 PM, Dmitriy Lyubimov <[hidden email]> wrote:


Re: Command line integration question

Dmitriy Lyubimov
Ok, I think I got it: Mahout uses the standard ToolRunner to preconfigure the
client. Thanks.
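For readers following along, the ToolRunner mechanism mentioned here can be sketched as below. This is a minimal illustration, not Mahout's actual driver code; the class name `MyTool` and the printed property are made up for the example, and it assumes a Hadoop 0.20-era classpath with $HADOOP_CONF_DIR on it so that core-site.xml is found:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: a Tool launched through ToolRunner receives a Configuration
// that already has the site files and any generic command-line options
// (-D, -conf, -fs, ...) applied before run() is invoked.
public class MyTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // already populated by ToolRunner
    System.out.println(conf.get("fs.default.name"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses the generic options and calls setConf() on the
    // Tool before delegating to run(); constructing a Configuration by
    // hand instead (new Configuration()) only loads what is on the
    // classpath, which is why $HADOOP_CONF_DIR must be there.
    System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
  }
}
```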

On Sun, Dec 5, 2010 at 3:28 PM, Dmitriy Lyubimov <[hidden email]> wrote:


Re: Command line integration question

Sean Owen
In reply to this post by Dmitriy Lyubimov
fs.default.name is conventionally configured in conf/core-site.xml. This is
environment-specific, so it can't really be configured in Mahout code.

(But you can manipulate the Job you get from prepareJob() by calling
getConfiguration().)

You should be able to pass as many key-value pairs as you like via JVM system
properties -- that is, "-Dmapred.input.dir=...".
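A short sketch of how those -D pairs reach the job, under the assumption of a 0.20-style Hadoop classpath; the class name `ConfEcho` is invented for illustration, and mapred.child.java.opts is just one example of a settable property:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

// Sketch: GenericOptionsParser (which ToolRunner uses internally) folds
// -D key=value pairs from the command line into the Configuration, so a
// driver can read arbitrary MR settings without defining a custom
// option for each one.
public class ConfEcho {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // e.g. invoked with: ... ConfEcho -Dmapred.child.java.opts=-Xmx512m
    String[] rest = new GenericOptionsParser(conf, args).getRemainingArgs();
    System.out.println(conf.get("mapred.child.java.opts"));
  }
}
```

Note the ordering: generic -D options must come before tool-specific arguments, since the parser stops folding them in once it hits an unrecognized token.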

On Sun, Dec 5, 2010 at 11:28 PM, Dmitriy Lyubimov <[hidden email]> wrote:


Re: Command line integration question

Dmitriy Lyubimov
Thank you, Sean.

On Sun, Dec 5, 2010 at 3:38 PM, Sean Owen <[hidden email]> wrote:
