resolving IP in...

resolving IP in...

Stefan Groschupf-2
Hi,
after experimenting to find the best place to resolve the IPs of
freshly discovered URLs, I agree with Andrzej that
ParseOutputFormat isn't the best place.

The problem is that ParseOutputFormat is not multithreaded, and we
definitely need many threads for IP lookups.

I think an IP-resolving MapRunnable that preprocesses the segment
data (after fetching, before updating the crawldb) may be a
better place.

+ less data to process (compared with processing a complete crawldb)
+ good DNS cache usage, since many new URLs will point to the same
host (internal links)
- we may look up URLs we already have in the crawldb.

Any thoughts?

Stefan
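
To make the idea above concrete, here is a minimal Java sketch of a multithreaded lookup with a per-host cache. It is not Nutch code and does not implement MapRunnable: the class name IpResolverSketch, its methods, and the pluggable lookup function are all hypothetical, and only the standard JDK is assumed. It only illustrates why many threads plus a shared cache fit the workload described (many new URLs pointing to the same host).

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not part of Nutch: resolves the hosts of many URLs
// in parallel and looks each distinct host up only once.
public class IpResolverSketch {
    // host -> resolved IP (empty if the lookup failed); shared by all threads
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    // counts how many real lookups were performed (cache misses)
    private final AtomicInteger lookups = new AtomicInteger();
    // pluggable lookup so the sketch can be exercised without a network
    private final Function<String, Optional<String>> lookup;

    public IpResolverSketch(Function<String, Optional<String>> lookup) {
        this.lookup = lookup;
    }

    // Default lookup based on the JDK resolver.
    public static Optional<String> dnsLookup(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    // Extracts the host from a URL and resolves it through the shared cache.
    private Optional<String> resolve(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed URL
        }
        if (host == null) {
            return Optional.empty();
        }
        return cache.computeIfAbsent(host, h -> {
            lookups.incrementAndGet();
            return lookup.apply(h);
        });
    }

    // Resolves all URLs using a fixed-size thread pool.
    public Map<String, Optional<String>> resolveAll(List<String> urls, int threads) {
        Map<String, Optional<String>> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> result.put(url, resolve(url)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public int lookupCount() {
        return lookups.get();
    }

    public static void main(String[] args) {
        IpResolverSketch resolver = new IpResolverSketch(IpResolverSketch::dnsLookup);
        List<String> urls = Arrays.asList("http://localhost/a", "http://localhost/b");
        System.out.println(resolver.resolveAll(urls, 8));
        System.out.println("lookups: " + resolver.lookupCount());
    }
}
```

Two design choices are worth questioning in a real implementation. Failed lookups are cached here as well, which avoids retrying dead hosts but also hides temporary DNS errors. And computeIfAbsent runs the lookup while holding an internal lock, so slow lookups for hosts that land in the same hash bin can briefly wait on each other; caching a future per host would avoid that.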


Re: resolving IP in...

Lourival Júnior
Hi,

I'm new to this mailing list and to Nutch. I have read a lot about
Nutch, and I can now build an index and run some queries. However, I
only get results from HTML files. I have tried to index MS Word and PDF
documents, but only the indexing works; I have problems with the search.
I am searching with the web application that ships with Nutch. Has anyone
had the same problem? Please excuse my poor English. I'm Brazilian... :)

Lourival Junior

--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [hidden email]

Re: resolving IP in...

Lourival Júnior
Does anyone know where I can download Nutch version 0.8? I can't find
it :(

Regards

--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [hidden email]

Re: resolving IP in...

Dennis Kubes
Use Subversion: http://svn.apache.org/repos/asf/lucene/nutch/

Lourival Júnior wrote:
> Anyone knows where can I download the nutch version 0.8? I can't find
> this
> one :(
>
> Att
>

Re: resolving IP in...

Stefan Neufeind
Use the current nightly builds:
http://people.apache.org/dist/lucene/nutch/nightly/


Regards,
 Stefan

Dennis Kubes wrote:
> Using subversion http://svn.apache.org/repos/asf/lucene/nutch/
>
> Lourival Júnior wrote:
>> Anyone knows where can I download the nutch version 0.8? I can't find
>> this one :(

RE: resolving IP in...

Anton Potekhin
In reply to this post by Lourival Júnior

> Anyone knows where can I download the nutch version 0.8? I can't find this
> one :(


http://svn.apache.org/repos/asf/lucene/nutch/trunk/



Re: resolving IP in...

Dennis Kubes
You have to use a Subversion client.  Take a look at the
NutchHadoopTutorial for complete instructions with Eclipse.

http://wiki.apache.org/nutch/NutchHadoopTutorial

Or you can download TortoiseSVN (another Subversion client) from here.

http://tortoisesvn.tigris.org/

Once you have the client downloaded and installed, the URL below is the
connect URL that takes you right to the most recent HEAD codebase.

http://svn.apache.org/repos/asf/lucene/nutch/trunk/

This one will show you the whole Nutch folder, including nightly builds and tagged releases.

http://svn.apache.org/repos/asf/lucene/nutch/

If you are interested in Hadoop, you can get it from here the same way.

http://svn.apache.org/repos/asf/lucene/hadoop

Dennis

[hidden email] wrote:
> Anyone knows where can I download the nutch version 0.8? I can't find this
> one :(
>
>
> http://svn.apache.org/repos/asf/lucene/nutch/trunk/
>
>
>