Can Nutch fit this task?

Can Nutch fit this task?

ahmed ghouzia
Dear Nutchers,
  This is the first time I have asked a question of the Nutch users.

  I am a researcher working on web retrieval, and I would like to know whether I can use Nutch for the following:

  1- Can I make Nutch start from seed URLs obtained through the Google API?

  2- Can I see the algorithms that do the crawling and match queries to search results?

  3- Can I modify these algorithms and replace them with my own?
   
   

               
Re: Can Nutch fit this task?

Shawn Gervais
ahmed ghouzia wrote:
> Dear Nutchers,
>   This is the first time I have asked a question of the Nutch users.
>
>   I am a researcher working on web retrieval, and I would like to know whether I can use Nutch for the following:
>
>   1- Can I make Nutch start from seed URLs obtained through the Google API?

Yes. You should be able to seed using URLs from any source.
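As a rough illustration of seeding from an arbitrary source, the sketch below writes a list of URLs to a plain-text file, one URL per line. It assumes the seed list format shown in the Nutch tutorial, so check the documentation for your Nutch version before relying on it. The function name `write_seed_file` and the path used in the usage note are hypothetical, and fetching the URLs from a search API is left out entirely.

```python
from pathlib import Path
from urllib.parse import urlsplit


def write_seed_file(urls, seed_path):
    """Write unique http(s) URLs to a plain-text file, one URL per line.

    Assumption: the crawler accepts a flat text file of seed URLs.
    Returns the list of URLs that were actually written.
    """
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        # Keep only absolute http/https URLs that have a host part.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        # Drop exact duplicates while preserving the original order.
        if url not in seen:
            seen.add(url)
            kept.append(url)
    path = Path(seed_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return kept
```

Calling `write_seed_file(results, "urls/seed.txt")` with `results` holding the URLs returned by your search source would produce a file that can then be handed to the crawler as its starting point.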

>   2- Can I see the algorithms that do the crawling and match queries to search results?

Yes. I believe the releases come with source; if not, the SVN sources are
available.

>   3- Can I modify these algorithms and replace them with my own?

Yes.

-Shawn
Re: Can Nutch fit this task?

Ravish Bhagdev
The fact that Nutch and Lucene are both open source answers all three of your
questions. What it doesn't answer is how; that's up to you :)

-- Ravish



RE: Can Nutch fit this task?

Howie Wang
In reply to this post by ahmed ghouzia
Although a couple of people mentioned that you can do this
since Nutch is open source, I'd like to play devil's advocate
and say that it is difficult to do #3.

Although you can make little tweaks pretty easily like
boosting words in the title or URL, changing the main
crawling algorithm and/or searching algorithm requires
lots of changes to core code. If you change it, it will
be difficult to merge future changes into your code.
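To make "boosting words in the title" concrete, here is a toy sketch of field-weighted scoring. It is not Nutch's or Lucene's actual ranking formula; the function `score`, the field names, and the boost values are all made up for illustration. It only shows the idea that a match in a boosted field counts for more than a match elsewhere.

```python
def score(query_terms, doc, boosts):
    """Toy relevance score: term counts per field, scaled by a field boost.

    doc maps field name -> text; boosts maps field name -> weight.
    Fields without an explicit boost get a weight of 1.0.
    """
    total = 0.0
    for field, text in doc.items():
        tokens = text.lower().split()
        weight = boosts.get(field, 1.0)
        for term in query_terms:
            # Each occurrence of the term in this field adds `weight`.
            total += weight * tokens.count(term.lower())
    return total


doc = {"title": "Nutch crawler", "content": "the nutch crawler fetches pages"}
print(score(["nutch"], doc, {"title": 2.0}))  # title match counts double
```

Changing a weight like this is the kind of small tweak meant here; replacing the scoring model itself is a different scale of work.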

You can definitely do it though. You should just know
what you're getting into.

Howie


Re: Can Nutch fit this task?

Thomas Delnoij-3
I disagree that it has to be difficult to stay up to date with the main
codeline if you have a lot of local changes.

You can put your code under local version control in Subversion and
then use the process described in the "Vendor branches" chapter of the
Subversion book (found here:
http://svnbook.red-bean.com/nightly/en/svn.advanced.vendorbr.html) to
stay up to date with Nutch SVN.

I upgraded my local version with several plugins and tweaks from 0.7.1
through 0.7.2 to the latest 0.8 nightly build this way without major
problems.

Good luck.

Rgrds, Thomas
