New Nutch User

Webmaster-37
I am new to Nutch.  I love it, but I am not sure I can handle putting this
together by myself.  I run Red Hat Linux boxes with Apache.  I have
knowledge of HTML, some Java, MySQL, PHP, and Linux.

 

Will I be able to get Nutch up and running to crawl multiple sites on the
internet the way a basic search engine does?  What skills am I missing, or
what do I need to learn?

 

My main problem is that, although I am pretty confident I can get Nutch
installed on my machines, I'm not sure how to integrate it into the
front end of my site.  Is it just a simple POST or GET form, or is it very
Java intensive?

 

Any suggestions would be greatly appreciated.

 

Thanks


Re: New Nutch User

Nguyen Ngoc Giang
I think Nutch 0.7 supports the OpenSearch protocol, so you don't need to
dig into much Java code. Just treat Nutch as a web service, and you can
write a wrapper in any scripting language you like to handle the HTTP
requests/responses.
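
As a rough illustration of the wrapper idea, here is a minimal Python sketch that builds a GET URL and pulls the hits out of an RSS response. The endpoint URL and the parameter names (`query`, `hitsPerPage`) are assumptions to check against your own install, and the sample feed is hand-written rather than real Nutch output. Only the plain RSS elements (`item`, `title`, `link`, `description`) are read.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def build_search_url(base_url, query, hits_per_page=10):
    """Build the GET URL for a search request.

    The parameter names are assumptions, not confirmed Nutch settings.
    """
    params = urlencode({"query": query, "hitsPerPage": hits_per_page})
    return f"{base_url}?{params}"


def parse_rss_results(rss_text):
    """Return title, link and description for every <item> in an RSS feed."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        results.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return results


def search(base_url, query):
    """Send the request over HTTP and parse the response (needs a live server)."""
    with urlopen(build_search_url(base_url, query)) as response:
        return parse_rss_results(response.read())


# Hand-written sample feed standing in for a server response.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Search results: nutch</title>
    <item>
      <title>Example page</title>
      <link>http://www.example.com/</link>
      <description>A short summary of the page.</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    # Hypothetical endpoint; replace with wherever your search webapp is deployed.
    print(build_search_url("http://localhost:8080/opensearch", "apache nutch"))
    for hit in parse_rss_results(SAMPLE_RSS):
        print(hit["title"], hit["link"])
```

The same two steps (build a query URL, parse the XML that comes back) carry over to PHP or Perl, so the front end can stay a simple GET form that hands the query to a script like this.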



RE: New Nutch User

Webmaster-37
Thanks so much for your help.  One more question: I've never written a
wrapper before.  I did some searching online and found SWIG
(http://www.swig.org), which seems like it could help me write one.

Does anyone have some examples of a wrapper I can use, or will SWIG be my
best bet?  Ultimately my goal for Nutch is to create a site similar to
Indeed.com.  Any suggestions would be greatly appreciated.  Thanks!


RE: New Nutch User

Fuad Efendi
Nutch does support A9's OpenSearch extensions to RSS.

I think it would be easier to start with pure Nutch, and then learn some
JSP/Servlet... if you need your own crawler...



Re: [Nutch-general] RE: New Nutch User

Otis Gospodnetic-2-2
I believe that's incorrect.  As a matter of fact, there were patches
for fixing character-encoding problems with this feature coming in
through JIRA just the other day.

Otis


--- Fuad Efendi <[hidden email]> wrote:

> Nutch does support A9's OpenSearch extensions to RSS.
>
> I think, it would be easier to start with pure Nutch, then to learn
> some
> JSP/Servlet... If you need own crawler...
>
>
> _______________________________________________
> Nutch-general mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
>


RE: New Nutch User

Webmaster-37
In reply to this post by Nguyen Ngoc Giang
I found this Perl wrapper example:
http://opensearch.a9.com/docs/openSearchPerlWrapper.txt 

Can anyone tell me if something like this would work with Nutch?




how to build a SE based on nutch

Heart
In reply to this post by Otis Gospodnetic-2-2

I'm new to Nutch. Several days ago, I finished building a simple intranet search engine based on Nutch 0.6,
and I've spent two weeks reading the Nutch 0.6 source code.

Now I want to build a bigger one. I want to crawl the pages of several websites that I specify.
My server is a modest machine with 1 CPU, 1 GB of memory, and a 320 GB hard disk; the bandwidth is 10 Mbps.
I want to provide a search service for a specific domain, so I choose some
big websites and crawl them.
So my questions are:
Must I update (re-crawl) all the sites in one crawl procedure, or may I crawl one site per day
and run a program to index them together? If the crawl procedure lasts too long, how can I keep providing my service? Is there any good system for me to study?
Any advice would be greatly appreciated.



--
Best regards,
 Heart                            mailto:[hidden email]


Re: how to build a SE based on nutch

Miguel Paraz
On 10/18/05, Heart <[hidden email]> wrote:
> Must I update (re-crawl) all the sites in one crawl procedure, or may I crawl one site per day
> and run a program to index them together? If the crawl procedure lasts too long, how can I keep providing my service? Is there any good system for me to study?
> Any advice would be greatly appreciated.

I'll add:
Sorry to ask this, but I could not find it in the docs. How could I
request Nutch to refetch sites that are already in the db? I tried
injecting them again, but they are not refreshed.