why doesn't nutch fetch any job links?

why doesn't nutch fetch any job links?

savannah_beckett
I am trying to get Nutch to fetch all the job links on the following page, but
it never does, even though it fetches other links on that page.
http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&N=0&Hf=0&NUM_PER_PAGE=30&Ntk=JobSearchRanking&Ntx=mode+matchall&AREA_CODES=&AC_COUNTRY=1525&QUICK=1&ZIPCODE=&RADIUS=64.37376&ZC_COUNTRY=0&COUNTRY=1525&STAT_PROV=0&METRO_AREA=33.78715899%2C-84.39164034&TRAVEL=0&TAXTERM=0&SORTSPEC=0&FRMT=0&DAYSBACK=30&LOCATION_OPTION=2&FREE_TEXT=engineer&WHERE=


I used all the default settings, except that I set it to fetch both internal
and external links.  Does anyone know why?
Thanks.


Re: why doesn't nutch fetch any job links?

Alex McLintock
Have you checked the regular expression filters? I believe that by
default they exclude any URL with a '?' in it, because that
implies parameters, which may be unnecessary.

Of course for you they presumably are necessary.
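
To make the suggestion concrete, here is a minimal sketch of how an ordered
list of +/- regex rules can drop a query URL. It is not Nutch's own code. It
assumes that the first matching rule decides, that a URL matching no rule is
rejected, and that the stock filter file contains a line like -[?*!@=]; check
all three against the filter files in your own conf directory. The names
DEFAULT_RULES, parse_rules and accepts are made up for this illustration, and
the URL is a shortened form of the one in the question.

```python
import re

# Rules in the style of a Nutch regex URL filter file: a leading '+' accepts,
# a leading '-' rejects. The '-[?*!@=]' line is an assumption about the stock
# configuration; verify it against your own filter file.
DEFAULT_RULES = [
    "# skip URLs containing certain characters as probable queries",
    "-[?*!@=]",
    "# accept anything else",
    "+.",
]


def parse_rules(lines):
    """Turn filter-file lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in url.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Shortened form of the search URL from the question (hypothetical test input).
JOB_SEARCH_URL = (
    "http://seeker.dice.com/jobsearch/servlet/JobSearch"
    "?op=300&FREE_TEXT=engineer"
)

if __name__ == "__main__":
    stock = parse_rules(DEFAULT_RULES)
    relaxed = parse_rules(["-[*!@]", "+."])  # '?' and '=' no longer excluded
    print("stock rules:  ", accepts(stock, JOB_SEARCH_URL))
    print("relaxed rules:", accepts(relaxed, JOB_SEARCH_URL))
```

Note that '=' sits in the same character class as '?', so under these
assumptions removing only the '?' would still reject a URL that carries
key=value parameters.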

Alex


On 5 August 2010 07:02, Savannah Beckett <[hidden email]> wrote:

> I am trying to get Nutch to fetch all the job links on the following page, but
> it never does, even though it fetches other links on that page.
> http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&N=0&Hf=0&NUM_PER_PAGE=30&Ntk=JobSearchRanking&Ntx=mode+matchall&AREA_CODES=&AC_COUNTRY=1525&QUICK=1&ZIPCODE=&RADIUS=64.37376&ZC_COUNTRY=0&COUNTRY=1525&STAT_PROV=0&METRO_AREA=33.78715899%2C-84.39164034&TRAVEL=0&TAXTERM=0&SORTSPEC=0&FRMT=0&DAYSBACK=30&LOCATION_OPTION=2&FREE_TEXT=engineer&WHERE=
>
> I used all the default settings, except that I set it to fetch both internal
> and external links.  Does anyone know why?
> Thanks.
Re: why doesn't nutch fetch any job links?

savannah_beckett
I already made sure the filter lets URLs containing "?" through.  Nutch can
fetch all the links in the left sidebar, and all of them have "?" in their
URLs.  I also made sure that it can fetch unlimited outlinks.  Any more
suggestions?





________________________________
From: Alex McLintock <[hidden email]>
To: [hidden email]
Sent: Thu, August 5, 2010 12:03:51 AM
Subject: Re: why doesn't nutch fetch any job links?

Have you checked the regular expression filters? I believe that by
default they exclude any URL with a '?' in it, because that
implies parameters, which may be unnecessary.

Of course for you they presumably are necessary.

Alex

