[jira] Created: (NUTCH-66) Cookies are not being read properly

Jorge Spinsanti (Jira)
Cookies are not being read properly
-----------------------------------

         Key: NUTCH-66
         URL: http://issues.apache.org/jira/browse/NUTCH-66
     Project: Nutch
        Type: Improvement
  Components: fetcher  
    Reporter: CC Chaman
    Priority: Minor


Cookies whose domain does not begin with a period are not being accepted, for example "cnn.com" instead of the RFC-style ".cnn.com". But A LOT of sites seem not to know the standard. It would be nice if the plugin accepted those cookies as well.


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-66) Cookies are not being read properly

Jorge Spinsanti (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-66?page=comments#action_12315027 ]

Andrzej Bialecki  commented on NUTCH-66:
----------------------------------------

If you are using protocol-httpclient, add the following lines to Http.java, around line 395:

params.setParameter("http.protocol.cookie-policy", CookiePolicy.BROWSER_COMPATIBILITY);
params.setBooleanParameter("http.protocol.single-cookie-header", true);

Please report if this helps.


RE: [jira] Commented: (NUTCH-66) Cookies are not being read properly

Chirag Chaman
Andrzej,

This does NOT work.
Still complains when it sees the domain name without a leading period.
 


-----Original Message-----
From: Andrzej Bialecki (JIRA) [mailto:[hidden email]]
Sent: Monday, July 04, 2005 12:57 PM
To: [hidden email]
Subject: [jira] Commented: (NUTCH-66) Cookies are not being read properly

    [
http://issues.apache.org/jira/browse/NUTCH-66?page=comments#action_12315027
]

Andrzej Bialecki  commented on NUTCH-66:
----------------------------------------

If you are using protocol-httpclient, add the following lines to Http.java,
around line 395:

params.setParameter("http.protocol.cookie-policy",
CookiePolicy.BROWSER_COMPATIBILITY);
params.setBooleanParameter("http.protocol.single-cookie-header", true);

Please report if this helps.


[jira] Commented: (NUTCH-66) Cookies are not being read properly

Jorge Spinsanti (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-66?page=comments#action_12316280 ]

Andrzej Bialecki  commented on NUTCH-66:
----------------------------------------

It appears that this was a correct fix, but applied at the wrong level... :-) These are per-method properties - setting them inside HttpResponse fixes the problem.


[jira] Closed: (NUTCH-66) Cookies are not being read properly

Jorge Spinsanti (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-66?page=all ]
     
Andrzej Bialecki  closed NUTCH-66:
----------------------------------

    Resolution: Fixed

Fixed. The plugin now uses the BROWSER_COMPATIBILITY cookie specification, which should be flexible enough to accommodate most common cookie misformatting. If the problem persists, please reopen this issue - we can continue fixing this by using our own CookiePolicy.
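
For illustration only, here is a minimal sketch of the kind of relaxed domain check a custom cookie policy could apply if BROWSER_COMPATIBILITY turned out not to be enough. It is plain Java and is not Nutch or HttpClient code: the class name `LenientCookieDomain` and both methods are hypothetical, and the single-label rejection is only a crude safeguard.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch or Commons HttpClient.
public class LenientCookieDomain {

    /** Lower-cases the domain and strips one leading dot, so "cnn.com" and ".cnn.com" compare equal. */
    static String normalize(String domain) {
        String d = domain.trim().toLowerCase(Locale.ROOT);
        return d.startsWith(".") ? d.substring(1) : d;
    }

    /** Returns true if the host equals the cookie domain or is a subdomain of it. */
    static boolean domainMatches(String host, String cookieDomain) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        String d = normalize(cookieDomain);
        if (d.isEmpty() || d.indexOf('.') < 0) {
            return false; // crude guard: reject empty and single-label domains such as "com"
        }
        // The "." prefix keeps "evilcnn.com" from matching "cnn.com".
        return h.equals(d) || h.endsWith("." + d);
    }

    public static void main(String[] args) {
        System.out.println(domainMatches("www.cnn.com", "cnn.com"));   // true
        System.out.println(domainMatches("www.cnn.com", ".cnn.com"));  // true
        System.out.println(domainMatches("evilcnn.com", "cnn.com"));   // false
    }
}
```

The point of the sketch is that the leading period is normalized away before comparison, so both spellings of the domain are treated identically while unrelated hosts are still rejected.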


SVN repo, Where Art Thou? (Re: [jira] Closed: (NUTCH-66) Cookies are not being read properly)

Andrzej Białecki-2
Andrzej Bialecki (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/NUTCH-66?page=all ]
>      
> Andrzej Bialecki  closed NUTCH-66:
> ----------------------------------
>
>     Resolution: Fixed

Just tried to commit the fixes, and svn said it could not find the
repository. I went to svn.apache.org using the browser, and surprise -
the whole /repos is gone. Thinking that maybe I missed a switch to
another directory, I went to /repository, but neither lucene nor nutch
is there. The guys from infrastructure announced that they have finished
their work on CVS/SVN changes, but it seems they missed some parts ...

What now?

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: SVN repo, Where Art Thou? (Re: [jira] Closed: (NUTCH-66) Cookies are not being read properly)

Doug Cutting-2
I doubt you're the only one to notice:

http://monitoring.apache.org/status/

I suspect someone is trying to fix this as quickly as possible.

Doug


Re: SVN repo, Where Art Thou? (Re: [jira] Closed: (NUTCH-66) Cookies are not being read properly)

Michael Ji
The svn server was also down last Sunday night.

Micheal,


NDFS Requests

webmaster-17
I have a few requests to make ndfs run better for those of us who plan to use
nutch for multi-billion page indexes.
1.) Redundant namenodes; even load-balancing namenodes would be nice too.

2.) A folder structure for the "data" folder, so there isn't a single directory
containing thousands of "chunk-files" of data! I think squid's directory setup
would work fine for this. The reason I ask is that I have a couple of
multi-terabyte arrays that are going to be running nutch; if each chunk is
32 MB then I'll have a data directory with 625000 files in it! That's not too
cool if you ever need to browse it. I don't know how well reiser3 will fare
with that; reiser4 should do fine, but who's running that yet?

3.) Make the namenode function properly without a datanode running on the same
machine. I want to have my most powerful machines running namenode strictly
to replicate data and handle requests, to achieve a higher I/O rate.

4.) Multi-directory start points, so you don't have to run multiple datanodes
on a machine with more than 1 logical drive. Datanode is very processor
intensive: running 40 datanodes on 1 machine will kill even a quad-processor
machine, but 1 datanode with 40 starting points to store data will allow the
other processors to do other tasks. Round robin will work really well here
for data storage, even on computers with different size volumes.

5.) Make it scalable to a very large size (petabytes).

6.) Anyone looking for a programming job? I need someone who knows nutch
inside and out, someone who could fix the ndfs problems plus make a couple of
other things to do with the webui and plugins for nutch to extend its features.
E-mail me a salary quote please if you have these skills.
Thanks,
Jay Pound
[hidden email]
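
As an illustration of request 2 above, a squid-style two-level directory layout could be sketched roughly as follows. This is not NDFS code: the class `ChunkLayout`, the numeric chunk id, and the `chunk-<id>` file name are all assumptions made for the example, and the 16 x 256 fan-out simply mirrors squid's customary defaults.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not part of NDFS.
public class ChunkLayout {
    // Fan-out per level; 16 and 256 follow squid's customary defaults (an assumption here).
    static final int L1_DIRS = 16;
    static final int L2_DIRS = 256;

    /** Maps a numeric chunk id to dataDir/XX/YY/chunk-<id>, where XX and YY are hex bucket names. */
    static Path pathFor(Path dataDir, long chunkId) {
        long l1 = Math.floorMod(chunkId, (long) L1_DIRS);
        long l2 = Math.floorMod(chunkId / L1_DIRS, (long) L2_DIRS);
        return dataDir
                .resolve(String.format("%02X", l1))
                .resolve(String.format("%02X", l2))
                .resolve("chunk-" + chunkId);
    }

    public static void main(String[] args) {
        // Chunk 625000 lands in bucket 08/96 under the data directory.
        System.out.println(pathFor(Paths.get("data"), 625000L));
    }
}
```

With 16 x 256 = 4096 leaf directories, the 625000 chunk files from the example above would average about 153 files per directory instead of sitting in a single one.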