Bypass Validation

Rajasekar Karthik
Hi.

I am trying to crawl a page using Nutch. The page sits behind a Struts validator, i.e. in order to get to the page, a button needs to be clicked. Is there any way to bypass this so that the web crawler can reach the page without clicking the button?

Code:
<form name="loginForm" method="post" action="/check.do">
  <input type="hidden" name="forward" value="target_page">
  <input type="submit" name="org.apache.struts.taglib.html.CANCEL" value="Continue" onclick="bCancel=true;">
</form>

Any help is appreciated. Thanks.

RE: Bypass Validation

Patrick Markiewicz
Hi,
Is there any way that you can create a URL that gets beyond that page
without clicking a button? I.e., can you type something like

http://form.example.com/check.do?forward=target_page&org.apache.struts.taglib.html.CANCEL=Continue

in a web browser and view the page that is normally produced by hitting
the button?
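
A minimal sketch of how such a URL could be assembled from the form fields, so that every name and value is query-encoded consistently. The host form.example.com is only the placeholder used above, and build_bypass_url is a hypothetical helper name. The form in the original post uses method="post", so whether the server accepts the same fields in a GET request is an assumption that has to be checked in a browser first.

```python
from urllib.parse import urlencode, urlunsplit


def build_bypass_url(host, action, fields):
    """Build a GET URL that carries the form fields as query parameters.

    Hypothetical helper: it only assembles the URL. It does not check
    whether the server accepts GET for an action the form submits via POST.
    """
    query = urlencode(fields)  # percent-encodes names and values where needed
    return urlunsplit(("http", host, action, query, ""))


# Field names and values copied from the form in the original post.
FORM_FIELDS = {
    "forward": "target_page",
    "org.apache.struts.taglib.html.CANCEL": "Continue",
}

url = build_bypass_url("form.example.com", "/check.do", FORM_FIELDS)
print(url)
```

If the printed URL opens the target page in a browser, it can be tried as a seed URL for the crawl in place of the page that holds the button.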

I'm no Nutch expert, but if the button requires cookies to display the
next page, you may need to use the protocol-httpclient plugin instead of
the protocol-http plugin. The problem with the protocol-httpclient plugin
is that all of your original URLs need to be escaped. I.e., in your URL
list, you need:
http%3A//www.google.com
instead of
http://www.google.com
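
A small sketch of the escaping shown above, using Python's standard library. It only reproduces the transformation from the example (the colon is percent-encoded, the slashes are kept). Whether Nutch actually requires seed URLs in this form is the claim made in this post, not something the code verifies, and escape_seed_url is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def escape_seed_url(url):
    """Percent-encode a URL while leaving '/' untouched.

    Hypothetical helper mirroring the example in this post:
    ':' becomes '%3A' and the slashes stay as they are.
    """
    return quote(url, safe="/")


escaped = escape_seed_url("http://www.google.com")
print(escaped)           # http%3A//www.google.com
print(unquote(escaped))  # http://www.google.com
```

Note that with these settings the characters ?, & and = in a query string are percent-encoded as well, so a URL that carries form parameters should be checked after escaping.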

Patrick
-----Original Message-----
From: karthik085 [mailto:[hidden email]]
Sent: Monday, July 14, 2008 5:49 PM
To: [hidden email]
Subject: Bypass Validation
