JavaScript URLs



lumavanossi
  > Hi,
  >
  > Does anyone here know how to make Nutch read "<a href=javascript(aaa);>" as
  > http://www.myurl.com/one.php?id=aaa ?
  >
  > Thanks in advance.
  >
  > Marco

Re: JavaScript URLs

Jérôme Charron
> > Does anyone here know how to make Nutch read "<a href=javascript(aaa);>" as
> > http://www.myurl.com/one.php?id=aaa ?

It's a hard issue.
If you just want to map every javascript(aaa) link to a fixed URL pattern
such as http://www.myurl.com/one.php?id=aaa, it's quite easy to patch the
Nutch code to do that.
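For the fixed-mapping case, the rewrite itself is just a pattern match on the href. Here is a minimal sketch in plain Java, assuming the links literally look like javascript(aaa); and that every one maps to http://www.myurl.com/one.php?id=<argument>. The class JsLinkRewriter and its rewrite method are hypothetical and not part of Nutch. Where to call it (for example from the HTML parser's link extraction or from a URL normalizer plugin) depends on your Nutch version and is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Nutch class): rewrites hrefs of the form
 * javascript(aaa); into a fixed URL template.
 */
public final class JsLinkRewriter {

    // Matches the whole href, e.g. "javascript(aaa);", case-insensitively,
    // and captures the argument. Optional single or double quotes around
    // the argument are left outside the captured group.
    private static final Pattern JS_CALL = Pattern.compile(
        "^\\s*javascript\\s*\\(\\s*['\"]?([^'\")]*)['\"]?\\s*\\)\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Assumed target template taken from the question; adjust for your site.
    private static final String TEMPLATE = "http://www.myurl.com/one.php?id=";

    private JsLinkRewriter() {}

    /** Returns the rewritten URL, or empty if href is not a javascript(...) call. */
    public static Optional<String> rewrite(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = JS_CALL.matcher(href);
        if (!m.matches()) {
            return Optional.empty();
        }
        String arg = m.group(1).trim();
        if (arg.isEmpty()) {
            return Optional.empty();
        }
        // URL-encode the argument so characters such as '&' or spaces
        // cannot break the query string (a space becomes '+').
        return Optional.of(TEMPLATE + URLEncoder.encode(arg, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // prints "http://www.myurl.com/one.php?id=aaa"
        System.out.println(rewrite("javascript(aaa);").orElse("(no match)"));
        // prints "(no match)"
        System.out.println(rewrite("http://example.com/").orElse("(no match)"));
    }
}
```

This only works because every link follows the same template; as soon as the JavaScript function builds the URL in some other way, a fixed pattern like this can no longer recover it.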
Otherwise, if you want to resolve such links in a general way, you must
include a JavaScript interpreter (Rhino, for instance) in Nutch's HTML
parser.
It could be a good feature (I planned to do it in a previous project several
years ago, but never did), but I don't think it is easy.
Hope this helps.

Jerome

--
http://motrech.free.fr/
http://frutch.free.fr/