Problem Extracting HTML Meta Tags

z0mbi3
Hi,
I am working on a focused crawler for which I need the HTML meta tag information in ParseOutputFormat.java. That class gives me the parse of the HTML page, so is there a way to extract the values of the HTML meta tags through parse.getData?

For example, for this HTML page:

<html lang="hi"><head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>BBCHindi.com</title>
<meta name="keywords" content="BBC, Hindi News, Politics">

I would like to extract the content of the keywords meta tag through the parse.
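
To make the goal concrete, here is a minimal sketch of the extraction itself, using only the Java standard library on the raw HTML string. It is not the crawler's own parsing mechanism and says nothing about what parse.getData exposes; the class name MetaTagSketch and the method extractMetaTags are hypothetical names chosen for illustration, and the regular expressions only handle quoted attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any crawler API.
class MetaTagSketch {
    // Matches a whole <meta ...> tag, case-insensitively.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches attr="value" or attr='value' pairs inside a tag.
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns a map from lower-cased meta name (or http-equiv) to its content.
    static Map<String, String> extractMetaTags(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("name") || key.equals("http-equiv")) {
                    name = value.toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            // Keep only tags that carry both a name and a content attribute.
            if (name != null && content != null) {
                result.put(name, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html lang=\"hi\"><head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
                + "<title>BBCHindi.com</title>\n"
                + "<meta name=\"keywords\" content=\"BBC, Hindi News, Politics\">";
        System.out.println(extractMetaTags(html).get("keywords"));
    }
}
```

For the page above, the value I am after is the string stored under the "keywords" key. A real HTML parser would be more robust than regular expressions; the sketch is only meant to show the name-to-content mapping I would like to get from the parse.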