Takes the URI info, content, headers, etc. into a MySQL database during crawl.


xingjian
Inserting information into MySQL during crawl.

Instead of writing to disk, I'd like to pull the content of each page and create a method that
writes the URI info, content, headers, etc. into a MySQL database. Does
anyone have any suggestions on how to do this, and where I should look to
place my methods?

Re: takes the URI info, content, headers, etc. into a MySQL database.

Sagar Naik-2
Hey,
AFAIK, FetcherOutputFormat is the class to look at.

In the getRecordWriter function:
FILE : a new file is opened
DB : instantiate the DB connection

In the RecordWriter class's write function:
FILE : contents are written to disk
DB : insert into the DB

In the RecordWriter class's close function:
FILE : close the file
DB : close the DB connection

You will also have to look at ParseOutputFormat along the same lines.
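
The three steps above (open, write, close) can be sketched roughly as follows. This is only an illustration in plain JDBC, not Nutch's or Hadoop's actual API: the PageWriter interface, the MySqlPageWriter class, and the "pages" table with its url/headers/content columns are all hypothetical names made up for the example. In a real setup the same logic would presumably live inside the RecordWriter returned by a custom output format.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical stand-in for the record-writer lifecycle: open, write, close. */
interface PageWriter extends AutoCloseable {
    void write(String url, String headers, byte[] content) throws SQLException;

    @Override
    void close() throws SQLException;
}

/** Inserts each fetched page as one row instead of appending it to a file. */
class MySqlPageWriter implements PageWriter {
    // Table and column names are assumptions for this sketch.
    static final String INSERT_SQL =
        "INSERT INTO pages (url, headers, content) VALUES (?, ?, ?)";

    private final Connection conn;
    private final PreparedStatement insert;

    // "getRecordWriter" step: take an open DB connection instead of opening a file.
    MySqlPageWriter(Connection conn) throws SQLException {
        this.conn = conn;
        this.insert = conn.prepareStatement(INSERT_SQL);
    }

    // Convenience factory; needs a MySQL JDBC driver on the classpath at runtime.
    static MySqlPageWriter open(String jdbcUrl, String user, String password)
            throws SQLException {
        return new MySqlPageWriter(DriverManager.getConnection(jdbcUrl, user, password));
    }

    // "write" step: one INSERT per page instead of a write to disk.
    @Override
    public void write(String url, String headers, byte[] content) throws SQLException {
        insert.setString(1, url);
        insert.setString(2, headers);
        insert.setBytes(3, content);
        insert.executeUpdate();
    }

    // "close" step: release the statement, then the connection.
    @Override
    public void close() throws SQLException {
        try {
            insert.close();
        } finally {
            conn.close();
        }
    }
}
```

With a driver available, usage would look something like MySqlPageWriter.open("jdbc:mysql://localhost:3306/crawl", user, password), where "crawl" is a placeholder database name, followed by one write(...) call per fetched page and a final close(). Using a single PreparedStatement for the writer's lifetime avoids re-parsing the SQL for every page and keeps the values safely parameterized.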

 



Re: takes the URI info, content, headers, etc. into a MySQL database.

xingjian
Do I need to extend FetcherOutputFormat? Do you have a simple example? Thanks.
