Saving Metadata to Mysql


Saving Metadata to Mysql

mikeyc
Hey all,
I have written a custom HTML parser and indexer. I would like to save some of the information gathered during the parse in a MySQL DB. I imagine there could be a performance hit here (e.g. from connecting to the DB). What's the best place to add the code that saves this information: the parser or the indexer?

-Mike

Re: Saving Metadata to Mysql

mikeyc
Any thoughts?

Re: Saving Metadata to Mysql

John Reidy-2
In reply to this post by mikeyc
I am looking at something similar.

I would guess the place to put it is the indexer. As I understand it, the
parser runs for just about everything fetched, whereas the indexer
only runs for pages you want to index.
I am also looking at having static objects (e.g. a connection) that are
initialised when the plugin is loaded, ideally through the startup method.
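
A minimal Java sketch of the shared-connection idea above, assuming plain JDBC. The class names, the JDBC URL and the credentials are all hypothetical, and how a Nutch plugin would trigger this from its startup hook is not shown. The connection is opened on first use and then reused, so the connect cost is paid once per JVM rather than once per page.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.function.Supplier;

/** Opens a resource on first use and hands the same instance to every later caller. */
final class SharedResource<T> {
    private final Supplier<T> opener;
    private T instance;

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Synchronized so that two threads cannot both run the opener. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }
}

/** Hypothetical holder for the one MySQL connection used by the plugin. */
final class MetadataDb {
    // Assumed settings; a real plugin would read these from its configuration.
    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/crawl_meta";
    private static final String USER = "nutch";
    private static final String PASSWORD = "change-me";

    // Nothing is opened here; the connection is created on the first get().
    static final SharedResource<Connection> CONNECTION =
            new SharedResource<>(MetadataDb::open);

    private MetadataDb() {}

    private static Connection open() {
        try {
            return DriverManager.getConnection(JDBC_URL, USER, PASSWORD);
        } catch (SQLException e) {
            throw new IllegalStateException("Could not connect to MySQL", e);
        }
    }
}
```

Callers would use `MetadataDb.CONNECTION.get()`. If several threads write at once, a connection pool is probably a better fit than one shared `Connection`.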

Regards

John


Re: Saving Metadata to Mysql

sudhendra seshachala
Sorry for just jumping in.
We have a doc id associated with each document when we index. We could store the doc id in a MySQL table and then use it to query the Nutch database.
When parsing, capture the things needed as part of the "metadata".
Index the metadata; the associated doc id is stored in MySQL.
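
The doc id idea above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the table `doc_metadata` and its columns are made up, and "doc id" is taken to be any stable string key. Rows are buffered and written in batches so that the database is not hit once per document.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** One row linking a document id to a piece of parse metadata. */
final class DocMeta {
    final String docId;
    final String key;
    final String value;

    DocMeta(String docId, String key, String value) {
        this.docId = docId;
        this.key = key;
        this.value = value;
    }
}

/** Buffers rows and hands them to a sink once batchSize rows have accumulated. */
final class BatchingWriter {
    private final int batchSize;
    private final Consumer<List<DocMeta>> sink;
    private final List<DocMeta> pending = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<DocMeta>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(DocMeta row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered; call once more when indexing finishes. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}

/** JDBC sink; the table and column names are hypothetical. */
final class JdbcSink {
    private static final String INSERT_SQL =
            "INSERT INTO doc_metadata (doc_id, meta_key, meta_value) VALUES (?, ?, ?)";

    private JdbcSink() {}

    static void write(Connection connection, List<DocMeta> rows) {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (DocMeta row : rows) {
                ps.setString(1, row.docId);
                ps.setString(2, row.key);
                ps.setString(3, row.value);
                ps.addBatch();
            }
            ps.executeBatch(); // send all buffered rows in one batch
        } catch (SQLException e) {
            throw new IllegalStateException("Could not store metadata rows", e);
        }
    }
}
```

With an open connection `conn`, the writer would be created as `new BatchingWriter(500, rows -> JdbcSink.write(conn, rows))`; the batch size of 500 is an arbitrary choice.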

Does that give you any ideas?
Please do share your concerns. I am working on something similar, where we will eventually have to adopt a database.

Thanks



  Sudhi Seshachala
  http://sudhilogs.blogspot.com/
   


               

Re: Saving Metadata to Mysql

Stefan Groschupf-2
It depends on what you are planning to do. Nutch 0.8 supports metadata that
is very flexible (key-value tuples) and fast.
You can also store information in parseData.getMetaData; it will
be available through to indexing as well.
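
To illustrate the key-value idea only, here is a plain Java sketch of metadata being attached at parse time and read back at index time. It deliberately does not use the Nutch API: `ParsedPage`, `Pipeline`, the `charCount` key and the `meta.` field prefix are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for a parse result that carries free-form key-value metadata. */
final class ParsedPage {
    final String url;
    final String text;
    final Map<String, String> metadata = new LinkedHashMap<>();

    ParsedPage(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

final class Pipeline {
    private Pipeline() {}

    /** Parse step: strip tags crudely and record something learned along the way. */
    static ParsedPage parse(String url, String html) {
        String text = html.replaceAll("<[^>]*>", " ").trim();
        ParsedPage page = new ParsedPage(url, text);
        page.metadata.put("charCount", String.valueOf(html.length()));
        return page;
    }

    /** Index step: the metadata written by the parser is still attached to the page. */
    static Map<String, String> index(ParsedPage page) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", page.url);
        fields.put("content", page.text);
        for (Map.Entry<String, String> entry : page.metadata.entrySet()) {
            fields.put("meta." + entry.getKey(), entry.getValue());
        }
        return fields;
    }
}
```

The point of the sketch is that nothing has to be written to an external store between the two steps if the values travel with the parse result.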



On 12.04.2006 at 04:31, sudhendra seshachala wrote:

> Sorry for just jumping in.
> We have a doc id associated with each document when we index. We could
> store the doc id in a MySQL table and then use it to query the Nutch database.
> When parsing, capture the things needed as part of the "metadata".
> Index the metadata; the associated doc id is stored in MySQL.

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Reading data from mysql (was Saving Metadata to Mysql)

John Reidy-2
Sorry for the delay in replying.

Version 0.8 looks very interesting. In time we will store more and more of our
metadata in Nutch.

The aspect that I am looking at is configuring a flexible fetcher that
will read from a variety of sources (e.g. an RDBMS and apps running on top of
a database) and then index this information and make it available.
The hardest part is working around the java.net class, which (for reasons
I can appreciate) cannot be subclassed.

The app I have in mind is a web-service type of app (document management), so
all of the calls are URLs anyway.
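
One possible workaround, sketched in Java under the assumption that the class in question is `java.net.URL` (which is declared `final`): instead of subclassing it, supply a custom `URLStreamHandler` for a made-up scheme. The `dbrecord` scheme, the class names and the canned response are hypothetical; a real handler would query the database or application here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

/** Connection for the hypothetical dbrecord scheme; returns a canned body. */
final class RecordConnection extends URLConnection {
    RecordConnection(URL url) {
        super(url);
    }

    @Override
    public void connect() {
        connected = true; // a real handler would open the data source here
    }

    @Override
    public InputStream getInputStream() throws IOException {
        connect();
        // Placeholder content derived from the URL path.
        String body = "record for " + url.getPath();
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }
}

/** Tells java.net.URL how to open dbrecord URLs without subclassing URL. */
final class RecordHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new RecordConnection(u);
    }
}
```

A URL for the scheme can then be built with `new URL(null, "dbrecord://crm/customers/42", new RecordHandler())` and read through `openStream()`. That three-argument constructor is deprecated in recent JDK releases, so treat this as an illustration of the extension point rather than a recommendation.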

Regards

John



Re: Reading data from mysql (was Saving Metadata to Mysql)

vis
Sorry, I am on holiday until the 8th of May.

Please contact the [hidden email] for urgent matters.

Kind regards, Herman.