how to reuse webDB with new urls



AJ Chen-2
Once I create a webDB, can I inject new root URLs into the same webDB
repeatedly? After each injection, I would run as many
generate/fetch/updatedb cycles as needed to fetch all web pages from the
new sites. I think this will allow me to gradually build a comprehensive
vertical site. Any comments or suggestions?
-AJ
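To make the proposed workflow concrete, here is a toy Python model of it. It injects a batch of root URLs, then repeats generate/fetch/update until the generate step returns an empty fetchlist, and then injects the next batch into the same db. Everything here (ToyWebDB, FAKE_WEB, crawl_until_done, the topn parameter) is hypothetical. It only mimics the idea of a webdb in memory and is not Nutch's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Hypothetical toy "web": each page URL maps to the URLs it links to.
FAKE_WEB: Dict[str, List[str]] = {
    "http://a.example/": ["http://a.example/1", "http://a.example/2"],
    "http://a.example/1": ["http://a.example/"],
    "http://a.example/2": [],
    "http://b.example/": ["http://b.example/x", "http://a.example/1"],
    "http://b.example/x": [],
}


@dataclass
class ToyWebDB:
    """In-memory stand-in for a webdb: URL -> already fetched? (illustration only)."""
    pages: Dict[str, bool] = field(default_factory=dict)

    def inject(self, urls: Iterable[str]) -> None:
        # Add new root URLs; URLs already in the db keep their current state.
        for url in urls:
            self.pages.setdefault(url, False)

    def generate(self, topn: int) -> List[str]:
        # Fetchlist: up to topn URLs that have not been fetched yet.
        return [url for url, done in self.pages.items() if not done][:topn]

    def update(self, fetched: Dict[str, List[str]]) -> None:
        # Mark fetched pages as done and add newly discovered outlinks.
        for url, outlinks in fetched.items():
            self.pages[url] = True
            for link in outlinks:
                self.pages.setdefault(link, False)


def fetch(fetchlist: List[str], web: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # "Fetch" each URL by looking up its outlinks in the toy web.
    return {url: web.get(url, []) for url in fetchlist}


def crawl_until_done(db: ToyWebDB, web: Dict[str, List[str]],
                     topn: int = 2, max_rounds: int = 100) -> int:
    """Run generate/fetch/update rounds until the fetchlist is empty; return the round count."""
    rounds = 0
    while rounds < max_rounds:
        fetchlist = db.generate(topn)
        if not fetchlist:
            break
        db.update(fetch(fetchlist, web))
        rounds += 1
    return rounds


if __name__ == "__main__":
    db = ToyWebDB()
    for batch in (["http://a.example/"], ["http://b.example/"]):
        db.inject(batch)  # new root URLs go into the same db
        rounds = crawl_until_done(db, FAKE_WEB)
        print(batch, "->", rounds, "rounds,", len(db.pages), "pages known")
```

In this model, re-injecting into an existing db only adds URLs it has not seen before, so pages fetched for earlier batches are not queued again. Whether Nutch's own injector treats already-known URLs exactly this way is an assumption of the sketch.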

Re: how to reuse webDB with new urls

Michael Ji
I think this scenario will work.

I'm just a bit worried about URL filter performance if the
number of domains reaches the hundreds of thousands.
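Suppose the domain restriction is written as one regular expression per site and each candidate URL is tested against the patterns in turn. The cost per URL then grows with the number of sites. The Python sketch below contrasts that approach with a hash-set lookup on the URL's host and its parent domains, whose cost does not depend on the size of the domain list. The function names and the pattern shape are assumptions for illustration. This is not Nutch's URL filter plugin.

```python
import re
from typing import Callable, Iterable
from urllib.parse import urlsplit

UrlFilter = Callable[[str], bool]


def make_regex_filter(domains: Iterable[str]) -> UrlFilter:
    # One pattern per allowed domain (the domain itself or any subdomain),
    # tried one after another: the work per URL grows with len(domains).
    patterns = [
        re.compile(r"https?://([a-z0-9-]+\.)*" + re.escape(d.lower()) + r"/")
        for d in domains
    ]

    def accept(url: str) -> bool:
        return any(p.match(url) for p in patterns)

    return accept


def make_set_filter(domains: Iterable[str]) -> UrlFilter:
    # Roughly the same domain policy (ports and a missing trailing slash aside),
    # but each URL costs one set lookup per label in its host name,
    # no matter how many domains are allowed.
    allowed = {d.lower() for d in domains}

    def accept(url: str) -> bool:
        host = urlsplit(url).hostname or ""
        labels = host.split(".")
        # Try www.example.com, then example.com, then com.
        return any(".".join(labels[i:]) in allowed for i in range(len(labels)))

    return accept
```

With hundreds of thousands of sites, the set version does a handful of lookups per URL instead of scanning hundreds of thousands of patterns.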

Michael Ji

--- AJ Chen <[hidden email]> wrote:

> Once I create a webDB, can I inject new root urls to the same webDB
> repeatedly? After each injection, run as many cycles of
> generate/fetch/updatedb to fetch all web pages from the new sites. I think
> this will allow me to gradually build a comprehensive vertical site. Any
> comment or suggestion?
> -AJ

Re: how to reuse webDB with new urls

Jay Lorenzo
What about maintaining some semblance of ACID properties? Don't you
have to make sure that fetchlist generation and webdb updates run
serially, i.e. only one update or generate at a time?
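One simple way to enforce "one update or generate at a time" is an exclusive lock file that every generate/updatedb wrapper must acquire before touching the db. The Python sketch below creates the lock atomically with O_CREAT | O_EXCL and removes it when the step finishes. The names webdb_lock and WebDBBusy and the lock-file location are hypothetical. The thread does not establish whether Nutch itself provides any such locking.

```python
import os
from contextlib import contextmanager
from typing import Iterator


class WebDBBusy(RuntimeError):
    """Raised when another generate/updatedb step already holds the lock."""


@contextmanager
def webdb_lock(lock_path: str) -> Iterator[None]:
    # O_CREAT | O_EXCL makes creation atomic: only one process can succeed.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise WebDBBusy(f"{lock_path} exists; another step is running") from None
    try:
        # Record the owner's PID so a stale lock can be diagnosed later.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Each generate or updatedb invocation would run inside `with webdb_lock(path):`, so a second step started concurrently fails fast instead of interleaving with the first. If a step crashes hard, it can leave the lock file behind, so a real setup would also need a way to clear stale locks, for example by checking whether the recorded PID is still alive.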

On 9/13/05, Michael Ji <[hidden email]> wrote:

>
> I think this scenario will work.
>
> I'm just a bit worried about URL filter performance if the
> number of domains reaches the hundreds of thousands.

Re: how to reuse webDB with new urls

AJ Chen
Before injecting a new set of URLs into the webdb, I'll wait until all
fetch cycles (generate + fetch + updatedb) are done. I'm not sure whether
that's necessary, but it's cleaner.

One more question: should I run UpdateSegmentsFromDb to update the
segments before each new injection? Does updating the segments affect URL
injection or fetchlist generation?

-AJ

Jay Lorenzo wrote:

>What about maintaining some semblance of ACID properties? Don't you
>have to make sure that fetchlist generation and webdb updates run
>serially, i.e. only one update or generate at a time?

--
AJ (Anjun) Chen, Ph.D.
Canova Bioconsulting
Marketing * BD * Software Development
748 Matadero Ave., Palo Alto, CA 94306, USA
Cell 650-283-4091, [hidden email]
---------------------------------------------------