i'm playing around with an app that parses websites, extracts
information, and returns certain pieces of it to my system.
my primary issue is how to architect the part of the system that writes
the extracted information into my database. i'm using/testing with mysql,
and my question is really about how to scale this kind of system. if i
have a server spawning hundreds of apps, with each app firing off a
connection to a web server, i'm going to have more than enough results
coming back to swamp the writes to a single mysql server...
so how do other apps/crawlers handle this kind of situation? basically,
i'm trying to figure out how to implement some kind of funneling
process/mechanism that would let me have 10-20 servers crawling the
specific sites and returning the information to a single database...
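one way i've been picturing the funnel is a producer/consumer setup: the crawlers dump their results into a queue, and a small number of writer processes drain the queue and do batched inserts, so the database sees a few large writes instead of hundreds of tiny ones. here's a rough sketch of that idea using python's stdlib -- sqlite3 is just standing in for mysql (the same executemany() call works with a mysql driver), and all the names (`crawler`, `pages`, etc.) are made up for illustration:

```python
import queue
import sqlite3
import threading

# shared queue: many crawlers produce, one writer consumes
results = queue.Queue()
STOP = object()  # sentinel telling the writer to finish up

def crawler(site_id):
    # placeholder for the real fetch/parse work
    for page in range(5):
        results.put((site_id, f"page-{page}", f"data from site {site_id}"))

def writer(conn, batch_size=100):
    # drain the queue and flush rows to the db in batches,
    # so each commit covers many crawler results at once
    conn.execute("CREATE TABLE IF NOT EXISTS pages (site INTEGER, url TEXT, body TEXT)")
    batch = []
    while True:
        item = results.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush whatever is left over
        conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", batch)
        conn.commit()

# simulate 10 concurrent crawlers, then run the writer
threads = [threading.Thread(target=crawler, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
results.put(STOP)

conn = sqlite3.connect(":memory:")
writer(conn)
```

in a real deployment the queue would presumably be a network-visible one (a message broker or similar) rather than an in-process queue.Queue, so the 10-20 crawl servers could all feed the same writer pool -- but the batching idea is the same.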
any thoughts/comments/pointers on how to deal with this will be helpful!!