Storage architectures



Francesco Cipriani
Hi all,
I'm looking for documentation about web repository architectures and
search engines' storage modules in general.
I found Nutch while searching the web, and I congratulate Nutch's
developers on their great work.
I read the available documentation about how Nutch stores crawled
objects both locally and in a distributed way (NDFS), but as part of a
university project I'm looking for more docs about storage, even if not
Nutch-related, and I'm writing here in the hope that someone has some
good links to suggest.
I have already read the WebBase paper, which studies the possible
storage solutions for a web base [1], and the Internet Archive ARC file
format and usage [2], and I'm interested in similar material.
Many thanks to anyone who can help me. I hope my request doesn't
sound off-topic, as understanding the current state of the art of web
storage can help Nutch too.

[1] http://dbpubs.stanford.edu:8090/pub/1999-26
[2] http://crawler.archive.org/cgi-bin/wiki.pl?ArcRevisionProposal

Bye.
--
Francesco