I'm looking for documentation about web repository architectures and
search engines' storage modules in general.
I found Nutch while searching the web, and I congratulate Nutch's
developers on their great work.
I read the available documentation about how Nutch stores crawled
objects both locally and in a distributed way (NDFS), but as part of a
university project I'm looking for more documentation about storage,
even if it is not Nutch-related, and I'm writing here in the hope that
someone has some good links to share.
I have already read the WebBase paper, which studies the possible storage
solutions for a web base, and the description of the Internet Archive ARC
file format and its usage, and I'm interested in similar material.
Many thanks to anyone who can help me, and I hope my request doesn't
sound off-topic, as understanding the current state of the art in web
storage can help Nutch too.