nutch server performance

Michael Nebel
Hi everybody,

I have some performance problems running Nutch. My scenario: I built a
Nutch system with

- about 5 million indexed (!) documents (measured with Luke and over the web)
- segread returns about 10 million known documents
- there are 58 segments (about 90,000 indexed documents per segment)
- all segments are about the same size (each segment takes about
2 GB including the index)
- the indexes have been merged into one "total index" (9 GB by now)
- one "nutch server" handles the queries
- hardware: Intel Celeron 2.8 GHz, 1 GB RAM, 250 GB SATA HD
- the Apache/mod_jk/Tomcat frontend is on a separate server
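A quick back-of-envelope check of these figures, sketched in Python. The inputs are copied from the list; the conclusion that most summary reads have to go to the disk is my inference, and the real cacheable share is even smaller because the JVM and the OS also need part of the 1 GB.

```python
# Back-of-envelope figures for the setup listed above.
# All inputs come from the list; nothing here is measured.
indexed_docs = 5_000_000      # documents in the merged index
segments = 58                 # number of segments
segment_size_gb = 2           # approximate size per segment, index included
ram_gb = 1                    # RAM on the search server (shared with JVM/OS)

docs_per_segment = indexed_docs / segments
total_segment_data_gb = segments * segment_size_gb
# Upper bound: assumes all RAM could be used as page cache, which it cannot.
cacheable_fraction = ram_gb / total_segment_data_gb

print(f"docs per segment: {docs_per_segment:,.0f}")
print(f"segment data on disk: {total_segment_data_gb} GB")
print(f"share that fits in RAM at best: {cacheable_fraction:.1%}")
```

With under 1% of the segment data cacheable, nearly every getSummary call is likely to cost at least one real disk seek.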

I observe severe performance problems when handling a load of more than
1 search/s. The search within the indexes is pretty quick, but reading
the summaries (getSummary) from disk takes forever, and some kind of
backlog seems to build up.
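One way to see why a backlog builds up: treat the disk as a single server that every query must pass through. In this sketch (a simplification, not a model of how Nutch actually schedules its threads) queries arrive at a fixed rate and each needs a fixed amount of disk time. Once the arrival rate times the per-query disk time exceeds 1, each query waits longer than the one before it. Under this model, trouble starting just above 1 search/s would suggest something like one second of disk time per query, spent mostly in getSummary.

```python
def response_times(arrival_rate, service_time, n_queries):
    """Response time of each query for a single FIFO server (the disk)
    with evenly spaced arrivals and a fixed per-query service time.

    arrival_rate: queries per second
    service_time: seconds of disk time each query needs (an assumption)
    """
    interval = 1.0 / arrival_rate
    finish = 0.0                      # time the server becomes free
    times = []
    for i in range(n_queries):
        arrival = i * interval
        start = max(arrival, finish)  # wait until the previous query is done
        finish = start + service_time
        times.append(finish - arrival)
    return times


if __name__ == "__main__":
    print([round(t, 2) for t in response_times(1.0, 0.8, 10)])  # keeps up
    print([round(t, 2) for t in response_times(1.0, 1.2, 10)])  # backlog grows
```

For example, `response_times(1.0, 0.8, 10)` stays flat at 0.8 s per query, while `response_times(1.0, 1.2, 10)` grows by 0.2 s with every query, which matches the "backlog" behaviour described above.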

The bottleneck seems to be the disk I/O. I made some tests with
smaller segments and it gets a little better. Faster disks would
be nice, but I'm afraid it's only a matter of time until I run into the
same problem again. I am still looking for a configuration mistake.
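To check whether random reads really are the limit, one could time small reads at random offsets in a segment file. The function below is a hypothetical helper, not part of Nutch; the block size and read count are arbitrary assumptions. The result only reflects the disk if the file is much larger than free RAM (as a 2 GB segment is on a 1 GB machine); otherwise the OS page cache answers most of the reads.

```python
import os
import random
import time


def random_read_latency(path, reads=200, block_size=4096, seed=0):
    """Average seconds per read of block_size bytes at random offsets in path.

    Hypothetical benchmark helper; parameters are illustrative defaults.
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # buffering=0: no Python-level buffer, so each read() is one OS call
    # (the OS page cache may still serve it without touching the disk).
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(reads):
            f.seek(rng.randrange(0, size - block_size + 1))
            f.read(block_size)
        elapsed = time.perf_counter() - start
    return elapsed / reads
```

Calling it as `random_read_latency("/path/to/segment/file")` (the path is a placeholder) before and after switching to smaller segments would show whether the gain comes from shorter seeks or from more of the data fitting in the cache.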

How do you manage your systems? Does anybody have any hints on how to
tune the system?



Michael Nebel