I am using MapReduce and DFS for a crawl + index operation. When parsing
smaller segments (about 50,000 - 60,000 URLs), everything goes fine. But when
I try to parse a larger segment (600,000 - 700,000 URLs), my job is killed by
an OutOfMemoryError at the tasktrackers during the map phase:
"java.lang.OutOfMemoryError: Java heap space"
Is this expected behavior as the segments grow larger, or is this a bug
waiting to be examined?
I have been trying to solve the problem, but without success so far. Could
somebody help me?
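
In case it matters: I assume the relevant knob is the heap size of the child
task JVMs (the default is quite small), so I would set something like the
following in my hadoop-site.xml. I have not confirmed this is the right
property for my version, so please correct me if it is not:

```xml
<!-- Raise the maximum heap for each map/reduce child JVM.
     Property name assumed from the Hadoop configuration docs;
     the value here (512 MB) is just an example. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

Would raising this be the right approach, or should the parse map tasks be
able to handle a 600,000+ URL segment within the default heap?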