Fetcher, using parse.getText as digest value

Jon Shoberg

The file src/java/org/apache/nutch/fetcher/Fetcher.java has the
following lines

-------------------------------------------------------------------
if (status.isSuccess()) {
  outputPage(new FetcherOutput(fle, hash, protocolStatus),
          content, new ParseText(parse.getText()), parse.getData());
}
-------------------------------------------------------------------

where hash is

-------------------------------------------------------------------
Content content = output.getContent();
MD5Hash hash = null;
String url = fle.getPage().getURL().toString();
if (content == null) {
  content = new Content(url, url, new byte[0], "", new Properties());
  hash = MD5Hash.digest(url);
} else {
  hash = MD5Hash.digest(content.getContent());
}
-------------------------------------------------------------------

It's a little late right now and perhaps I'm asking a naive question,
but if the parse is successful on non-null content, what would be the
by-product of changing the content hash from

hash = MD5Hash.digest(content.getContent());

to the hash being the MD5 digest of parse.getText()?
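
To make the question concrete, here is a rough standalone sketch of the
two choices. It is not Nutch code: it uses the JDK's MessageDigest in
place of MD5Hash, and extractText() is a hypothetical, much simplified
stand-in for whatever parse.getText() returns. The class name and the
sample pages are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {

    // Hex-encoded MD5 of a byte array, using only the JDK
    // (assumed stand-in for MD5Hash.digest).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for parse.getText(): drop tags, collapse whitespace.
    static String extractText(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Current behaviour in spirit: digest of the raw fetched bytes.
    static String rawDigest(String html) {
        return md5Hex(html.getBytes(StandardCharsets.UTF_8));
    }

    // Proposed behaviour in spirit: digest of the extracted text only.
    static String textDigest(String html) {
        return md5Hex(extractText(html).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Same visible text, different markup.
        String pageA = "<html><body><p>Hello world</p></body></html>";
        String pageB = "<html><body><div class=\"x\">Hello world</div></body></html>";

        System.out.println("raw digests equal:  "
                + rawDigest(pageA).equals(rawDigest(pageB)));
        System.out.println("text digests equal: "
                + textDigest(pageA).equals(textDigest(pageB)));
    }
}
```

In this sketch the two pages get different raw digests but the same text
digest. Whether the real parsers behave like this simplified extractor
is an assumption.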

Thoughts?