During fetching, OutlinkExtractor.getOutlinks() finds lots of junk.
This is because the defined URL_PATTERN matches things that are not web
links. Is there a fix for it? Is there a way to set protocols (e.g. http,
https) for the desired outlinks? That way, only links that use one of the
specified protocols would be treated as outlinks. I'm using the 0.9-dev code.
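
To make the request concrete, here is a minimal sketch of the kind of protocol filter being asked about. It is only an illustration: ProtocolOutlinkFilter and filterByProtocol are hypothetical names, not part of Nutch, and the sketch assumes the candidate links are already available as plain strings. It uses only java.net.URI from the standard library to read each candidate's scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a Nutch class: keeps only candidates whose
// URI scheme is in the allowed set (e.g. "http", "https").
final class ProtocolOutlinkFilter {

    private ProtocolOutlinkFilter() {
    }

    // allowedProtocols is assumed to contain lowercase scheme names.
    static List<String> filterByProtocol(List<String> candidates,
                                         Set<String> allowedProtocols) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            String trimmed = candidate.trim();
            try {
                // getScheme() returns null for relative references.
                String scheme = new URI(trimmed).getScheme();
                if (scheme != null
                        && allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT))) {
                    kept.add(trimmed);
                }
            } catch (URISyntaxException e) {
                // Not parseable as a URI at all, so treat it as junk and skip it.
            }
        }
        return kept;
    }
}
```

Called with the allowed set {"http", "https"}, this would drop candidates such as mailto: addresses, relative paths, and strings that do not parse as URIs, while keeping http and https links regardless of the case of the scheme.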