Nutch assumes that the TLD is no longer than 4 characters, so a host ending
in .museum is rejected. A fix is in progress for the next release, which
should be out shortly.
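
To show how a 4-character cap on the TLD would reject a host like
www.penn.museum, here is a minimal Python sketch. Both patterns and the
host_accepted helper are made up for illustration and are not taken from the
Nutch source; the real check may well look different.

```python
import re

# Hypothetical pattern (not Nutch's actual code): one or more host labels,
# then a TLD of 2 to 4 letters.
SHORT_TLD = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,4}$")

# Same pattern with the cap raised to 63, the maximum length of a DNS label.
LONG_TLD = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,63}$")


def host_accepted(host, pattern):
    """Return True if the whole host name matches the given pattern."""
    return pattern.match(host.lower()) is not None


if __name__ == "__main__":
    for host in ("www.penn.museum", "www.example.com"):
        print(host, host_accepted(host, SHORT_TLD), host_accepted(host, LONG_TLD))
```

With the short cap, "museum" (6 letters) cannot match the TLD part, so the
host is dropped even though the regex-urlfilter rule is "+.".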
On Wed, Aug 8, 2018 at 7:26 PM Robert Scavilla <[hidden email]> wrote:
> Hello, and thank you for helping. For some reason Nutch is rejecting the domain
> https://www.penn.museum/
> The regex-urlfilter is: +.
> seeding with https://www.penn.museum/
> And on crawl it keeps giving:
> Injector: Total urls rejected by filters: 1
> This is the only time I've had this issue and was wondering if the .museum
> TLD was the problem??