Does anyone know of a way to filter by mime-type in the initial server
response?
We are only interested in HTML, and we have a problem where URLs give no
indication that they point to files, e.g. PDF, RSS, XML, etc.
Ideally I'd want the fetcher to make the request to the server and abandon
fetching based on some sort of blacklist/whitelist of mime-types.
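
To make the idea concrete, here is a rough sketch in Python of what I have in mind. It is not tied to any particular crawler: the names ALLOWED_TYPES, media_type, is_allowed and fetch_if_allowed are made up for illustration, and it assumes the server sends an honest Content-Type header. The request is opened, the headers are checked against a whitelist, and the body is only read if the type passes.

```python
from urllib.request import urlopen

# Hypothetical whitelist for illustration; adjust to the types you want to keep.
ALLOWED_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header):
    """Return the bare lowercase media type, dropping parameters such as charset."""
    if not content_type_header:
        return ""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_allowed(content_type_header, allowed=ALLOWED_TYPES):
    """True if the Content-Type header names a whitelisted media type."""
    return media_type(content_type_header) in allowed


def fetch_if_allowed(url, allowed=ALLOWED_TYPES, timeout=10):
    """Open the URL, inspect the response headers, and read the body
    only when the media type is whitelisted; otherwise return None."""
    with urlopen(url, timeout=timeout) as response:
        if not is_allowed(response.headers.get("Content-Type"), allowed):
            # Leaving the with-block closes the connection without reading the body.
            return None
        return response.read()
```

A blacklist would work the same way with the membership test inverted. A missing Content-Type header is treated as not allowed here, which may or may not be the behaviour you want.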