crawling - Skip only a few pages with certain/special characters in URLs

Rajasekar Karthik
Hi,

I would like to exclude URLs containing 'GO:' from crawling, for example:
http://domain.com/NEW-IMAGE?object=GO:0005737

I added each of the variants below to my crawl-urlfilter.txt (I crawl with the 'crawl' command) and tested them, but pages of this type still get fetched.

Variant #1:
-GO:

Variant #2:
-GO:.*

Variant #3:
-object=GO

I also tried each of the above variants wrapped in double quotation marks, starting after the '-' and ending after the last character, e.g. -"GO:"

I cannot simply add '=' to the rule under '# skip URLs containing certain characters as probable queries, etc.':
-[*!@]
because there are other pages with '=' in their URLs that need to be fetched.
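
To make the question easier to reproduce outside the crawler, here is a minimal Python sketch for testing such rules against a URL. It rests on assumptions that this post does not confirm: that the filter reads rules top to bottom, that the first matching pattern decides the outcome, that a pattern may match anywhere in the URL, and that a URL matching no rule is rejected. The function name url_passes, the two rule lists, and the second test URL are hypothetical.

```python
import re

def url_passes(url, rules):
    """Return True if the URL is accepted, False if it is rejected.

    Each rule is '+' (accept) or '-' (reject) followed by a regex.
    Assumed semantics: the first rule whose regex is found in the URL
    decides; a URL that matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False

# Hypothetical ordering: the 'GO:' exclusion comes before the catch-all accept.
rules_exclude_first = ["-[*!@]", "-GO:", "+."]
# Hypothetical ordering: a catch-all accept comes before the exclusion.
rules_accept_first = ["-[*!@]", "+.", "-GO:"]

url = "http://domain.com/NEW-IMAGE?object=GO:0005737"
other = "http://domain.com/page?id=42"  # hypothetical URL with '=' that should pass

print(url_passes(url, rules_exclude_first))    # exclusion is reached first
print(url_passes(url, rules_accept_first))     # accept rule is reached first
print(url_passes(other, rules_exclude_first))  # no 'GO:', so the accept rule applies
```

If those assumptions hold, the position of the '-GO:' line relative to any accepting '+' rule would matter more than the exact form of the pattern.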

Any help is appreciated.

Thanks,
Karthik