Selecting subdomains to search on

vanderkerkof
We've got some subdomains I need to index.

My crawl-urlfilter.txt looks like this:

+^http://([a-z0-9]*\.)*.domain.name/

Now I want to search some of the subdomains, of which there are over 200,
but not all of them.

I have a flat file urls/domaintosearch

How do I specify in this file that I want some subdomains but not all
of them?

Can I use the same principle of regular expression in this flat file?

Something like


+^http://([a-z0-9]*\.)*.cat.domain.name/
+^http://([a-z0-9]*\.)*.dog.domain.name/
+^http://([a-z0-9]*\.)*apple.domain.name/

Any help, greatly appreciated.
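
As a quick sanity check of what the proposed patterns would accept, here is a minimal sketch. It uses Python's `re` module as a stand-in for the filter's own regex engine, and it assumes the leading `+` is only an accept marker that is stripped before matching. The `escaped` pattern and the `accepts` helper are hypothetical, written for illustration only. The sketch suggests that the extra unescaped `.` in front of `cat` consumes one character of the host name, so the posted form misses the intended hosts, while the form used for `apple` (no extra dot) behaves as wanted.

```python
import re

# Pattern as posted, with the leading "+" (accept marker) stripped.
posted = r"^http://([a-z0-9]*\.)*.cat.domain.name/"

# Hypothetical corrected form: no stray "." before "cat", literal dots escaped.
escaped = r"^http://([a-z0-9]*\.)*cat\.domain\.name/"


def accepts(pattern, url):
    """Return True if the pattern is found in the URL (search, not full match)."""
    return re.search(pattern, url) is not None


urls = [
    "http://cat.domain.name/",
    "http://shop.cat.domain.name/",
    "http://xcat.domain.name/",
    "http://shop.dog.domain.name/",
]

for url in urls:
    print(url, "posted:", accepts(posted, url), "escaped:", accepts(escaped, url))
```

The stray dot matches any single character, which is why the posted form accepts `http://xcat.domain.name/` but not `http://shop.cat.domain.name/`.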

Re: Selecting subdomains to search on

vanderkerkof
OK, you can't. I just tried it and it spews an error saying it's
skipping the line because it's not a protocol.

hmm

The problem is I've got a lot of sites that come and go with the
structure http://differentname.cat.domain.name and http://differentname.dog.domain.name

Listing them here and maintaining that list will be a pain.
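
One possible way around maintaining that list, sketched here under assumptions: the error suggests the flat file only takes plain URLs, so the regular expressions would belong in the filter file instead, with one accept rule per wanted subdomain family and a final reject rule. The Python below only emulates that idea. It assumes rules are checked top to bottom and the first matching rule decides, which should be verified against the crawler's documentation. `RULES`, `parse_rules` and `accept` are hypothetical names, not part of any real tool.

```python
import re

# Hypothetical filter rules: "+" accepts, "-" rejects, "#" starts a comment.
RULES = r"""
# accept anything under cat.domain.name and dog.domain.name
+^http://([a-z0-9]*\.)*cat\.domain\.name/
+^http://([a-z0-9]*\.)*dog\.domain\.name/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn rule lines into (allow, compiled_pattern) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accept(url, rules):
    """Apply the first rule whose pattern is found in the URL; reject if none match."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


rules = parse_rules(RULES)
for url in [
    "http://newsite.cat.domain.name/",
    "http://other.dog.domain.name/page",
    "http://shop.apple.domain.name/",
]:
    print(url, accept(url, rules))
```

With rules like these, a new site such as `http://newsite.cat.domain.name/` would pass the filter without any edit to the rules, although it would still have to be reachable from a seed URL or a link that has already been crawled.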


On 2 Apr 2008, at 11:03, matt davies wrote:

> We've got some subdomains I need to index
>
> My crawl-urlfilter.txt looks like this:
>
> +^http://([a-z0-9]*\.)*.domain.name/
>
> Now I want to search some subdomains, of which there are over 200,  
> but not all.
>
> I have a flat file urls/domaintosearch
>
> How do I represent that I want to have some subdomains but not all  
> subdomains in this file?
>
> Can I use the same principle of regular expression in this flat file?
>
> Something like
>
>
> +^http://([a-z0-9]*\.)*.cat.domain.name/
> +^http://([a-z0-9]*\.)*.dog.domain.name/
> +^http://([a-z0-9]*\.)*apple.domain.name/
>
> Any help, greatly appreciated.