Volunteers requested for Web Spam Classification

Stefan Groschupf-2
Dear Nutch Users,

Web spam is a serious issue for Nutch as well, but at the moment we
know only a little about the problem and how to work around it.
Please invest some time to help the research community build a
collection for future research work.
See below for details.

Thank you.

> At the Algorithmic Engineering group at Universita' di Roma "La  
> Sapienza", we are currently building a reference collection for  
> testing Web Spam detection algorithms. While similar collections  
> for research on e-mail spam filtering exist, there are no publicly  
> available collections for testing Web Spam detection techniques.
> This collection will be freely available once it is completed. We  
> are currently tagging a large subset of 8,000 .UK domains.
> The objective is to classify every domain as spam, normal or  
> suspicious. We have 12 volunteers at the moment and want at least
> two judges for each classified domain; a heterogeneous group of
> judges also makes the collection more valuable.
> Classifying 100 domains takes about 2 to 3 hours. We provide
> guidelines and examples for the classification task, and an
> easy-to-use web-based interface for the volunteers:
> http://aeserver.dis.uniroma1.it/webspam/
> If you, or a colleague or student, can help us in this task, please  
> contact: [hidden email]
> Thank you very much,
> --
> Carlos Castillo, Ph.D.
> Dipartimento di Informatica e Sistemistica Università degli Studi
> di Roma "La Sapienza"
> Via Salaria 113, II floor
> 00198 Rome, ITALY
> Tel: +39 06 4991 8344
> Fax: +39 06 8530 0849