Web spam is a serious issue for Nutch as well, but at the moment we
know only a little about the problem and how to work around it.
Please invest some time to help the research community by building a
collection for future research work.
See below for details.
> VOLUNTEERS - WEB SPAM CLASSIFICATION
> At the Algorithmic Engineering group at Universita' di Roma "La
> Sapienza", we are currently building a reference collection for
> testing Web Spam detection algorithms. While similar collections
> for research on e-mail spam filtering exist, there are no publicly
> available collections for testing Web Spam detection techniques.
> This collection will be freely available once it is completed. We
> are currently tagging a large subset of 8,000 .UK domains.
> The objective is to classify every domain as spam, normal or
> suspicious. We have 12 volunteers at the moment and want at least
> two judges for each classified domain; a heterogeneous group of
> judges also makes the collection more valuable.
> Classifying 100 domains takes about 2 to 3 hours. We provide
> guidelines and examples for the classification task, and an
> easy-to-use web-based interface for the volunteers:
> http://aeserver.dis.uniroma1.it/webspam/
> If you, or a colleague or student, can help us in this task, please
> contact: [hidden email]
> Thank you very much,
> Carlos Castillo, Ph.D.
> Dipartimento di Informatica e Sistemistica Università degli Studi
> di Roma "La Sapienza"
> Via Salaria 113, II floor
> 00198 Rome, ITALY
> Tel: +39 06 4991 8344
> Fax: +39 06 8530 0849