A Clustering-based Approach for the Identification of Parked Domains

Parked domains (PDs) are domains whose owners have no interest in using them as gateways for their own activities; instead, they are kept reserved to be sold on the secondary market for web domains.
To turn the cost of the annual registration fees into a revenue opportunity, parked domains most often host a large number of ads, in the hope that someone who lands on the site by chance will click on some of them.
Since parking has become a widespread activity, a large number of specialized companies have emerged and made parking a straightforward task that simply requires setting the domain's name servers appropriately.
Although parking is a legal activity, it places a heavy burden on crawling systems and web mining tools. In fact, without filtering out parked domains, crawlers could spend a non-negligible part of their time downloading fat web sites whose content can negatively affect the performance of analysis algorithms.
In this paper, we address the problem of compiling the list of name servers used for domain parking, so that parked domains can be discarded right after the first DNS query, before the first connection is made.
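
As an illustration of the filtering step only (not of the clustering method used to compile the list), the following minimal Python sketch shows how a crawler might drop domains whose name servers appear in a parking list. It assumes the NS records of each domain have already been obtained from a DNS query; the function names and the name servers under `parking.example` are hypothetical and do not come from the report.

```python
from typing import Dict, Iterable, List


def normalize(host: str) -> str:
    """Lower-case a host name and drop surrounding spaces and the trailing dot."""
    return host.strip().rstrip(".").lower()


def is_parked(ns_records: Iterable[str], parking_ns: Iterable[str]) -> bool:
    """Return True if any NS record of the domain is a known parking name server."""
    known = {normalize(ns) for ns in parking_ns}
    return any(normalize(ns) in known for ns in ns_records)


def filter_crawl_queue(domains_with_ns: Dict[str, List[str]],
                       parking_ns: Iterable[str]) -> List[str]:
    """Keep only the domains whose name servers are not in the parking list."""
    known = list(parking_ns)
    return [domain for domain, ns in domains_with_ns.items()
            if not is_parked(ns, known)]


# Hypothetical parking name servers (placeholders, not real data).
PARKING_NS = ["ns1.parking.example", "ns2.parking.example"]

# Hypothetical NS answers, as they might be returned by the first DNS query.
QUEUE = {
    "shop.example": ["ns1.hosting.example.", "ns2.hosting.example."],
    "forsale.example": ["NS1.Parking.Example.", "ns2.parking.example."],
}

to_crawl = filter_crawl_queue(QUEUE, PARKING_NS)
```

In this sketch a domain is discarded as soon as one of its name servers matches the list, so no HTTP connection to the parked site is ever opened. Whether a single match should be enough is a design choice made here for simplicity.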


IIT Authors:

Giuseppe Cavaleri

Type: TR Technical reports
Subject area: Mathematics
IIT TR-01/2014

File: TR-01-2014.pdf