Our group's expertise in the design and analysis of algorithms, graph theory, and information retrieval has been applied to improving aspects of World Wide Web (WWW) technology. The Internet is a distributed system, and the applications built on it (such as the WWW) have a natural interpretation as a graph whose nodes represent web pages or web sites and whose arcs represent navigational hyperlinks. This view abstracts away many low-level details of individual computers and makes it possible to detect global phenomena. We have studied advanced techniques for data collection. Specifically, we designed and implemented new crawling software that runs on a distributed computing infrastructure and supports different crawling strategies, chosen according to the characteristics of the image of the web that we seek. Issues related to social networks have also been considered. In this framework, algorithms have been proposed for measuring the relative importance of nodes and for identifying significant structures (e.g., dense subgraphs). These algorithms need to perform effectively in a “big data” context. The success of information retrieval in the study of the web has stimulated the development of new methods for unsupervised and semi-supervised classification of web content. The unsupervised model was investigated for the problem of spam site identification, while the semi-supervised model has been studied in relation to the problem of semantic labeling of web sites.

Research theme: Algorithms and Computational Mathematics

