
A stochastic model for the link analysis of the Web

The behaviour of the inlink and outlink distributions appears to be one of the most studied properties of the web structure. The literature agrees that the inlink distribution follows a power law, but no such agreement applies to the outlink distribution. Accurate observations show that in the low-degree region the link distribution fails to fit a power law, with a discrepancy larger for outlinks than for inlinks. Moreover, a power law, like any continuous function, does not fit the scattered behaviour shared by both link distributions for large degree values. The linking model we consider here is a mixed one, based on both the preferential attachment and the uniform attachment strategies. A new approximation technique is devised to determine the parameters of the steady-state solution which describe a real data set. A stochastic technique is suggested to describe the scattering of the data. With these techniques the model appears to be well suited for describing both inlink and outlink distributions. Experiments on subsets of the real web and of Wikipedia show that our approach produces a more adequate approximation than the power law. This approximation suggests that the two attachment strategies play different roles in the inlink and the outlink cases.
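As a rough illustration of what a mixed linking model can look like, the Python sketch below grows a graph one node at a time. Each new node adds a single link whose target is chosen preferentially (in proportion to current indegree) with probability `alpha`, and uniformly among existing nodes otherwise. The function names, the single link per node, and the parameter `alpha` are assumptions made for this sketch; it is not the model, the approximation technique, or the parameter estimation described in the paper.

```python
import random
from collections import Counter


def simulate_mixed_attachment(n_nodes, alpha, seed=0):
    """Grow a graph where each new node adds one link to an existing node.

    alpha is a hypothetical mixing parameter: the probability that the
    target is chosen preferentially rather than uniformly.
    Returns the list of indegrees, indexed by node.
    """
    rng = random.Random(seed)
    indegree = [0]   # node 0 starts with no inlinks
    targets = []     # one entry per link; a node appears once per inlink
    for new_node in range(1, n_nodes):
        if targets and rng.random() < alpha:
            # Preferential step: sampling a past link target picks a
            # node with probability proportional to its indegree.
            target = rng.choice(targets)
        else:
            # Uniform step: every existing node is equally likely.
            target = rng.randrange(new_node)
        indegree[target] += 1
        targets.append(target)
        indegree.append(0)  # the new node enters with no inlinks
    return indegree


def degree_histogram(indegree):
    """Count how many nodes have each indegree value."""
    return Counter(indegree)


if __name__ == "__main__":
    degrees = simulate_mixed_attachment(n_nodes=10_000, alpha=0.7)
    hist = degree_histogram(degrees)
    for k in sorted(hist)[:10]:
        print(k, hist[k])
```

With `alpha = 0` the sketch reduces to purely uniform attachment, and values of `alpha` closer to 1 give a heavier tail. Nodes that have never been linked can only be reached through the uniform step, which is one simple way to see why the two strategies are combined.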

Internet Mathematics, 2007

Authors: P. Favati, G. Lotti, O. Menchi, F. Romani
IIT authors:

Type: Articles in non-ISI journals with international referees
Subject area: Mathematics
Pages 511 to 533

Activity: Numerical methods for large-scale problems
Algorithms for web technologies