Improving the reliability of inter-AS economic inferences through a hygiene phase on BGP data

Over the last few years, researchers have tried to shed light on the economic features that drive the inter-domain routing of the Internet by inferring economic inter-AS relationships from raw BGP data collected by research projects such as BGPmon, PCH, RIS and RouteViews. This data contains spurious entries, mostly caused by misconfigurations on BGP border routers and surfacing during BGP path exploration, yet none of the existing methodologies includes an adequate data hygiene phase, which affects the accuracy of the inferences drawn. In this paper we outline a new methodology that purges a large number of spurious routes from raw BGP data by relying on robust statistical concepts rather than on debatable thresholds. To quantify the performance of our methodology, we apply an enhanced version of an existing economic tagging algorithm to both the uncleaned and the cleaned data. We found that 42.01% of the distinct AS paths advertised to BGP route collectors in July 2013 appear only in spurious routes and that, in the absence of an appropriate data hygiene phase, they can affect the accuracy of the economic inferences for about 8% of the connections found in raw BGP data.

Computer Networks, 2014

