Root cause analysis and forensics in interdomain routing: models, methodologies and tools

Refice, Tiziana

Please use this identifier to cite or link to this item: http://hdl.handle.net/2307/514

Title:	Root cause analysis and forensics in interdomain routing: models, methodologies and tools
Authors:	Refice, Tiziana
Advisor:	Di Battista, Giuseppe
Referees:	Karrenberg, Daniel; Ventre, Giorgio
Issue Date:	2-Apr-2009
Publisher:	Università degli studi Roma Tre
Abstract:	The Internet is an interconnection of administrative domains called Autonomous Systems (ASes). Each AS contains one or more destination networks, and each network is identified by an IP prefix. The Border Gateway Protocol (BGP) [RLH06] is the de facto standard routing protocol used to exchange reachability information among ASes, and a BGP session between two distinct ASes is called a peering. Each AS learns through BGP its "best" route towards each destination in the Internet, updates it in response to network events (e.g., link failures, router resets, or policy changes), and propagates the change through BGP messages called updates. The propagation of BGP updates can be partially controlled via routing policy specifications. In order to investigate Internet behavior over time, several repositories provide historical data. Since 1997 and 1999, respectively, the University of Oregon RouteViews Project (RV) [roua] and the RIPE NCC Routing Information Service (RIS) [roub] have deployed passive monitors (or vantage points) worldwide, which continuously gather BGP routing data from the Internet, permanently store them, and make them publicly available. Currently, there are about 800 such monitors. Also, in 1995 the Internet Routing Registry (IRR) was established and started collecting the inter-AS routing policies of many of the networks in the Internet, with the main purpose of promoting the stability, consistency, and security of global interdomain routing. As the Internet becomes an increasingly critical infrastructure, the need to understand and (at least to some extent) control interdomain routing increases. Internet Service Providers (ISPs), in order to improve the quality of service offered to their customers, want to monitor the reachability of specific prefixes, check the effectiveness of their own routing policies, and assess the impact of traffic engineering configurations.
In this context, it is crucial to be able to detect and debug misconfigurations or faults so that they can be fixed. More generally, the problem of identifying Internet events, locating their root causes, and understanding their dynamics is attracting increasing attention from both researchers and network operators. However, despite a large research effort, diagnosing routing dynamics remains very difficult for several reasons: (i) The system is very large. As of December 2008, there are about 280,000 prefixes and more than 30,000 densely interconnected Autonomous Systems. (ii) The Internet is highly dynamic. In fact, the RIS and RV monitors currently receive an average of about 1,500 BGP updates per minute, with peaks of more than 50,000 updates per minute. (iii) Due to complex interconnections among ASes and routing policies, the effects of network events are often separated (both in time and space) from their causes, and different vantage points record different data in response to the same routing changes. Also, multiple routing events can occur simultaneously. Overall, given such size and dynamics, "naive" approaches to extracting relevant information from the Internet routing data are neither effective nor efficient. Therefore, both researchers and network operators interested in understanding interdomain routing have to cope with several major challenges. First, in order to deal with such a huge and complex network, they need to define what to measure, i.e., they need a model of Internet routing that captures the main dynamics while filtering out the "noise" (e.g., routing changes that do not provide information relevant to the identification of network events). Based on such a model, they need a methodology that, given the currently available data sources, detects network events and infers when and where they happened.
Furthermore, they need tools that efficiently handle the huge amount of data, support the analysis of network behavior over time, and provide real-time information in order to spot and possibly fix outages as soon as they occur. Since the analysis of network events often requires manual work, effective paradigms for the visualization of routing data are also very helpful. Previous work leaves most of these problems open. The research described in this thesis addresses these problems and proposes approaches to (at least partially) solve them. Namely, this thesis presents the following contributions. It illustrates a new perspective to drive the analysis of Internet dynamics without getting lost in the huge BGP dataset. While previous works usually address root cause analysis from a "global perspective" (i.e., by taking into account the dynamics of the whole Internet and trying to identify major events affecting it), this thesis tackles the same problem with an ISP-oriented approach: it assumes that ISPs are usually more interested in the reachability of their own prefixes than in the status of the whole network; hence, it focuses the analysis on user-specified prefixes and correlates their behavior with the global Internet dynamics. In particular, this thesis formally models the Internet as a flow-based system, where monitors are the sources of the flows and ASes originating BGP updates are the sinks. It also defines a methodology that correlates variations in these flows with routing changes in order to spot network events and the root causes that triggered them. Furthermore, BGPath has been developed to support this methodology, and this thesis describes its main features. BGPath is a publicly available tool that uses BGP data collected by the RIS and RV projects and provides the user with routing information from both single-vantage-point and cross-vantage-point views.
BGPath also assesses the reliability of the collection system, in order to avoid measurement artifacts. The algorithms BGPath relies on are shown to efficiently process huge streams of BGP data, fulfilling near-real-time constraints. While the ISP-oriented approach presented in this thesis gives good insight into both major and minor events affecting specific portions of the Internet, approaching the root cause analysis problem from a "global perspective" usually does not provide such fine-grained results. On the other hand, the global approach is critical to identify major interdomain events without any a priori knowledge of the prefixes and/or ASes involved. This thesis explores this perspective too. Specifically, it proposes a novel methodology based on Principal Component Analysis (PCA), a well-known statistical technique that is commonly used to reduce the number of dimensions of multi-dimensional datasets in order to highlight the most significant trends in the data. Since the interdomain routing dataset is inherently multi-dimensional (in time, space, prefixes, observation points, ...), this thesis suggests applying PCA to this dataset in order to identify the most significant contributors to the Internet dynamics. BGP data collected by the RIS and RV monitors provide a detailed view of the actual status of interdomain routing. However, they do not report inter-AS peering relationships that are not active. For example, in "normal" conditions, backup links do not appear in the routing tables. Still, such information is very important in order to understand the reasons behind some network events and to predict the evolution of routing when an event occurs. To cope with the intrinsic limitations of the RIS and RV datasets, this thesis analyzes the data stored in the Internet Routing Registry and describes how to extract peering relationships from the routing policies collected in it.
Moreover, the proposed approach specifies how to resolve inconsistencies among the distinct databases the IRR consists of. The obtained results show that, even though the IRR data is often out of date, it still provides a unique amount of topological information that usually does not appear in the global routing. The research described in the thesis relies on the assumption that the Internet is a graph where ASes are atomic entities in interdomain routing. However, recent papers [MFM+06, MUF+07] show that such a model can mislead the understanding of the global routing behavior. Thus, this thesis investigates this problem by measuring the route diversity that can be observed by passive remote vantage points, defining a methodology to compute it from a dynamic BGP dataset, and characterizing it in terms of the location of ASes in the Internet customer-provider hierarchy and the choice of monitors. The thesis documents the forensic analysis of two well-known events that occurred at the beginning of 2007, using them as real case studies to exemplify the models, methodologies and tools described in the thesis.
URI:	http://hdl.handle.net/2307/514
Appears in Collections:	X_Dipartimento di Informatica e automazione T - Tesi di dottorato

Files in This Item:

File	Description	Size	Format
RootCauseAnalysisAndForensicsInInterdomainRouting_TizianaRefice_phdthesis.pdf		11.94 MB	Adobe PDF

