SMARTxAC is a project carried out under a collaboration agreement between the Advanced Broadband Communications Center (CCABA) of the Technical University of Catalonia (UPC) and the Supercomputing Center of Catalonia (CESCA).
SMARTxAC aims to develop and deploy a passive measurement infrastructure and a real-time analysis system for high-speed links. Currently, SMARTxAC is being used for capturing and analyzing the traffic of the Anella Científica (Scientific Ring). The Anella Científica is the name of the Catalan R&D Network, which is managed by CESCA and connects about 50 Universities and Research Centers in Catalonia.
The tapped link consists of a pair of GigE links (one for each traffic direction) that connect the Anella Científica to RedIRIS (the Spanish R&D network) and to the global Internet. Current traffic volume on this link is about 600 Mbps and is steadily increasing, so data collection relies on an Endace DAG 4.3GE measurement card. Full-traffic analysis at full line rate is performed in real time using the SMARTxAC analysis software developed at the Advanced Broadband Communications Center (CCABA) of the UPC.
A three-hour, GPS-synchronized and anonymized IP header trace was captured for the NLANR/PMA project in February 2004 using the capture point and collection platform in the Anella Científica. This data set was published and can be downloaded from the CESCA-I section of the NLANR/PMA website.
A new evolution of SMARTxAC, which will be named SMART, is currently under development at CCABA-UPC. SMART is specifically designed to perform at gigabit speeds and will integrate both the capture engine and the real-time flow measurement and analysis in the same software. This new platform will be able to operate in real time at gigabit speeds without packet loss, using only one machine for collection and analysis (the measurement box), equipped with one or more Endace measurement cards. So far, the SMART collection and analysis platform has been tested with up to 2 Gbps of real traffic without experiencing packet loss. Unfortunately, it is not possible to operate at full line rate on OC192/10GigE links using standard PC equipment unless additional hardware is used for traffic processing and analysis. However, most high-performance backbone networks are overprovisioned and designed to use less than their total link capacity, often with load balancing as well. In such cases, SMART would be able to operate in real time on OC192/10GigE networks without specialized hardware for traffic analysis.
Some results of SMART running on one of the NLANR/PMA OC192MONs located on SDSC's TeraGrid Cluster can be found in the section "Teragrid-II 10GigE real time analysis" of NLANR's website.
Summary of SMART v0.1 features:
SMART will also soon include all the traffic analysis features of SMARTxAC (see SMARTxAC features below).
The following graphs were generated by SMART running on one of the two OC192MONs at SDSC in March 2004 (positive and negative values indicate incoming and outgoing traffic respectively):
Application timeseries (1 hour, 20 sec. average, bits/sec)
Application timeseries (daily, 5 min. average, tuples/sec)
In recent years, CCABA-UPC has been involved in several projects related to traffic measurement in research and academic networks, such as CASTBA, MEHARI and MIRA. Building on that experience, a traffic-monitoring platform called SMARTxAC was developed entirely at CCABA-UPC for permanent monitoring of the Catalan R&D network (Anella Científica). SMARTxAC is a passive traffic measurement system with full-traffic capture and real-time analysis capabilities, and it provides a web-based graphical interface that presents the traffic analysis results online. The following figure shows the three components of the SMARTxAC platform: the capture system, the traffic analysis system and the result visualization system (the new SMART real-time capture and analysis system will use only one machine for collection, analysis and visualization):
SMARTxAC platform overview
The traffic capture system is based on passive measurement of a Gigabit Ethernet link using optical splitters. A complete copy of the traffic is collected by a PC equipped with a DAG 4.3GE card and the CoralReef suite (CoralReef will shortly be replaced by SMART, which is currently being tested). In order to analyze all the traffic in real time, only packet headers are captured, and CoralReef aggregates them into flows, which reduces the data volume to be processed by the traffic analysis system.
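The header-to-flow aggregation described above can be illustrated with a minimal Python sketch. This is not CoralReef's API or implementation. The `FlowKey` type, the `aggregate_flows` function and the sample addresses are hypothetical names and values chosen for illustration, and the sketch assumes that a flow is identified by the usual 5-tuple (source/destination IP addresses, transport protocol and ports).

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def aggregate_flows(packet_headers):
    """Sum packets and bytes per flow key.

    packet_headers is an iterable of (FlowKey, length_in_bytes) pairs,
    one pair per captured packet header.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, length in packet_headers:
        counters = flows[key]
        counters["packets"] += 1
        counters["bytes"] += length
    return dict(flows)


# Made-up headers: two packets of one TCP flow and one UDP packet.
headers = [
    (FlowKey("10.0.0.1", "192.0.2.7", 6, 40000, 80), 1500),
    (FlowKey("10.0.0.1", "192.0.2.7", 6, 40000, 80), 40),
    (FlowKey("10.0.0.2", "192.0.2.9", 17, 5353, 53), 80),
]
flows = aggregate_flows(headers)
```

Three packet headers collapse into two flow records here, which is the kind of volume reduction that makes real-time processing of the whole link feasible.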
The traffic analysis system aggregates captured flows into a new kind of flow called a classified flow. Aggregation is performed by translating the values that identify a flow (source/destination IP addresses, transport protocol and ports) into more general values (origins, destinations and applications respectively). Origins are the institutions of the monitored network, and destinations are the external networks to which the Anella Científica is connected (see previous figure). This process is very simple and can easily be done in real time without losing valuable data about network usage. It also reduces the data volume and the permanent storage required, since the flow identification key is smaller. Moreover, traffic that cannot be classified, because its address does not correspond to any institution or its ports are unknown, is analyzed in more detail in order to detect anomalies. The analysis results for each institution can be viewed online using the result visualization system. This system is based on a dynamic web interface developed in PHP. The graphs are generated on demand by external graph generation tools, such as RRDTool.
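As a rough illustration of the classified-flow aggregation described above, the following Python sketch maps each flow key to an (origin, destination, application) triple and sums the bytes per triple. It is not the SMARTxAC code. The lookup tables, institution names, network prefixes and port assignments are invented for the example, and for simplicity the sketch assumes outgoing flows only, so the source address belongs to an institution and the destination address to an external network.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup tables, for illustration only.
ORIGINS = {ipaddress.ip_network("10.1.0.0/16"): "institution-A"}
DESTINATIONS = {ipaddress.ip_network("192.0.2.0/24"): "external-net-1"}
APPLICATIONS = {(6, 80): "web", (6, 25): "mail", (17, 53): "dns"}


def lookup(table, address):
    """Return the name of the first network containing address, else 'unknown'."""
    ip = ipaddress.ip_address(address)
    for network, name in table.items():
        if ip in network:
            return name
    return "unknown"


def classify(flow):
    """Translate a 5-tuple into an (origin, destination, application) key."""
    src_ip, dst_ip, protocol, src_port, dst_port = flow
    application = (
        APPLICATIONS.get((protocol, dst_port))
        or APPLICATIONS.get((protocol, src_port))
        or "unknown"
    )
    return lookup(ORIGINS, src_ip), lookup(DESTINATIONS, dst_ip), application


def aggregate_classified(flows):
    """Sum byte counts per classified-flow key."""
    totals = Counter()
    for flow, byte_count in flows:
        totals[classify(flow)] += byte_count
    return totals


# Made-up outgoing flows: (5-tuple, bytes).
sample_flows = [
    (("10.1.0.5", "192.0.2.7", 6, 40000, 80), 1000),
    (("10.1.3.9", "192.0.2.20", 6, 40001, 80), 500),
    (("10.1.0.5", "192.0.2.7", 6, 40002, 6881), 700),
]
classified = aggregate_classified(sample_flows)
```

In this example, three flows with distinct 5-tuples reduce to two classified flows, one for web traffic and one for traffic on a port that is absent from the application table.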
With the SMARTxAC visualization system, the following graphs can be accessed in real time via the web (graphs are updated every 5 min) for each institution and access point to the Anella Científica, grouped into daily, weekly and monthly periods (in bits/sec and packets/sec):
The SMARTxAC measurement platform is also being used to obtain:
Currently, SMARTxAC reports detailed information about the usage of the Anella Científica, which helps the network managers detect irregular (non-academic) usage or network attacks once they occur. Ongoing work is focused on developing an automatic anomaly detection module for SMARTxAC. This new module will check the traffic patterns of the institutions connected to the network and detect unexpected changes, which will be assumed to be caused by anomalies in network usage. Once a change in a traffic profile (an anomaly) is detected, the system will send an alarm to the network manager and store some additional data concerning the anomaly. Off-line analysis of the stored data will then be used to determine the cause of the anomaly.
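The text above does not specify how unexpected changes in a traffic profile will be detected. Purely as an illustration of the idea, the following Python sketch flags a new traffic sample when it deviates from the mean of recent samples by more than a given number of standard deviations. The function name, the threshold value and the sample figures are assumptions made for this example and do not describe the planned SMARTxAC module.

```python
from statistics import mean, pstdev


def is_anomalous(history, sample, threshold=3.0):
    """Return True when sample deviates strongly from recent history.

    history holds earlier measurements for one institution (for example,
    5-minute traffic averages). The sample is flagged when it lies more
    than `threshold` standard deviations away from the history mean.
    This rule is an assumed stand-in for a real traffic-profile check.
    """
    if len(history) < 2:
        return False  # not enough data to build a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return sample != baseline
    return abs(sample - baseline) / spread > threshold


# Made-up traffic averages in Mbps for one institution.
history = [100.0, 110.0, 95.0, 105.0, 90.0]
normal_sample = is_anomalous(history, 104.0)
spike_sample = is_anomalous(history, 400.0)
```

A production module would likely need per-institution baselines that account for daily and weekly cycles; the fixed window and single threshold here only show the shape of the check.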
The following graphs were generated by SMARTxAC running on the Anella Científica's GigE monitor in April 2004 (positive and negative values indicate incoming and outgoing traffic respectively):
Note: Graphs concerning institutions, access points, networks, etc. are not included here in order to preserve institution confidentiality.
Application timeseries (weekly, bits/sec)
Application timeseries (daily, bits/sec)
Application breakdown (daily, bytes)
Destination per Application breakdown (daily, incoming bytes)
Destination breakdown (daily, incoming bytes)
It is worth noting that traffic is classified according to the packet headers (source and destination addresses, protocol and ports). Packet payloads are not captured in order to preserve user confidentiality. Thus, traffic classified as "unknown" comes from applications using unknown ports (ports that do not correspond to any known application). As a consequence, it is not possible to establish which application generated "unknown" traffic. In most cases, this traffic is generated by P2P applications, FTP transfers in passive mode, tunneling, etc.
Users often run P2P applications on unknown ports, or on ports that belong to other applications, in order to conceal their activity and to traverse firewalls. Likewise, traffic classified as known could be generated by a P2P application using ports that belong to another well-known application. In addition, identifying specific applications is often difficult because tunneling (e.g. HTTP tunneling) is commonly used in the presence of firewalls and usage restrictions. This may also affect the performance of classifiers that try to separate different applications or application types.