The identification of applications in network traffic has become a prolific research topic in recent years. Traffic classification is crucial for classic network management tasks, such as traffic engineering and capacity planning. Traditional techniques relying on transport-level protocol ports are no longer reliable due to the ever-changing nature of Internet traffic and the techniques applications use to avoid detection (e.g., encryption, obfuscation). As a consequence, researchers have proposed a wide range of traffic classification solutions. However, although some proposals achieve high accuracy, the problem is far from completely solved. The lack of shared tools and reference data makes the comparison and validation of the proposed techniques very difficult, hindering a proper assessment of the achievements in this field.
Our group is involved in many research projects in the traffic classification field. Our research covers many aspects of this area; however, we have special expertise in the following topics:
Probably the biggest obstacle to comparing and validating the different techniques proposed for network traffic classification is the lack of publicly available datasets. Mainly because of privacy issues, researchers and practitioners are usually not allowed to share their datasets with the research community. In order to address, or at least mitigate, this problem, our group regularly publishes the datasets used in its works. The publicly available datasets related to our works are described below. Special mention goes to the "Is Our Ground-Truth for Traffic Classification Reliable?" dataset, which provides a set of reliably labeled pcap traces with full payload.
This dataset is derived from the paper:
Valentín Carela-Español, Pere Barlet-Ros, Albert Cabellos-Aparicio, and Josep Solé-Pareta: "Analysis of the impact of sampling on NetFlow traffic classification", Computer Networks 55 (2011), pp. 1083-1099. [pdf] [doi]
The traffic classification problem has recently attracted the interest of both network operators and researchers, given the limitations of traditional techniques when applied to current Internet traffic. Several machine learning (ML) methods have been proposed in the literature as a promising solution to this problem. However, very few can be applied to NetFlow data, and even fewer works have analyzed their performance under traffic sampling. In this paper, we address the traffic classification problem with Sampled NetFlow, a protocol widely deployed among network operators but scarcely investigated by the research community. In particular, we adapt one of the most popular ML methods to operate with NetFlow data and analyze the impact of traffic sampling on its performance.
Our results show that our ML method obtains accuracy similar to that of previous packet-based methods, while using only the limited information reported by NetFlow. Conversely, our results indicate that the accuracy of standard ML techniques degrades drastically under sampling. In order to reduce this impact, we propose an automatic ML process that does not rely on any human intervention and significantly improves the classification accuracy in the presence of traffic sampling.
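To make the effect of sampling on flow-level data concrete, the following is an illustrative sketch (not the paper's code) of systematic 1-in-N packet sampling applied to NetFlow-style per-flow counters; the toy flows "f1" and "f2" and all helper names are invented for illustration.

```python
# Illustrative sketch of systematic 1-in-N packet sampling and its
# effect on NetFlow-style per-flow features (packet and byte counts).

def sample_packets(packets, n):
    """Keep every n-th packet (systematic 1-in-N sampling)."""
    return packets[::n]

def flow_features(pkts):
    """Aggregate (flow_id, size_bytes) packets into per-flow counters."""
    feats = {}
    for fid, size in pkts:
        f = feats.setdefault(fid, {"pkts": 0, "octets": 0})
        f["pkts"] += 1
        f["octets"] += size
    return feats

# one long flow and one short flow
packets = [("f1", 1500)] * 10 + [("f2", 60)] * 3

full = flow_features(packets)
sampled = flow_features(sample_packets(packets, 7))
# under 1-in-7 sampling the short flow "f2" is missed entirely,
# while "f1" keeps only a fraction of its packets and bytes
print(full, sampled)
```

This is why features computed from sampled traffic (packets per flow, bytes per flow) are biased, and why short flows may vanish altogether, which degrades classifiers trained on unsampled data.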
The evaluation dataset used in the paper "Analysis of the impact of sampling on NetFlow traffic classification" consists of seven traces collected at the Gigabit access link of the Universitat Politècnica de Catalunya (UPC), which connects about 25 faculties and 40 departments (geographically distributed in 10 campuses) to the Internet through the Spanish Research and Education network (RedIRIS).
Name | Flows | Date | Time |
UPC-I | 2 985 098 | 11-12-08 | 10:00 (15 min.) |
UPC-II | 3 369 105 | 11-12-08 | 12:00 (15 min.) |
UPC-III | 3 474 603 | 12-12-08 | 16:00 (15 min.) |
UPC-IV | 3 020 114 | 12-12-08 | 18:30 (15 min.) |
UPC-V | 7 146 336 | 21-12-08 | 16:00 (1 h.) |
UPC-VI | 9 718 077 | 22-12-08 | 12:30 (1 h.) |
UPC-VII | 5 510 999 | 10-03-09 | 03:00 (1 h.) |
The labeled traces are distributed as plain text files, similar to the output of NetFlow v5 flow-print, without IP information and with the corresponding application label obtained by L7-filter.
Pr | SrcP | DstP | Pkts | Octets | StartTime | EndTime | Active | B/Pk | Ts | Fl | Application |
06 | 50 | 114f | 2 | 3000 | 0901.00:59:15.924 | 0901.00:59:17.924 | 2.000 | 1500 | 00 | 10 | skypetoskype |
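As a minimal sketch, the sample row above can be parsed as follows; the assumption that Pr, SrcP, and DstP are hexadecimal (as in flow-print output, so 0x06 is TCP and 0x50 is port 80) is ours, and the function name is our own.

```python
# Minimal sketch: parse one line of the NetFlow-like labeled trace.
# Field layout follows the sample row above; treating Pr/SrcP/DstP as
# hexadecimal (flow-print convention) is our assumption.

def parse_flow_line(line):
    fields = [f.strip() for f in line.split("|") if f.strip()]
    pr, srcp, dstp, pkts, octets, start, end, active, bpk, ts, fl, app = fields
    return {
        "protocol": int(pr, 16),     # 0x06 -> 6 (TCP)
        "src_port": int(srcp, 16),   # 0x50 -> 80
        "dst_port": int(dstp, 16),
        "packets": int(pkts),
        "octets": int(octets),
        "start": start,
        "end": end,
        "active_s": float(active),
        "bytes_per_pkt": int(bpk),
        "label": app,
    }

record = parse_flow_line(
    "06 | 50 | 114f | 2 | 3000 | 0901.00:59:15.924 | "
    "0901.00:59:17.924 | 2.000 | 1500 | 00 | 10 | skypetoskype |"
)
print(record["protocol"], record["src_port"], record["label"])
```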
In order to reduce the inaccuracy of L7-filter, we apply three rules:
We also perform a sanitization process in order to remove incorrect or incomplete flows that could confuse or bias the training phase. The sanitization process removes from the training set those TCP flows that are not properly formed (e.g., flows without TCP establishment or termination, or flows with packet loss or out-of-order packets). No sanitization is applied to UDP traffic.
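The sanitization criteria above can be sketched as a simple filter; this is a hypothetical illustration, not the actual tool: the flag constants and the loss/reorder indicators are our own, and in practice these would be derived from the captured packets.

```python
# Hypothetical sketch of the TCP sanitization step: a flow is kept only
# if it starts with a SYN, ends with FIN or RST, and showed no packet
# loss or reordering.

SYN, FIN, RST, ACK = 0x02, 0x01, 0x04, 0x10

def is_well_formed(flags, packet_loss=False, out_of_order=False):
    """flags: TCP flag bytes of the flow's packets, in capture order."""
    if not flags or packet_loss or out_of_order:
        return False
    has_establishment = bool(flags[0] & SYN)
    has_termination = bool(flags[-1] & (FIN | RST))
    return has_establishment and has_termination

def sanitize(flows):
    """Keep only properly formed TCP flows (UDP is left untouched)."""
    return {fid: f for fid, f in flows.items() if is_well_formed(*f)}

flows = {
    "ok":        ([SYN, SYN | ACK, ACK, ACK, FIN | ACK], False, False),
    "no_syn":    ([ACK, ACK, FIN | ACK], False, False),
    "reordered": ([SYN, ACK, FIN | ACK], False, True),
}
print(sorted(sanitize(flows)))  # ['ok']
```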
If you are interested in any of these labeled traces send an email to:
This dataset is derived from the papers:
Valentín Carela-Español, Tomasz Bujlow, and Pere Barlet-Ros: "Is Our Ground-Truth for Traffic Classification Reliable?", In Proc. of the Passive and Active Measurements Conference (PAM'14), Los Angeles, CA, USA, March 2014. [pdf] [doi]
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros: "Comparison of Deep Packet Inspection (DPI) tools for traffic classification" , Technical Report, UPC-DAC-RR-CBA-2013-3, June 2013. [pdf]
The validation of the different proposals in the traffic classification literature is a controversial issue. Usually, these works base their results on a ground truth built from private datasets and labeled by techniques of unknown reliability. This makes validation and comparison with other solutions an extremely difficult task. This paper aims to be a first step towards addressing the validation and trustworthiness problem of network traffic classifiers. We perform a comparison of six well-known DPI-based techniques, which are frequently used in the literature for ground-truth generation. In order to evaluate these tools, we have carefully built a labeled dataset of more than 500 000 flows, which contains traffic from popular applications. Our results show that PACE, a commercial tool, is the most reliable solution for ground-truth generation. However, among the available open-source tools, nDPI and especially Libprotoident also achieve very high precision, while other, more frequently used tools (e.g., L7-filter) are not reliable enough and should not be used for ground-truth generation in their current form.
The dataset used in the paper "Is Our Ground-Truth for Traffic Classification Reliable?" consists of 1 262 022 flows captured during 66 days, between February 25, 2013 and May 1, 2013, which account for 35.69 GB of pure packet data. The dataset was built artificially so that it could be published with full packet payload. However, we manually simulated different human behaviours for each application studied in order to make it as representative as possible. The selected applications are shown below:
The dataset consists of three pcap traces, one for each OS used (LX: Linux, W7: Windows 7, XP: Windows XP), and three INFO files, one for each pcap trace. Each line in the INFO file corresponds to a flow in the pcap trace and is described as follows:
flow_id + "#" + start_time + "#" + end_time + "#" + local_ip + "#" + remote_ip + "#" + local_port + "#" + remote_port + "#" + transport_protocol + "#" + operating_system + "#" + process_name + "#" + HTTP Url + "#" + HTTP Referer + "#" + HTTP Content-type + "#"
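A minimal sketch of a parser for one such INFO line follows; the field names mirror the format string above, and the concrete values in the example line (timestamps, IPs, process name) are invented for illustration.

```python
# Sketch: parse one INFO line into a dict. The "#" separator also
# terminates the line, so the trailing empty token is dropped.

INFO_FIELDS = [
    "flow_id", "start_time", "end_time", "local_ip", "remote_ip",
    "local_port", "remote_port", "transport_protocol",
    "operating_system", "process_name",
    "http_url", "http_referer", "http_content_type",
]

def parse_info_line(line):
    values = line.rstrip().split("#")[:len(INFO_FIELDS)]
    return dict(zip(INFO_FIELDS, values))

flow = parse_info_line(
    "1#1361787600#1361787605#10.0.0.5#147.83.2.3#51234#80#TCP#"
    "Windows 7#firefox.exe#http://example.org/#-#text/html#"
)
print(flow["process_name"], flow["http_content_type"])
```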
The process name was present for 520 993 flows (41.28 % of all the flows), which account for 32.33 GB (90.59 %) of the data volume. Additionally, 14 445 flows (1.14 % of all the flows), accounting for 0.28 GB (0.78 %) of the data volume, could be identified based on the HTTP content-type field extracted from the packets. Therefore, we were able to successfully establish the ground truth for 535 438 flows (42.43 % of all the flows), accounting for 32.61 GB (91.37 %) of the data volume. The remaining flows are unlabeled due to their short lifetime (below 1 s), which made VBS, our ground-truth generator, incapable of reliably establishing the corresponding sockets. Only the successfully labeled flows are taken into account during the evaluation of the classifiers. However, all the flows are included in the publicly available traces. This ensures data integrity and the proper operation of the classifiers, which may rely on the coexistence of different flows. We isolated several application classes based on the information stored in the database (e.g., application labels, HTTP content-type field). The classes, together with the number of flows and the data volume, are shown in the next table:
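The coverage figures above combine two labeling sources; the arithmetic can be checked directly:

```python
# Quick check of the ground-truth coverage figures quoted above.
total_flows = 1_262_022
by_process_name = 520_993   # flows labeled via the socket's process name
by_content_type = 14_445    # flows labeled via the HTTP content-type field

labeled = by_process_name + by_content_type
coverage_pct = round(100 * labeled / total_flows, 2)
print(labeled, coverage_pct)  # 535438 42.43
```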
Application | #Flows | #Megabytes |
Edonkey | 176 581 | 2 823.88 |
BitTorrent | 62 845 | 2 621.37 |
FTP | 876 | 3 089.06 |
DNS | 6 600 | 1.74 |
NTP | 27 786 | 4.03 |
RDP | 132 907 | 13 218.47 |
NETBIOS | 9 445 | 5.17 |
SSH | 26 219 | 91.80 |
Browser HTTP | 46 669 | 5 757.32 |
Browser RTMP | 427 | 5 907.15 |
Unclassified | 771 667 | 3 026.57 |
For a more detailed description of the dataset, we refer the reader to our paper and technical report cited above.
To collect and accurately label the flows, we adapted the Volunteer-Based System (VBS) developed at Aalborg University. The task of VBS is to collect information about Internet traffic flows (i.e., start time of the flow, number of packets contained in the flow, local and remote IP addresses, local and remote ports, transport-layer protocol) together with detailed information about each packet (i.e., direction, size, TCP flags, and timestamp relative to the previous packet in the flow). For each flow, the system also collects the name of the process associated with that flow, obtained from the system sockets. This way, we can reliably determine the application responsible for a particular flow. Additionally, the system collects some information about the HTTP content type (e.g., text/html, video/x-flv). The captured information is transmitted to the VBS server, which stores the data in a MySQL database. The source code was published under a GPL license. Our modified version of the VBS client captures full Ethernet frames for each packet and extracts the HTTP URL and Referer fields. We also added a module called pcapBuilder, which is responsible for dumping the packets from the database to PCAP files. At the same time, INFO files are generated to provide detailed information about each flow, which allows us to assign each packet from the PCAP file to an individual flow.
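The per-flow and per-packet attributes listed above can be sketched as simple record types; the class and field names, and the values in the usage lines, are our own illustration rather than the actual VBS schema.

```python
# Sketch of the per-flow and per-packet records that VBS collects,
# as described above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PacketRecord:
    direction: str            # "in" or "out"
    size: int                 # bytes on the wire
    tcp_flags: int            # 0 for UDP packets
    delta_ms: float           # time relative to the previous packet

@dataclass
class FlowRecord:
    start_time: float
    local_ip: str
    remote_ip: str
    local_port: int
    remote_port: int
    protocol: str             # transport-layer protocol
    process_name: str         # taken from the system socket table
    http_content_type: str = ""
    packets: List[PacketRecord] = field(default_factory=list)

flow = FlowRecord(0.0, "10.0.0.5", "147.83.2.3", 51234, 80,
                  "TCP", "firefox.exe")
flow.packets.append(PacketRecord("out", 74, 0x02, 0.0))  # SYN packet
```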
If you are interested in any of these labeled traces send an email to:
This dataset is derived from the papers:
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros: "Independent Comparison of Popular DPI Tools for Traffic Classification" , Computer Networks 76 (2015), pp. 75-89. [pdf] [doi]
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros: "Extended Independent Comparison of Popular Deep Packet Inspection (DPI) Tools for Traffic Classification" , Technical Report, UPC-DAC-RR-CBA-2014-1, Jan 2014. [pdf]
Deep Packet Inspection (DPI) is the state-of-the-art technology for traffic classification. According to the conventional wisdom, DPI is the most accurate classification technique. Consequently, most popular products, either commercial or open-source, rely on some sort of DPI for traffic classification. However, the actual performance of DPI is still unclear to the research community, since the lack of public datasets prevent the comparison and reproducibility of their results. This paper presents a comprehensive comparison of 6 well-known DPI tools, which are commonly used in the traffic classification literature. Our study includes 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter, NDPI, and Libprotoident). We studied their performance in various scenarios (including packet and flow truncation) and at different classification levels (application protocol, application and web service). We carefully built a labeled dataset with more than 750 K flows, which contains traffic from popular applications. We used the Volunteer-Based System (VBS), developed at Aalborg University, to guarantee the correct labeling of the dataset. We released this dataset, including full packet payloads, to the research community. We believe this dataset could become a common benchmark for the comparison and validation of network traffic classifiers. Our results present PACE, a commercial tool, as the most accurate solution. Surprisingly, we find that some open-source tools, such as Libprotoident and NDPI, also achieve very high accuracy.
The dataset used in the paper "Independent Comparison of Popular DPI Tools for Traffic Classification" consists of 767 690 flows, which account for 53.31 GB of pure packet data. The application name was present for 759 720 flows (98.96 % of all the flows), which account for 51.93 GB (97.41 %) of the data volume. The remaining flows are unlabeled due to their short lifetime (usually below 1 s), which made VBS incapable of reliably establishing the corresponding sockets. The dataset was built artificially so that it could be published with full packet payload. However, we manually simulated different human behaviours for each application studied in order to make it as representative as possible.
The dataset consists of a pcap trace and an INFO file. Each line in the INFO file corresponds to a flow in the pcap trace and is described as follows:
flow_id + "#" + start_time + "#" + end_time + "#" + local_ip + "#" + remote_ip + "#" + local_port + "#" + remote_port + "#" + transport_protocol + "#" + operating_system + "#" + process_name + "#" + HTTP Url + "#" + HTTP Referer + "#" + HTTP Content-type + "#"
Unlike in our previous paper, "Is Our Ground-Truth for Traffic Classification Reliable?", the classification in this paper was performed at three different levels. The first level studied is the application protocol level. The next table shows the content of the dataset at this level:
Application Protocol | #Flows | #Megabytes |
DNS | 18 251 | 7.66 |
HTTP | 43 127 | 7 325.44 |
ICMP | 205 | 2.34 |
IMAP-STARTTLS | 35 | 36.56 |
IMAP-TLS | 103 | 410.23 |
NETBIOS Name Service | 10 199 | 11.13 |
NETBIOS Session Service | 11 | 0.01 |
SAMBA Session Service | 42 808 | 450.39 |
NTP | 42 227 | 6.12 |
POP3-PLAIN | 26 | 189.25 |
POP3-TLS | 101 | 147.68 |
RTMP | 378 | 2 353.67 |
SMTP-PLAIN | 67 | 62.27 |
SMTP-TLS | 52 | 3.37 |
SOCKSv5 | 1 927 | 898.31 |
SSH | 38 961 | 844.87 |
WebDAV | 57 | 59.91 |
The second level of classification studied is the application level. The next table presents the distribution of the dataset by application:
Application | #Flows | #Megabytes |
4Shared | 144 | 13.39 |
America's Army | 350 | 61.15 |
BitTorrent clients (encrypted) | 96 399 | 3 313.98 |
BitTorrent clients (non-encrypted) | 261 527 | 6 779.95 |
Dropbox | 93 | 128.66 |
eDonkey clients (obfuscated) | 12 835 | 8 178.74 |
eDonkey clients (non-obfuscated) | 13 852 | 8 480.48 |
Freenet | 135 | 538.28 |
FTP clients (active) | 126 | 341.17 |
FTP clients (passive) | 122 | 270.46 |
iTunes | 235 | 75.4 |
League of Legends | 23 | 124.14 |
Pando Media Booster | 13 453 | 13.3 |
PPlive | 1 510 | 83.86 |
PPStream | 1 141 | 390.4 |
RDP Clients | 153 837 | 13 257.65 |
Skype (all) | 2 177 | 102.99 |
Skype (audio) | 7 | 4.85 |
Skype (file transfer) | 6 | 25.74 |
Skype (video) | 7 | 41.16 |
Sopcast | 424 | 109.34 |
Spotify | 178 | 195.15 |
Steam | 1 205 | 255.84 |
TOR | 185 | 47.14 |
World of Warcraft | 22 | 1.98 |
The last level studied concerns the services carried in web traffic. The classes, together with the number of flows and the data volume, are shown in the next table:
Web Service | #Flows | #Megabytes |
4Shared | 98 | 68.42 |
Amazon | 602 | 51.02 |
Apple | 477 | 90.22 |
Ask | 171 | 1.86 |
Bing | 456 | 36.84 |
Blogspot | 235 | 10.53 |
CNN | 247 | 3.66 |
Craigslist | 179 | 4.09 |
Cyworld | 332 | 13.06 |
Doubleclick | 1 989 | 11.24 |
eBay | 281 | 8.31 |
6 953 | 747.35 | |
Go.com | 335 | 25.83 |
6 541 | 532.54 | |
9 | 0.22 | |
Justin.tv | 2 326 | 126.33 |
62 | 2.14 | |
Mediafire | 472 | 27.99 |
MSN | 928 | 23.22 |
MySpace | 2 | 2.54 |
189 | 3.64 | |
Putlocker | 103 | 71.92 |
QQ.com | 753 | 10.46 |
Taobao | 387 | 24.29 |
The Huffington Post | 71 | 21.19 |
Tumblr | 403 | 52.56 |
1 138 | 13.67 | |
Vimeo | 131 | 204.45 |
Vk.com | 343 | 9.59 |
Wikipedia | 6 092 | 521.95 |
Windows Live | 26 | 0.16 |
Wordpress | 169 | 33.31 |
Yahoo | 17 373 | 937.07 |
YouTube | 2 534 | 1 891.79 |
For a more detailed description of the dataset, we refer the reader to our paper and technical report cited above.
To collect and accurately label the flows, we adapted the Volunteer-Based System (VBS) developed at Aalborg University. The task of VBS is to collect information about Internet traffic flows (i.e., start time of the flow, number of packets contained in the flow, local and remote IP addresses, local and remote ports, transport-layer protocol) together with detailed information about each packet (i.e., direction, size, TCP flags, and timestamp relative to the previous packet in the flow). For each flow, the system also collects the name of the process associated with that flow, obtained from the system sockets. This way, we can reliably determine the application responsible for a particular flow. Additionally, the system collects some information about the HTTP content type (e.g., text/html, video/x-flv). The captured information is transmitted to the VBS server, which stores the data in a MySQL database. The source code was published under a GPL license. Our modified version of the VBS client captures full Ethernet frames for each packet and extracts the HTTP URL and Referer fields. We also added a module called pcapBuilder, which is responsible for dumping the packets from the database to PCAP files. At the same time, INFO files are generated to provide detailed information about each flow, which allows us to assign each packet from the PCAP file to an individual flow.
If you are interested in this labeled trace send an email to:
The complete list of publications related to this group can be found here.