Modeling Modern Network Traffic: From Data Representation to Automated Machine Learning – MINT
The Internet and the individual networks that compose it serve a critical role
in today's economy and society. In order to successfully maintain and secure
these networks, operators need to monitor their behavior as well as investigate
new problems. However, recent network and protocol advances pose fundamental
challenges to monitoring network traffic. Traffic is becoming ubiquitously
encrypted, preventing direct access to quality of service indicators. Networks
have grown orders of magnitude faster, precluding detailed logging and analysis
of individual packets or streams. The Internet itself has evolved to become more
centralized, so IP addresses no longer reliably identify services.
In this proposal, we address three interconnected research questions to regain
visibility into modern network traffic:
In Methods to Represent Network Traffic, we will study how to represent
traffic data in ways that are amenable to modeling, and that could optimize
models for both supervised and unsupervised modeling tasks. This study will
explore the impact of representations across four dimensions: (1) timeseries
representations; (2) representations across flows; (3) representations at higher
layers; and, (4) operations on compressed data.
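As an illustration of dimension (1), the sketch below turns a packet trace into a fixed-length per-flow time series of packet and byte counts. It is a minimal example under stated assumptions, not the representation the project will adopt: the `(timestamp, flow_key, size)` tuple layout, the function name `flow_timeseries`, and the `bin_width` and `n_bins` parameters are all hypothetical choices made for this example.

```python
from collections import defaultdict


def flow_timeseries(packets, bin_width=1.0, n_bins=5):
    """Bin packets into per-flow time series (illustrative sketch).

    packets: iterable of (timestamp_seconds, flow_key, size_bytes) tuples.
    Returns {flow_key: [[packet_count, byte_count], ...]} with n_bins bins,
    where bin 0 starts at the first packet seen for that flow.
    """
    first_seen = {}
    series = defaultdict(lambda: [[0, 0] for _ in range(n_bins)])
    for ts, key, size in sorted(packets, key=lambda p: p[0]):
        start = first_seen.setdefault(key, ts)
        idx = int((ts - start) // bin_width)
        if idx < n_bins:  # packets beyond the observation horizon are dropped
            series[key][idx][0] += 1
            series[key][idx][1] += size
    return dict(series)


# Hypothetical trace: two flows identified by opaque string keys.
trace = [(0.0, "a", 100), (0.5, "a", 200), (1.2, "a", 50), (10.0, "b", 1500)]
features = flow_timeseries(trace, bin_width=1.0, n_bins=3)
```

Because only timing and size are used, a representation of this kind remains computable on encrypted traffic; the bin width and horizon trade model input size against temporal resolution.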
In Methods to Select and Benchmark Models, we will build on our work on
data representation to develop a set of tools that automatically explore model
and traffic representations tailored for network traffic problems. These methods
will enable the identification of the optimal operating points for a variety of
problems across network management. To support this goal, we will build a
large-scale repository of labeled flows across a number of different
applications and services as well as evaluate data representations that can be
used to build statistical learning models about network traffic.
In Methods to Operationalize Network Traffic Models, we will use the
software platforms and algorithmic primitives we built to design new techniques
and tools for operators to solve the challenges that block them from
transferring developed models from isolated laboratory experiments to real-world
deployments. We will support their need to monitor their networks and
investigate problems in real time by: (1) extending automated model selection to
account for systems costs and real-world limitations; (2) determining when
models become inaccurate and distinguishing model inaccuracy from problems that
are inherent in the network; and (3) improving model robustness by investigating
a generalized approach for model transfer.
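One simple way to picture point (2) is a monitor that tracks accuracy over a sliding window of recent predictions and flags the model once accuracy falls well below its laboratory baseline. The sketch below is only an assumption-laden illustration, not the technique proposed here: the class name `AccuracyMonitor` and its `baseline`, `window`, and `tolerance` parameters are hypothetical, it presumes ground-truth labels eventually become available, and it does not by itself separate model inaccuracy from problems inherent in the network.

```python
from collections import deque


class AccuracyMonitor:
    """Flag a deployed model whose recent accuracy drops below a baseline."""

    def __init__(self, baseline, window=100, tolerance=0.1):
        self.baseline = baseline            # accuracy measured before deployment
        self.tolerance = tolerance          # allowed drop before raising a flag
        self.outcomes = deque(maxlen=window)  # True where prediction was correct

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        # Only judge once the window is full, to avoid noisy early flags.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.baseline - self.tolerance

    def update(self, predicted, actual):
        self.outcomes.append(predicted == actual)
        return self.degraded()


# Hypothetical usage with a 4-sample window and a 0.9 baseline.
monitor = AccuracyMonitor(baseline=0.9, window=4, tolerance=0.1)
early = [monitor.update("video", "video") for _ in range(4)]
late = monitor.update("video", "web")
```

In practice labels are scarce in operational networks, so a monitor like this would likely need to be paired with label-free signals such as shifts in the input feature distribution.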
Project coordination
Francesco Bronzino (LABORATOIRE D'INFORMATIQUE DU PARALLELISME)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
LISTIC LABORATOIRE D'INFORMATIQUE, SYSTÈMES, TRAITEMENT DE L'INFORMATION ET DE LA CONNAISSANCE
The University of Chicago
Stanford University
LIP LABORATOIRE D'INFORMATIQUE DU PARALLELISME
ANR funding: 187,749 euros
Beginning and duration of the scientific project:
- 36 Months