Walking on AI/ML for Networking
Ph.D. Oscar Mauricio Caicedo
Grupo de Ingeniería Telemática
Universidad del Cauca
Colombia
Outline
 Unsupervised Learning for Heavy-Hitters Threshold Estimation
 Supervised Learning for Multipath Routing in Software-Defined Data Center
Networks
 Reinforcement Learning for Dynamic and Intelligent Routing
 Research Direction
3
Unsupervised Learning for Heavy-Hitters Threshold
Estimation
4
Unsupervised Learning
Learning approach   Training dataset   Problems aimed
Unsupervised        Unlabeled          Clustering
(Boutaba et al., 2018)
5
 Duration  time between the first and last packet of the flow
 Size  total number of bytes transmitted in the flow
 Rate  given by the size divided by the duration
 Burstiness  characterization of bursts in a flow over time t (see the sketch after this slide)
(Duque-Torres et al., 2019)
Heavy-Hitters Classification Schemes
6
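The rate above is simply size divided by duration. As a concrete reference, the sketch below computes all four per-flow metrics from one flow's (timestamp, size) packet records; the burstiness definition used here (peak-to-mean rate over fixed bins of t seconds) is an assumption, since the slide only names "characterization of bursts in a flow over time t".

```python
# Sketch: per-flow metrics as defined on the slide (duration, size, rate),
# plus one *assumed* burstiness definition (peak-to-mean load over bins of t seconds).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FlowMetrics:
    duration: float    # time between first and last packet [s]
    size: int          # total bytes transmitted in the flow
    rate: float        # size / duration [bytes/s]
    burstiness: float  # peak bin load / mean bin load (assumed definition)

def flow_metrics(packets: List[Tuple[float, int]], t: float = 1.0) -> FlowMetrics:
    """packets: list of (timestamp [s], packet_size [bytes]) for one flow."""
    times = [ts for ts, _ in packets]
    sizes = [sz for _, sz in packets]
    duration = max(times) - min(times)
    size = sum(sizes)
    rate = size / duration if duration > 0 else float(size)  # single-packet flow fallback
    # Assumed burstiness: bin the flow into windows of t seconds and compare
    # the busiest window with the average window.
    n_bins = max(1, int(duration // t) + 1)
    bins = [0] * n_bins
    t0 = min(times)
    for ts, sz in packets:
        bins[min(int((ts - t0) // t), n_bins - 1)] += sz
    mean_bin = sum(bins) / n_bins
    burstiness = max(bins) / mean_bin if mean_bin > 0 else 0.0
    return FlowMetrics(duration, size, rate, burstiness)
```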
 Outlier and anomaly detection
 Load balancing
 Alleviating network congestion
(Duque-Torres et al., 2019)
Need for Heavy-Hitter Flows Detection
 HH detection is far from obvious
 Threshold estimation is a major challenge
7
(Duque-Torres et al., 2019)
Knowledge Discovery Process
Threshold Estimation based on Knowledge Discovery
Process
8
(Duque-Torres et al., 2019)
 Can the Knowledge Discovery Process (KDP) shed new light on threshold estimation in HH detection?
 Can we determine an optimal threshold, or set of thresholds, that would optimally separate flows into HHs and non-HHs under a variety of network and traffic conditions?
Threshold Estimation based on Knowledge Discovery
Process
9
 Duration  tortoises and dragonflies
 Sizes  elephants and mice
 Rate  cheetahs and snails
 Burstiness  porcupines and stingrays
Understanding the Problem
10
(Duque-Torres et al., 2019)
Understanding the Problem
11
Work             Citations  Year  Appeared In            Threshold  Value
Curtis et al.    187        2011  IEEE INFOCOM           Size       128 KB, 1 MB, and 100 MB
Chiesa et al.    29         2016  IEEE/ACM Trans. Netw.  Bandwidth  10%
Moshref et al.   47         2013  ACM HotSDN             Bandwidth  10%
Liu et al.       19         2013  IEEE Commun. Lett.     Rate       1-10 Mbps
Trestain et al.  14         2013  IFIP/IEEE IM           Rate       Not specified
Lin et al.       9          2014  IEEE GLOBECOM          Size       100 MB
Bi et al.        8          2013  IEEE GLOBECOM          Size       Adaptive
Understanding the Problem
(Duque-Torres et al., 2019)
12
(Duque-Torres et al., 2019)
Understanding the Data
Datasets

Dataset     TCP: Count / Average    UDP: Count / Average    Total: Count / Average
UNIV1       6886055 / 710.74        1453303 / 390.74        9999617 / 582.60
UNIV2       6092 / 198.84           9869 978 / 758.24       9 99956 / 749.34
CAIDA2016   9017007 / 517.31        920417 / 433.62         9994197 / 509.13
CAIDA2018   8225731 / 1021.20       1618524 / 674.91        9982640 / 959
13
Feature name Feature description Direction
min_ps Packet with minimum size in the flow F & B
max_ps Packet with maximum size in the flow F & B
avg_ps Average packet size F & B
std_ps Standard deviation of packet size F & B
min_piat Minimum packet inter-arrival time F & B
max_piat Maximum packet inter-arrival time F & B
avg_piat Average packet inter-arrival time F & B
std_piat Stand. dev. of packet inter-arrival time F & B
octTotCount Number of transmitted bytes in the flow F & B
pktTotCount Number of transmitted pkts in the flow F & B
flowDur Duration of the flow (in seconds)
Data Preparation
(Duque-Torres et al., 2019)
14
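A minimal sketch of this feature extraction step, assuming the forward and backward packets of a flow are already separated into (timestamp, size) lists; the f_/b_ prefixes and helper names are illustrative, not the study's code.

```python
# Sketch: the flow features listed in the Data Preparation table, computed per
# direction (forward/backward) from (timestamp [s], size [bytes]) packet records.
import statistics
from typing import Dict, List, Tuple

def direction_features(packets: List[Tuple[float, int]], prefix: str) -> Dict[str, float]:
    sizes = [sz for _, sz in packets]
    times = sorted(ts for ts, _ in packets)
    iats = [b - a for a, b in zip(times, times[1:])] or [0.0]  # inter-arrival times
    return {
        prefix + "min_ps": min(sizes),
        prefix + "max_ps": max(sizes),
        prefix + "avg_ps": statistics.mean(sizes),
        prefix + "std_ps": statistics.pstdev(sizes),
        prefix + "min_piat": min(iats),
        prefix + "max_piat": max(iats),
        prefix + "avg_piat": statistics.mean(iats),
        prefix + "std_piat": statistics.pstdev(iats),
        prefix + "octTotCount": sum(sizes),
        prefix + "pktTotCount": len(sizes),
    }

def flow_features(fwd: List[Tuple[float, int]], bwd: List[Tuple[float, int]]) -> Dict[str, float]:
    all_times = [ts for ts, _ in fwd + bwd]
    feats = {**direction_features(fwd, "f_"), **direction_features(bwd, "b_")}
    feats["flowDur"] = max(all_times) - min(all_times)  # duration of the flow [s]
    return feats
```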
 K-Means was used due to its simplicity, speed, and accuracy
 The optimal number of clusters was defined by the Silhouette method (see the sketch below)
Data Mining
Value Interpretation
Si = 1 The samples are very well clustered
0.5 < Si < 0.7 The samples have a reasonable structure for clustering
Si = 0.5 The samples are probably placed in the wrong cluster
0.26 < Si < 0.5 The samples do not have a substantial structure for clustering
Si = 0 The samples lie between two clusters
Si < 0 The samples are placed in the wrong cluster
(Duque-Torres et al., 2019)
15
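A minimal sketch of this data-mining step as described (K-Means plus the Silhouette method to choose k, and a 2-component PCA projection like the one plotted on the following slides), assuming scikit-learn and a numeric feature matrix X built from the flow features above.

```python
# Sketch: K-Means clustering with the Silhouette score used to pick k,
# plus a PCA projection onto the first two principal components for plotting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def cluster_flows(X: np.ndarray, k_range=range(2, 8)):
    Xs = StandardScaler().fit_transform(X)          # scale features before K-Means
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
        score = silhouette_score(Xs, labels)        # Si in [-1, 1]
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    X2 = PCA(n_components=2).fit_transform(Xs)      # first two principal components
    return best_k, best_score, best_labels, X2
```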
Data Mining
Silhouette Analysis
(Duque-Torres et al., 2019)
16
Data Mining
 K-means along the first two principal components
 k = 2
(Duque-Torres et al., 2019)
Dataset     Class I   Class II   Total
UNIV1       293 962   8          293 970
UNIV2       18 056    1          18 057
CAIDA2016   451 696   9          451 705
CAIDA2018   563 869   3          563 872
17
Data Mining
 K-means along the first two principal components
 k = 4
(Duque-Torres et al., 2019)
18
Flows per cluster (k = 4):

K     UNIV1     UNIV2    CAIDA2016   CAIDA2018
I     293 709   17 979   451 051     563 472
II    6         6        7           21
III   26        32       2           376
IV    229       36       644         3

[The table also reports the max / min / average Flow Size [MB], Number of Packets, and Flow Duration [s] per cluster and dataset.]
Evaluating the Discovered Knowledge
19
Silhouette Analysis - Ambiguous Flows
(Class I, Ambiguous Flows) (Duque-Torres et al., 2019)
Evaluating the Discovered Knowledge
20
(Duque-Torres et al., 2019)
Ambiguous Flows (Class I)
Evaluating the Discovered Knowledge
Flows per cluster (Class I flows re-clustered with k = 2):

K     UNIV1     UNIV2    CAIDA2016   CAIDA2018
I     293 002   17 979   449 904     562 273
II    707       36       1 147       1 199

[The table also reports the max / min / average Flow Size [MB], Number of Packets, and Flow Duration [s] per cluster and dataset.]
For TCP traffic
 Flow size = 6 kB
 Packet count = 14
 There is no threshold that can unambiguously classify HH flows
 Significance of network and traffic characteristics
 Flow size = 6 kB
 Packet count = 14
(Duque-Torres et al., 2019)
Evaluating the Discovered Knowledge
22
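As a toy illustration only, here is a static-threshold check using the TCP values reported above (flow size of about 6 kB, packet count of about 14); the study's conclusion is precisely that no such fixed pair separates HHs unambiguously across traces.

```python
# Toy illustration: a static-threshold heavy-hitter check with the TCP values
# reported on the slide. Not a recommended detector; the study argues that no
# single static pair of values works unambiguously across datasets.
SIZE_THRESHOLD_BYTES = 6 * 1024   # ~6 kB
PKT_THRESHOLD = 14

def is_heavy_hitter(flow_size_bytes: int, packet_count: int) -> bool:
    return flow_size_bytes >= SIZE_THRESHOLD_BYTES and packet_count >= PKT_THRESHOLD
```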
 Open challenges
 Determine how much time is needed to collect enough flow details for reliable classification
 Investigate the usefulness of per-flow packet size
distributions for accurate classification
 Develop a threshold-less approach for HH
classification
Evaluating the Discovered Knowledge
23
Supervised Learning for Multipath Routing in
Software-Defined Data Center Networks
24
Supervised Learning
Learning approach   Training dataset    Problems aimed
Supervised          Labeled             Classification and regression
Semi-supervised     Incomplete labels   Classification and regression
(Boutaba et al., 2018)
25
 Data Center Networks (DCNs)
 Significant bandwidth capacity
 Dynamic traffic
 Coexistence of many kinds of flows
 Small, short-lived flows  mice
 Few large, long-lived flows  elephants
(Dixit et al., 2013) (Jain et al., 2009)
Multipath Routing in Software-Defined Data Center
Networks
26
 Equal-Cost MultiPath (ECMP)
 The most prevalent multipath routing mechanism in DCNs
 Each switch splits flows over the output ports applying a
function on packet headers
 E.g., a hash function on the 5-tuple header
(Dixit et al., 2013) (Jain et al., 2009)
Multipath Routing in Software-Defined Data Center
Networks
27
[Figure: flows A, B, C, and D arrive at an input port; the hash sends A and D to Output 1 and B and C to Output 2. Since A + D << B + C, Output 2 becomes a hot-spot.]
(Dixit et al., 2013) (Jain et al., 2009)
 ECMP causes hot-spots in DCNs  low throughput and high delay (see the sketch after this slide)
Multipath Routing in Software-Defined Data Center
Networks
28
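To make the mechanism concrete, here is a minimal sketch of hash-based ECMP port selection; the hash function, the 5-tuple values, and the example flow sizes are illustrative assumptions, not taken from the cited works.

```python
# Sketch: ECMP-style output-port selection by hashing the 5-tuple, and why
# balancing *flows* does not balance *bytes* (the hot-spot on the slide).
import hashlib

def ecmp_port(five_tuple, n_ports: int) -> int:
    """five_tuple: (src_ip, dst_ip, src_port, dst_port, proto); deterministic per flow."""
    digest = hashlib.md5("|".join(map(str, five_tuple)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_ports

# Four flows with very different sizes (in MB, illustrative) split over two ports.
flows = {
    ("10.0.0.1", "10.0.1.1", 40001, 80, "tcp"): 1,     # flow A (mouse)
    ("10.0.0.2", "10.0.1.2", 40002, 80, "tcp"): 2,     # flow D (mouse)
    ("10.0.0.3", "10.0.1.3", 40003, 80, "tcp"): 500,   # flow B (elephant)
    ("10.0.0.4", "10.0.1.4", 40004, 80, "tcp"): 800,   # flow C (elephant)
}
load = [0, 0]
for ft, size_mb in flows.items():
    load[ecmp_port(ft, 2)] += size_mb
# The split is by flow identity, not by size, so whichever port the elephants
# hash to carries most of the bytes and becomes a hot-spot.
print(load)
```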
 Software-Defined Networking (SDN) to address ECMP limitations
 A global view of the network
 A logically centralized controller to dynamically make and
install routing decisions
 A DCN using SDN  SDDCN
(Boutaba et al., 2018)(Tang et al., 2017) (Chao et al., 2016) (Poupart et al., 2016) (Curtis et al., 2011a-b) (Al-Fares et al., 2010)
Multipath Routing in Software-Defined Data Center
Networks
29
 Reactive SDN-based multipath routing
 Dynamic rescheduling of elephant flows
 Mouse flows are handled by default routing (e.g., ECMP)
 Discriminate flows using static thresholds
 Hot-spots may occur before elephants are detected
(Boutaba et al., 2018)(Tang et al., 2017) (Chao et al., 2016) (Poupart et al., 2016) (Curtis et al., 2011a-b) (Al-Fares et al., 2010)
Multipath Routing in Software-Defined Data Center
Networks
30
 Proactive SDN/ML-based multipath routing
 ML-based methods at the controller for proactively identifying
elephants
 Central collection
 Per-flow data
 Sampling-based data
(Boutaba et al., 2018)(Tang et al., 2017) (Chao et al., 2016) (Poupart et al., 2016) (Curtis et al., 2011a-b) (Al-Fares et al., 2010)
Multipath Routing in Software-Defined Data Center
Networks
31
 Proactive SDN/ML-based multipath routing - controller side
 Central collection
 Per-flow data
 Heavy traffic overhead and poor scalability
 Sampling-based data
 Delayed and inaccurate flow information
(Boutaba et al., 2018)(Tang et al., 2017) (Chao et al., 2016) (Poupart et al., 2016) (Curtis et al., 2011a-b) (Al-Fares et al., 2010)
Multipath Routing in Software-Defined Data Center
Networks
32
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
 Incremental learning at
server-side
 Proactive, accurate, and timely elephant detection
 Adaptation to varying traffic
conditions
 Low control overhead
 Constantly updated model with
limited resources
NELLY
33
(Estrada-Solano et al., 2019)
NELLY (Network Elephant Learner and anaLYzer)
Architecture - High level view
34
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
Analyzer
35
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
Learner
36
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
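NELLY's learner evaluates many incremental algorithms (the results table on a later slide includes SGD-SVM, ARF, Hoeffding trees, and others). The sketch below shows that general style of server-side incremental training using scikit-learn's partial_fit; the feature extraction and the source of the ground-truth elephant label are placeholders, not NELLY's implementation.

```python
# Sketch: server-side incremental (online) elephant classification in the spirit
# of NELLY's learner, using an SGD-trained linear SVM. Placeholder assumptions:
# features come from a flow's first packets, and the true class (elephant/mouse)
# becomes known once the flow finishes.
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1])              # 0 = mouse, 1 = elephant
model = SGDClassifier(loss="hinge")     # linear SVM trained by stochastic gradient descent

def classify_and_update(first_packets_features: np.ndarray, true_label: int = None) -> int:
    """Classify a new flow; update the model incrementally when its label is known."""
    x = first_packets_features.reshape(1, -1)
    try:
        pred = int(model.predict(x)[0])
    except NotFittedError:              # no training samples seen yet
        pred = 0
    if true_label is not None:
        model.partial_fit(x, [true_label], classes=CLASSES)
    return pred
```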
Packet traces UNI1 UNI2
Duration 65 min 158 min
Packets 19.85 M 100 M
IPv4 % of total traffic 98.98% (mostly TCP) 99.9% (mostly UDP)
IPv4 flows 1.02 M (TCP and UDP) 1.04 M (mostly UDP)
Datasets
37
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
 50 incremental learning algorithms
 Metrics
 True positive rate (TPR)
 False positive rate (FPR)
 Matthews Correlation Coefficient (MCC)
38
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
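For reference, the three reported metrics can be computed from a binary confusion matrix with elephants as the positive class; this helper is a generic sketch, not NELLY's evaluation code.

```python
# Sketch: TPR, FPR, and MCC from a binary confusion matrix (elephant = positive).
# MCC is also available as sklearn.metrics.matthews_corrcoef(y_true, y_pred).
import math

def tpr_fpr_mcc(tp: int, fp: int, tn: int, fn: int):
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    return tpr, fpr, mcc
```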
Algorithm        UNI1 [TPR (%), FPR (%), MCC, TC (μs), Header type]   UNI2 [TPR (%), FPR (%), MCC, TC (μs), Header type]
AHOT 85.97 35.52 0.327 4.07 BinNom 60.16 28.58 0.304 10.17 BinNom
ARF 82.39 28.82 0.359 12.01 BinNom 68.65 21.33 0.460 17.39 BinNom
Hoeffding tree 86.79 36.38 0.326 3.18 BinNom 57.92 28.46 0.284 4.64 BinNom
NB 74.76 35.69 0.254 4.76 BinNom 49.74 23.18 0.267 4.82 BinNom
OAUE 86.79 33.63 0.347 25.58 BinNom 63.28 28.65 0.332 33.06 BinNom
OzaBag 87.78 37.11 0.327 23.98 BinNom 64.17 31.13 0.314 36.61 BinNom
OzaBoost 75.88 29.93 0.307 11.56 BinNom 64.62 32.22 0.307 16.82 BinNom
SGD-SVM 16.76 10.21 0.067 0.81 Num 38.69 30.99 0.076 0.8 Num
In bold: the top five TPR and MCC results, and the TC results shorter than 17.5 μs, for both UNI1 and UNI2
Accuracy and classification time with various incremental learning algorithms
39
Accuracy when varying the elephant class weight
NELLY (Network Elephant Learner and anaLYzer)
40
Comparison
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
Work                  Learning approach      Elephant detection  False elephants  Table occupancy  Control overhead  Detection time  Network modifications*
Poupart et al., 2016  Incremental            Very high           Very low         Very high        High              Very short      None
Tang et al., 2017     Batch                  High                Very low         Medium           Medium            Medium          Hardware of switches
Chao et al., 2016     Batch and incremental  Very high           High             Low              Medium            Medium          None
Curtis et al., 2011   None                   Perfect             None             None             Very low          Long            Software in servers
NELLY                 Incremental            Very high           Very low         None             Very low          Very short      Software in servers
* Assuming OpenFlow-based networks
41
 Open challenges
 NELLY implementation as in-kernel software
 Evaluate the cost impact on servers
 Processing and memory requirements
 Threshold definition
NELLY (Network Elephant Learner and anaLYzer)
(Estrada-Solano et al., 2019)
42
 Open challenges
 ML-based elephant rescheduler at the Controller
 Predict the rate and duration of elephants
 Regression or multi-classification?
 How to define the classes?
 Reschedule elephants based on prediction
 Algorithm for allocating elephants to links
NELLY (Network Elephant Learner and anaLYzer)
43
Reinforcement Learning for Dynamic and Intelligent
Routing
44
Reinforcement Learning
(Mammeri, 2019)
RL Concept
45
 Network traffic routing is fundamental in networking
 Selecting a path for packet transmission
 Selection criteria
 Cost minimization
 Maximization of link utilization
 Quality of Service (QoS) provisioning
Network Routing Problem
46
 Efficiently selecting the best path through a network remains an important open issue
STATIC and DYNAMIC routing
 Dynamic routing
 Distance vector protocols
 Link state protocols
Network Routing Problem
47
 Distance vector protocols
 Routing information is not computed by the router itself
 Delay in routing information
 Network overhead
 Link state protocols
 More memory and processing power
 Understanding its operation is critical
Network Routing Problem
48
 Dynamic routing
 More complexity
 Overhead on the network
 Very rigid
 Does not learn from past decisions
Network Routing Problem
49
Network Routing Problem
Motivating scenario 50
Network Routing Problem
Motivating scenario 51
(Casas, 2019)
Routing based on Knowledge-Defined Networking
 Architecture based
on RL and SDN
52
Routing based on Knowledge-Defined Networking
(Casas, 2019)
Q-learning
Architecture Operation
53
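A minimal sketch of tabular Q-learning applied to next-hop selection, in the spirit of the architecture above; the state/action encoding, the reward (for example, negative link delay), and the hyperparameters are illustrative assumptions rather than the cited design.

```python
# Sketch: tabular Q-learning for routing. State = (current node, destination),
# action = neighbor to forward to, reward = e.g. negative link delay (assumed).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)                  # Q[(state, action)] -> estimated value

def choose_next_hop(state, neighbors):
    if random.random() < EPSILON:                           # explore
        return random.choice(neighbors)
    return max(neighbors, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state, next_neighbors):
    best_next = max((Q[(next_state, a)] for a in next_neighbors), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```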
Routing based on Knowledge-Defined Networking
 Open challenges
 The convergence time
 Testing in real large networks
 Multi-controllers
 Probing of statistics collection based on RL
 Collaborative learning for considering sub-networks?
Demo
54
Research Direction
55
5G Scaling Problem
 Mobile traffic is expected to grow exponentially
 The complexity, heterogeneity and scale of networks have grown
far beyond the limits of manual administration
 NFV is a 5G foundation
 Scaling VNFs pertaining to Network Services and Network Slices
is a big challenge
56
 Scaling in NFV reduces cost compared to oversizing capacity for peak workloads
 Horizontal scaling dynamically increases/reduces the number of VNF instances
 Vertical scaling increases/reduces the resources allocated to one or more VNFs
 Most existing solutions scale NFs assuming they operate in isolation
5G Scaling Problem
57
Work                   Title                                                                                        Technique
Carella et al. (2016)  An extensible autoscaling engine for software-based network functions                       Threshold rules
Dutta et al. (2016)    QoE-Aware elasticity support in cloud-native 5G systems                                     Threshold rules
Bilal et al. (2016)    Dynamic cloud resource scheduling in virtualized 5G mobile systems                          Time series
Alawe et al. (2018)    Improving traffic forecasting for 5G core network scalability: A machine learning approach  Neural networks
Tang et al. (2015)     Efficient auto-scaling approach in the telco cloud using self-learning algorithm            Q-Learning
Alawe et al. (2018)    On the scalability of 5G core network                                                       Control theory
5G Scaling Problem
Solutions focused on scaling NFs individually (Demo)
58
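As a reference point for the "Threshold rules" entries in the table above, here is a toy horizontal-scaling rule for a VNF; the utilization thresholds and instance bounds are illustrative assumptions, not any of the cited solutions.

```python
# Toy sketch: threshold-rule horizontal scaling of a VNF. Thresholds, bounds,
# and the utilization signal are illustrative assumptions.
def scale_decision(cpu_util: float, n_instances: int,
                   scale_out_at: float = 0.8, scale_in_at: float = 0.3,
                   min_inst: int = 1, max_inst: int = 10) -> int:
    """Return the desired number of VNF instances given average CPU utilization (0..1)."""
    if cpu_util > scale_out_at and n_instances < max_inst:
        return n_instances + 1        # scale out: add one instance
    if cpu_util < scale_in_at and n_instances > min_inst:
        return n_instances - 1        # scale in: remove one instance
    return n_instances                # keep the current size
```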
Solutions that consider scaling NFs as part of compound network services

Work                         Title                                                                                                        Observation
Wang et al. (2016)           Online VNF scaling in data centers                                                                           It assumes that all NFs must be scaled
Zhang et al. (2016)          Co-scaler: cooperative scaling of software-defined VNF service function chain                                It focuses on flow migration and scales all NFs
Rankothge et al. (2017)      Optimizing resource allocation for virtualized network functions in a cloud center using genetic algorithms  It scales only one NF, chosen randomly
Prados-Garzon et al. (2018)  A queuing based dynamic auto scaling algorithm for the LTE EPC control plane                                 It is specific to the LTE vEPC
5G Scaling Problem
59
Collaborative Learning for 5G Scaling
60
Acknowledgements
Ph.D. Armando Ordóñez
Ph.D. Student - Felipe Estrada
Ph.D. Student - Carlos Tobar
Master Student - Alejandra Duque
Master Student - Daniela Casas
Undergraduate Student - Juan Daza
Undergraduate Student - Andrés Maca
Questions?
Obrigado (Thank you)
Probing Problem
[Figure: a service provider network connecting commercial offices and a data processing center over data and monitoring links; performance degradation (e.g., delays, congestion, and loss) must be detected]
 Monitoring performance parameters of the network, such as link usage, packet loss, and delay, is needed to assure QoS
 Accurate and timely statistics give the service provider an up-to-date view of the network
Probing Problem
[Figure: same service provider scenario as above]
 The accurate and timely collection of flow statistics is necessary to maintain an up-to-date view of the network
Probing based on Knowledge-Defined Networking
 Architecture based
on RL and KDN
65
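One speculative way to read "probing based on RL": an epsilon-greedy bandit that picks the polling interval per flow or link, trading statistics freshness against monitoring overhead. The candidate intervals and the reward signal below are assumptions for illustration only, not the thesis's design.

```python
# Speculative sketch: epsilon-greedy choice of a probing interval.
# Reward could combine measurement accuracy with a penalty for probe overhead.
import random

INTERVALS = [1, 5, 10, 30]                 # candidate polling periods [s] (assumed)
value = {i: 0.0 for i in INTERVALS}        # running reward estimate per interval
count = {i: 0 for i in INTERVALS}

def pick_interval(epsilon: float = 0.1) -> int:
    if random.random() < epsilon:
        return random.choice(INTERVALS)    # explore
    return max(INTERVALS, key=lambda i: value[i])  # exploit the best interval so far

def feedback(interval: int, reward: float) -> None:
    """Update the running average reward for the chosen interval."""
    count[interval] += 1
    value[interval] += (reward - value[interval]) / count[interval]
```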