Karthik Kumar Thatikonda
Cisco Employee

Introduction

Application-Aware Routing tracks, in real time, the path quality of the data plane tunnels between SD-WAN edge routers. Path performance metrics such as loss, latency, and jitter are measured using BFD. When there are soft failures on a WAN circuit, detecting the tunnel degradation and switching over takes several minutes, and the default convergence time for detecting slowly degrading WAN circuits is very high. Reducing the defaults can produce false positives in the performance metrics and cause traffic instability, because too few BFD samples are collected. The solution is to take more accurate measurements at a faster rate.

KarthikKumarThatikonda_0-1695677697506.png

With Enhanced Application-Aware Routing (EAAR), path performance metrics such as loss, latency, and jitter are measured using actual (inline) data packets, which gives faster tunnel degradation detection and switchover. The default convergence time for soft failures is now on the order of seconds, where it used to be several minutes. In addition, SLA dampening is applied to the tunnels by default: to account for WAN disruptions and instabilities, a tunnel is dampened so that it transitions smoothly back into SLA forwarding.

What is Inline Data?

KarthikKumarThatikonda_1-1695677857117.png

Inline data is the actual data packet on the wire, that is, the user or application traffic carried together with the SD-WAN header.

Enhanced Application-Aware Routing

EAAR has three salient features:

 

KarthikKumarThatikonda_2-1695678458474.png

Accurate Metric Measurements:

  • Measuring with inline data gives more accurate and detailed loss, latency, and jitter metrics.

Quick SLA Enforcement:

  • With a reduced poll-interval, traffic can be switched over to a better path or tunnel that meets the SLA in as little as 10 seconds.

SLA Dampening:

  • The system monitors the WAN circuit for flaps and dampens the tunnel before adding it back into SLA forwarding.

What is an Application Probe Class (app-probe-class, APC)?

APC configuration provides queue and DSCP mapping per SLA. For more information, see Application Probe Class. It is recommended to use app-probe-class together with the SLA class on MPLS circuits, so that loss, latency, and jitter are measured accurately per queue and traffic can receive differentiated treatment across the Service Provider (SP) transport networks. If an APC is configured for an SLA class, only the metrics of that specific queue are used for that SLA. If no APC is configured for the SLA class, the system accumulates metrics from all queues and applies them to all SLA classes.
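The queue selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the counter layout, and the choice to "accumulate" by summing packet counters across queues are assumptions, not the product's implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: queue id -> (packets sent, packets lost)
QueueCounters = Dict[int, Tuple[int, int]]


def loss_percent_for_sla(counters: QueueCounters,
                         apc_queue: Optional[int] = None) -> float:
    """Loss percentage used to evaluate one SLA class.

    With an APC, only the mapped queue is considered. Without one,
    counters from all queues are summed (an assumed way to accumulate).
    """
    if apc_queue is not None:
        sent, lost = counters[apc_queue]
    else:
        sent = sum(s for s, _ in counters.values())
        lost = sum(l for _, l in counters.values())
    return 100.0 * lost / sent if sent else 0.0


counters = {0: (1000, 0), 2: (1000, 50)}
print(loss_percent_for_sla(counters, apc_queue=2))  # queue 2 only
print(loss_percent_for_sla(counters))               # all queues combined
```

With these sample counters, the SLA class mapped to queue 2 sees 5% loss, while an SLA class without an APC sees the blended 2.5%, which is why per-queue mapping matters when queues behave differently.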

Inline Measurements

Loss measurement leverages per-queue adaptive-qos metrics, which provide per-queue path loss and let the system differentiate loss on the SD-WAN edge device (local loss) from loss on the SD-WAN network (WAN loss). With inline data, the measured loss is unidirectional. It relies on the IPsec sequence number of each packet sent by the sender (the source SD-WAN edge); the receiver (the remote SD-WAN edge) reports the packets lost on the WAN back to the sender using BFD TLVs. For GRE-based tunnels, metadata with sequence numbers is added to measure the loss.
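As a rough illustration of sequence-number-based loss counting, the receiver can compare how many sequence numbers it expected in a window with how many it actually saw. The function below is a simplified sketch with a hypothetical name; it ignores sequence-number wraparound and says nothing about how the result is encoded in BFD TLVs.

```python
from typing import Iterable


def wan_loss_percent(received_seqs: Iterable[int]) -> float:
    """Estimate one-way loss from the sequence numbers seen in a window.

    Assumes the sender numbers packets consecutively and that the
    counter does not wrap within the window (a simplification).
    Duplicates and reordering do not affect the result.
    """
    unique = set(received_seqs)
    if not unique:
        return 0.0
    expected = max(unique) - min(unique) + 1
    lost = expected - len(unique)
    return 100.0 * lost / expected


# Sequence numbers 4 and 7 never arrived: 2 of 10 packets lost.
print(wan_loss_percent([1, 2, 3, 5, 6, 8, 9, 10]))
```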

Latency and jitter measurement leverages standards-based algorithms, together with a patented method that inserts metadata for per-queue measurements. The latency measured is Round Trip Time (RTT). The jitter measured is unidirectional with inline data, and BFD TLVs are used to report it from the receiver (the remote SD-WAN edge) to the sender (the source SD-WAN edge). Because metadata with timestamps is added, the system samples an inline data packet once every 100 ms, choosing packets that can carry the 12-byte metadata without being fragmented.
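The sampling rule can be pictured as follows. This is a minimal sketch under stated assumptions: packets are represented as hypothetical (timestamp in ms, size in bytes) pairs, `path_mtu` stands in for whatever size limit the device actually applies, and the 100 ms spacing is enforced between selected packets.

```python
from typing import Iterable, List, Tuple

METADATA_BYTES = 12       # metadata size stated in the text
SAMPLE_INTERVAL_MS = 100  # sampling interval stated in the text


def select_samples(packets: Iterable[Tuple[int, int]],
                   path_mtu: int) -> List[Tuple[int, int]]:
    """Pick at most one packet per interval that fits with the metadata.

    `packets` must be in time order. A packet is skipped if adding the
    metadata would push it over `path_mtu` (and so risk fragmentation).
    """
    selected = []
    next_allowed_ms = None
    for ts_ms, size in packets:
        if next_allowed_ms is not None and ts_ms < next_allowed_ms:
            continue  # still inside the current sampling interval
        if size + METADATA_BYTES > path_mtu:
            continue  # too large to carry the metadata unfragmented
        selected.append((ts_ms, size))
        next_allowed_ms = ts_ms + SAMPLE_INTERVAL_MS
    return selected


traffic = [(0, 1400), (50, 200), (100, 1495), (120, 600), (230, 64)]
print(select_samples(traffic, path_mtu=1500))
```

In this sample, the packet at 50 ms is skipped because a sample was just taken, and the 1495-byte packet at 100 ms is skipped because 12 extra bytes would exceed the assumed 1500-byte limit.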

SLA Dampening

With the default 10-second poll-interval, a tunnel can be taken out of SLA forwarding quickly when there are soft failures on the WAN circuit. In the same way, once the tunnel starts to meet the SLA again, it could be added back to SLA forwarding in as little as 10 seconds. On a flapping WAN circuit, the tunnel could therefore switch back and forth between meeting and not meeting the SLA, which causes out-of-order packets and hurts the application experience. To address this, the system dampens the tunnel before adding it back into SLA forwarding: the stability of the WAN circuit is monitored for a configured time, and only if there are no disruptions is the tunnel added back into SLA forwarding.

To understand this better, see the figure below. Suppose the tunnel is out of SLA. At time "t1" the tunnel meets the SLA and the system starts the dampening timer. At time "t2" the WAN circuit flaps and the system stops the dampening timer. Later, at time "t3", the tunnel meets the SLA again and the system restarts the dampening timer. In Aggressive mode the system monitors the stability of the WAN circuit for 20 minutes. If there are no flaps on the WAN circuit during that window, the tunnel moves into SLA forwarding at time "t4". This automatic timer adjustment and dampening capability keeps WAN circuit instabilities from degrading the application experience.

KarthikKumarThatikonda_0-1695683695883.png
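The t1 to t4 sequence can be modelled as a small state machine. The class below is an illustrative sketch, not the device logic: the class and method names are hypothetical, time is passed in as plain seconds, and a single "meets SLA" flag per poll stands in for the real loss, latency, and jitter checks.

```python
from typing import Optional


class SlaDampener:
    """Keeps a tunnel out of SLA forwarding until it has met the SLA
    continuously for `window_s` seconds."""

    def __init__(self, window_s: int) -> None:
        self.window_s = window_s
        self.timer_started_at: Optional[int] = None
        self.in_sla_forwarding = False

    def on_poll(self, now_s: int, meets_sla: bool) -> bool:
        """Feed one poll result; returns True if the tunnel forwards."""
        if not meets_sla:
            # Flap or SLA violation: stop the timer, leave SLA forwarding.
            self.timer_started_at = None
            self.in_sla_forwarding = False
        elif not self.in_sla_forwarding:
            if self.timer_started_at is None:
                self.timer_started_at = now_s      # t1 / t3: start timer
            elif now_s - self.timer_started_at >= self.window_s:
                self.in_sla_forwarding = True      # t4: window survived
                self.timer_started_at = None
        return self.in_sla_forwarding


d = SlaDampener(window_s=20 * 60)  # 20-minute window from the example
for t, ok in [(0, True), (600, False), (700, True), (1800, True), (1900, True)]:
    print(t, d.on_poll(t, ok))
```

Here t1 is 0 s, the flap at t2 is 600 s, t3 is 700 s, and the tunnel only re-enters SLA forwarding at 1900 s, a full 20 minutes after t3.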

EAAR Configuration Modes

The SD-WAN Manager GUI provides three intent-based default options.

Aggressive:

  • For customers who require faster convergence during soft failures on the WAN, anywhere between 10 and 60 seconds.

Moderate:

  • For customers who require convergence times between 60 and 300 seconds.

Conservative:

  • For customers who require slower convergence times, between 5 and 30 minutes.

KarthikKumarThatikonda_0-1695685649154.png

Depending on the application and network requirements, customers can choose one of the three intent-based options above in SD-WAN Manager. If the SLA dampening window times do not meet their needs, the values can be changed using the CLI Add-on template or CLI parcel. However, the best practice is NOT to set the minimum poll interval below the default of 10 seconds.
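As a planning aid, the three modes and the poll-interval guidance can be written down as a small lookup. The convergence ranges come from the mode descriptions above; the function names, the dictionary, and the tie-break at the boundaries (the faster mode wins) are assumptions for illustration and do not correspond to any SD-WAN Manager API.

```python
# Convergence ranges in seconds, as described for each mode.
MODES = {
    "aggressive": (10, 60),
    "moderate": (60, 300),
    "conservative": (300, 1800),
}
MIN_POLL_INTERVAL_S = 10  # recommended floor for the poll interval


def pick_mode(target_convergence_s: int) -> str:
    """Return the first (fastest) mode whose range covers the target."""
    for name, (low, high) in MODES.items():
        if low <= target_convergence_s <= high:
            return name
    raise ValueError("no mode covers %d seconds" % target_convergence_s)


def poll_interval_ok(poll_interval_s: int) -> bool:
    """True if the interval respects the 10-second best practice."""
    return poll_interval_s >= MIN_POLL_INTERVAL_S


print(pick_mode(30), pick_mode(120), pick_mode(900))
print(poll_interval_ok(5))
```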

EAAR Migration

Let's look at the migration scenarios. By default, this capability is disabled. EAAR is a dual-ended feature: both the local and the remote SD-WAN edge must have it enabled. If either SD-WAN edge does not have the feature enabled, Cisco Catalyst SD-WAN falls back to BFD-based measurements of loss, latency, and jitter, so existing deployments keep working.
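The dual-ended rule reduces to a simple check per tunnel. The sketch below uses hypothetical names and string labels purely to make the fallback behavior explicit during a staged rollout.

```python
def measurement_source(local_eaar_enabled: bool,
                       remote_eaar_enabled: bool) -> str:
    """Inline measurements need EAAR on both ends; otherwise BFD is used."""
    if local_eaar_enabled and remote_eaar_enabled:
        return "inline"
    return "bfd"


# During migration, a tunnel to a not-yet-enabled edge keeps using BFD.
print(measurement_source(True, False))
print(measurement_source(True, True))
```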

Resources

For more information, see the Cisco Catalyst SD-WAN Policies Configuration Guide.

For EAAR videos, visit the Cisco Catalyst SD-WAN and Cloud Networking YouTube channel. Please subscribe to the channel for more videos on Cisco Catalyst SD-WAN!

 

 

 

 
