Re: SDA Subnetting Best Practices for VoIP Networks

Nuno Melo · ‎10-14-2023

According to the SDA design guide:

Reduce subnets and simplify DHCP management—In the overlay, IP subnets can be stretched across the fabric without flooding issues that can happen on large Layer 2 networks. Use fewer subnets and DHCP scopes for simpler IP addressing and DHCP scope management. Subnets are sized according to the services that they support, versus being constrained by the location of a gateway. Enabling the optional broadcast flooding (Layer 2 flooding) feature can limit the subnet size based on the additional bandwidth and endpoint processing requirements for the traffic mix within a specific deployment.

From this I would conclude that, when sizing a subnet, the old practice of creating small subnets to avoid big broadcast domains (which would increase ARP broadcast traffic and potentially slow down ARP resolution) is no longer a concern, because of how SDA networks handle ARP traffic. The only exception I can find is L2 flooding.

On the other hand, the collaboration documentation states that subnets should be segregated as much as possible due to the ARP cache table limitations of devices/servers. So, e.g., defining a /16 subnet for all voice devices would be a bad idea in a classical LAN design because of the potential host ARP cache size, and I don't see this addressed in the SDA documentation. Taking into account that voice traffic is sensitive to delay, so slow ARP resolution would impact it negatively, from my perspective a big /16 subnet for all voice traffic is not a good idea, although it might not be a problem for data networks.

Any thoughts on this?

andy!doesnt!like!uucp · ‎10-15-2023

Hi Nuno
As I understand your query, you are looking for the best practice in SDA between big and small subnets.
When looking for the answer you want to take into account ARP flooding across the SDA site, which is basically defined by the combination of the L2-flooding configuration for the subnet in the SDA site and the multicast configuration in the underlay. By default, ARP flooding in an arbitrary VLAN is limited to the edge node receiving the ARP request for the dst MAC. Otherwise you will flood the ARP broadcast via multicast in the underlay (meaning you simply replace L2-switched flooding in a single broadcast domain with multicasting it across the SDA site devices joined to the group you defined for BUM :0), and you will likely do the latter in L2VN use cases only. Here it's important to note that an edge node will join the BUM MC group and receive the ARP broadcasts multicast in the underlay even if it doesn't have endpoints participating in the L2VN. This should give you the idea that ARP flooding will not contribute to your decision about how granular your IP subnetting will be in your SDA site when it's about L3VN.

Nuno Melo · ‎10-17-2023

So if I understand correctly, L2 flooding will not impact an L3VN, because broadcast ARP traffic is either unicast through the L3VN if L2 flooding is not enabled or, if L2 flooding is enabled, multicast on the edge nodes.

So could you shed some light on what the determining factors are in SDA regarding IP subnet sizes? Is ARP cache size on the host not a factor?

andy!doesnt!like!uucp · ‎10-17-2023

So if I understand correctly, L2 flooding will not impact an L3VN
> By default, for L3VNs all broadcasts are local to the switch of the broadcast originator. You would want to stay with the default settings for L3VNs.
because broadcast ARP traffic is either unicast through the L3VN if L2 flooding is not enabled
> ARP won't be sent outside the ITR at all. I didn't lab it yet, but I assume the ITR will proxy-ARP and then VXLAN-encapsulate the subsequent unicast packets either to the RLOC returned by the CP node or to the PETR.
or, if L2 flooding is enabled, multicast on the edge nodes
> That's what you would limit to pure Ethernet services (L2VNs; not a good term, as it doesn't even have a VRF association in the fabric site).
> And yeah, it gets multicast in the underlay with VXLAN encapsulation.

So could you shed some light on what the determining factors are in SDA regarding IP subnet sizes? Is ARP cache size on the host not a factor?
> On the edge nodes you won't have ARP caches of a size comparable to those you usually see on legacy L3-termination switches, because ARP is limited to the edge node. But if the communicating party of an EID in the same subnet is located on a different edge node, you will have an entry in the LISP map-cache for the particular instance ID. It's kind of a trade-off... but in general I don't see any reason for the ARP cache size in SDA to influence my decision about the IP subnetting approach.

Joseph W. Doherty · ‎10-17-2023

"So i.E defining a /16 subnet for all voice devices would be a bad idea due to the potential host cache size of devices in a classical lan design . . ."

BTW, it's unlikely that would be a host issue, for a few reasons. First, caches, by design, are usually fixed sized, i.e. they don't usually dynamically grow. Second, unless it records all gratuitous ARPs, a device is unlikely to need to know all the MACs in a /16 subnet. Third, caching performance usually isn't much affected by the size of the cache, except when limited by special hardware.

From reading your SDA guide attachment, it appears that the potential issue is the classical (large broadcast domain) one of hosts having to deal with a huge number of broadcasts that are of no interest to them (when you need, for some reason, to allow "classical" broadcast flooding on an SDA subnet).

Nuno Melo · ‎10-20-2023

Regarding your statement:

First, caches, by design, are usually fixed sized, i.e. they don't usually dynamically grow

Agreed. However, in a classical network a bigger subnet means that, for VoIP, where peer-to-peer traffic is involved, ARP resolution potentially happens more frequently than in smaller subnets, since in big subnets ARP traverses the network end to end. Assuming the ARP table is fixed in size, in normal operation the device needs to be aware of all the other devices in its own broadcast domain, so its cache potentially fills up more quickly than in a small subnet, where anything outside its own network is forwarded towards the default gateway. At least that would be my assumption, and also my understanding of the Collaboration Reference Guide:

https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/srnd/collab12/collab12/netstruc.html?bookSearch=true

So under this assumption, and again it is just an assumption, it would not make sense to define a /16 for voice traffic, even if we consider an SDA fabric, as per the Collaboration Reference Guide.

Joseph W. Doherty · ‎10-20-2023

Unclear why a VoIP host needs to be aware of every other host (i.e. within its ARP table, concurrently).

Could you provide a specific reference, in your referenced document, explaining this need, especially with regard to SDA?

andy!doesnt!like!uucp · ‎10-20-2023

It sounds like you mainly care about the ARP cache on the IP phones, right?
What is the need for an IP phone to have more than several ARP entries for correspondents adjacent to the same switch (assuming the IP phone is calling widely around the organization), plus an ARP entry for the default GW, which operates as ARP proxy or as default GW in all other cases?

Nuno Melo · ‎10-20-2023

From the link:

The recommendation to limit the number of devices in a single Unified Communications VLAN to approximately 512 is not solely due to the need to control the amount of VLAN broadcast traffic. Installing Unified CM in a VLAN with an IP subnet containing more than 1024 devices can cause the Unified CM server ARP cache to fill up quickly, which can seriously affect communications between the Unified CM server and other Unified Communications endpoints.
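
For scale, here is a quick sketch in Python of how common prefix lengths compare with the two figures in the excerpt above (about 512 devices recommended, more than 1024 flagged as an ARP cache risk for the Unified CM server). The 10.0.0.0 base address and the function names are placeholders of mine, and the count is the subnet's address capacity, which is only an upper bound on the devices actually deployed.

```python
import ipaddress

# Figures quoted in the Collaboration guide excerpt above.
RECOMMENDED_MAX_DEVICES = 512
ARP_CACHE_RISK_DEVICES = 1024


def usable_hosts(prefix_len: int) -> int:
    """Usable IPv4 host addresses in a subnet (network and broadcast excluded)."""
    # 10.0.0.0 is only a placeholder base address for the calculation.
    network = ipaddress.ip_network(f"10.0.0.0/{prefix_len}")
    return network.num_addresses - 2


def classify(prefix_len: int) -> str:
    """Compare a subnet's address capacity with the two quoted figures."""
    hosts = usable_hosts(prefix_len)
    if hosts <= RECOMMENDED_MAX_DEVICES:
        return "within the ~512 recommendation"
    if hosts <= ARP_CACHE_RISK_DEVICES:
        return "above ~512, not above 1024"
    return "can exceed 1024 devices"


for prefix_len in (16, 20, 22, 23, 24):
    print(f"/{prefix_len}: {usable_hosts(prefix_len)} usable hosts, {classify(prefix_len)}")
```

A /23 is the largest prefix that stays under the 512 figure (510 usable addresses), while a /16 has room for 65534, which is why the /16 example looks alarming when read against this guidance.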

Unclear why a VoIP host needs to be aware of every other host (i.e. within its ARP table, concurrently).

ARP is sent end to end during a voice conversation, which basically has two stages: a signaling stage, where a TCP session runs between the device and the central call-control server, and a media stage, which happens end to end between devices. Since voice media traffic is peer-to-peer, once one device needs to send media traffic to another device in the same subnet, an ARP request is generated to resolve the address. Since all devices are in the same subnet, these ARPs become more frequent the more calls happen; the larger the subnet, the more quickly the ARP cache table fills up, and once it is full more ARPs are generated.

It sounds like you mainly care about the ARP cache on the IP phones, right?

Yes.

What is the need for an IP phone to have more than several ARP entries for correspondents adjacent to the same switch

Assuming a larger subnet, the ARP cache would be larger, since the reason for such a subnet is to host more devices in the same network. Because the traffic is peer-to-peer, these hosts therefore need an ARP entry for each other device they communicate with, generating more ARP cache entries and therefore filling the table more quickly.

Joseph W. Doherty · ‎10-20-2023

@Nuno Melo wrote:

From the link:

The recommendation to limit the number of devices in a single Unified Communications VLAN to approximately 512 is not solely due to the need to control the amount of VLAN broadcast traffic. Installing Unified CM in a VLAN with an IP subnet containing more than 1024 devices can cause the Unified CM server ARP cache to fill up quickly, which can seriously affect communications between the Unified CM server and other Unified Communications endpoints.

Thank you.

No doubt it's true an ARP cache can fill up. But even if that's true on SDN, they are noting a problem for a specific host, the Unified CM server (it makes sense that such a server would carry more (much, much more) ARP entries than a typical client endpoint).

In a "classical" network, we do want to minimize broadcasts, like ARP. But in an SDN environment, since they suggest even using /16s, SDN must somehow mitigate the impact of all the additional ARP broadcasts that would be expected on such a large subnet. So it's possible that a host like a Unified CM server generating more ARP requests won't impact the SDN network as a whole. It could, though, increase the workload of the CM server itself, but possibly not to any major degree (remember, it's still caching recent entries).

That said, even if using SDN, using large subnets like /16s isn't required, just possible. Choice is yours.

BTW, I'm not really very familiar with Cisco's SDN, but when setting up WLANs, you can also size those larger than what you would size a wired VLAN for, because they too operate a bit differently. (We often used /20s for our WLANs, without issue.)

andy!doesnt!like!uucp · ‎10-20-2023

Whatever you plan your subnets in the SDA site to be, the ARP cache of an IP phone attached to an edge node won't ever have more entries than the number of ports on the adjacent switch + the default GW.

P.S. Unless you have 2+ devices in the same subnet behind each access port of the adjacent switch, and each of them is either calling or being called by the IP phone in question.

willwetherman · ‎10-21-2023

I have customers running IP address pools with large quantities of Cisco IP phones without any issues.

I know that Cisco provided some recommendations about subnet sizing for IP phones in a traditional network; however, this was primarily to prevent issues due to the CUCM server ARP cache limit of 1024 entries, and also as a general recommendation to limit the size of the broadcast/fault domain. If the CUCM server is not in the same subnet as your IP phones, and if you don't need L2 Flooding, then these recommendations are irrelevant in my opinion.

Also, by default, when an IP address pool is provisioned in the fabric, an L2VNI is created for the associated VLAN, which is used for communication between hosts within the same subnet. You will see something like the following on all fabric edge nodes for a given IP address pool/VLAN.

instance-id 8197
 remote-rloc-probe on-route-change
 service ethernet
  eid-table vlan 1021

Hosts directly resolve their own ARP bindings using L2VNI forwarding logic (the fabric edge nodes convert ARP broadcasts to directed unicasts between fabric edge nodes, known as ARP suppression). The presence of the L2VNI also makes it possible to selectively enable L2 Flooding for the IP address pool. This disables ARP suppression, so ARP broadcasts (as well as other broadcast and link-local multicast traffic) are flooded directly between fabric edge nodes using underlay multicast.
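
As a rough mental model of the two modes described above, here is a toy Python sketch of which remote edge nodes end up receiving a single ARP request. It is not how LISP is implemented: the function, the host table, and the edge names are all hypothetical, and dropping requests for unknown targets when suppression is active is a simplifying assumption of the sketch.

```python
def arp_request_receivers(target_ip, source_edge, host_table, all_edges, l2_flooding):
    """Return the set of remote edge nodes that receive one ARP request.

    host_table maps host IP -> edge node where that host is attached
    (a stand-in for what the fabric control plane knows).
    """
    if l2_flooding:
        # Flooding mode: every other edge node gets the broadcast
        # via underlay multicast.
        return set(all_edges) - {source_edge}

    # ARP suppression: the broadcast becomes a directed unicast
    # towards the one edge node that hosts the target.
    target_edge = host_table.get(target_ip)
    if target_edge is None or target_edge == source_edge:
        # Assumption of this sketch: nothing leaves the source edge node.
        return set()
    return {target_edge}


edges = {"edge1", "edge2", "edge3"}
hosts = {"10.0.0.10": "edge1", "10.0.0.20": "edge3"}

print(arp_request_receivers("10.0.0.20", "edge1", hosts, edges, l2_flooding=False))
print(sorted(arp_request_receivers("10.0.0.20", "edge1", hosts, edges, l2_flooding=True)))
```

With suppression the request reaches only the edge node hosting the target, however many edge nodes share the subnet; with L2 Flooding enabled the number of receivers grows with the number of edge nodes.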

Based on the default forwarding behaviour, IP phones will resolve ARP for other IP phones that they communicate with within the same subnet (using the destination IP phone's MAC address). So you are right, the ARP cache on the IP phones will grow with the number of phones that they establish a peer-to-peer call with; however, this does not seem to be an issue in practise, as A) an IP phone will only ARP for an IP phone that it needs to communicate with (not all IP phones in the subnet) and B) the ARP cache on the IP phones themselves tends to be fairly short, so it will flush entries more frequently. The only exception might be if you have VoIP gateways on the same subnet as the IP phones, as these will certainly have a larger quantity of ARP entries; however, ideally these would be placed outside of the fabric, with the CUCM for example.
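
To illustrate points A and B, here is a minimal Python sketch of a fixed-size ARP cache with entry ageing. It is a toy model, not any phone vendor's implementation: the class name, the 64-entry limit, and the 300-second timeout are assumptions picked for the example.

```python
from collections import OrderedDict


class ArpCache:
    """Toy fixed-size ARP cache with ageing and least-recently-used eviction."""

    def __init__(self, max_entries: int, timeout_s: float):
        self.max_entries = max_entries
        self.timeout_s = timeout_s
        self.entries = OrderedDict()  # peer IP -> time the entry was learned
        self.requests_sent = 0

    def resolve(self, ip: str, now: float) -> None:
        # Age out entries older than the timeout.
        expired = [k for k, t in self.entries.items() if now - t >= self.timeout_s]
        for key in expired:
            del self.entries[key]

        if ip in self.entries:
            self.entries.move_to_end(ip)  # cache hit: no ARP request needed
            return

        self.requests_sent += 1  # cache miss: one ARP request goes out
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used entry
        self.entries[ip] = now


# A phone in a /16 that calls three distinct peers, one call per minute.
phone = ArpCache(max_entries=64, timeout_s=300)
for minute, peer in enumerate(["10.0.1.10", "10.0.2.20", "10.0.1.10", "10.0.200.5"]):
    phone.resolve(peer, now=minute * 60)

print(len(phone.entries), phone.requests_sent)
```

The cache size follows the number of peers the phone actually talked to within the timeout, not the number of addresses in the subnet. A shorter timeout keeps the table smaller at the cost of re-resolving peers more often.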

It's also worth noting that DNA Center 2.3.5.3 introduced a new feature called Intrasubnet Routing that further optimises forwarding in the fabric. Intrasubnet Routing doesn't create a LISP L2VNI for the IP address pool, so host-to-host communication within the same subnet uses Layer 3 forwarding behaviour. From what I understand, this is similar to the 'Enhanced Forwarding' behaviour that was available in earlier releases of SD-Access. Local proxy ARP is enabled under the Anycast SVI of the IP address pool, so traffic between two hosts within the same subnet is routed at Layer 3 using the L3VNI. When I tested this, the fabric edge node replied to all ARP requests with the MAC address of the anycast gateway, not with the MAC address of the destination device. This didn't reduce the number of ARP entries on the hosts themselves though (they just all resolve to the same MAC address).

Nuno Melo · ‎10-22-2023

So you are right, the ARP cache on the IP phones will increase with the more phones that they establish a peer-to-peer call with, however, this does not seem to be an issue in practise as

A) an IP phone will only ARP for an IP phone that it needs to communicate with (not all IP phones in the subnet)

This is a bit of an understatement, since if we assume all phones are in the same subnet this will happen quite often. But yes, it will not be a broadcast. It still creates an increased amount of ARP traffic that would otherwise be lower had the subnet been designed smaller.

B) the ARP cache on the IP phones themselves tends to be fairly short, so it will flush entries more frequently

Hence generating more ARP traffic, I believe.

Focusing just on the devices, and assuming the CUCM/GWs are outside the fabric, there is still the matter of ARP convergence time. I haven't seen any document regarding ARP convergence times in an SDA fabric; however, in a real-life scenario I sometimes see a convergence of approx. 1 s, which is too high.

It's also worth noting that DNA Center 2.3.5.3 introduced a new feature called Intrasubnet Routing that further optimises forwarding in the fabric. Intrasubnet Routing doesn't create a LISP L2VNI for the IP address pool, so host-to-host communication within the same subnet uses Layer 3 forwarding behaviour. From what I understand, this is similar to the 'Enhanced Forwarding' behaviour that was available in earlier releases of SD-Access. Local proxy ARP is enabled under the Anycast SVI of the IP address pool, so traffic between two hosts within the same subnet is routed at Layer 3 using the L3VNI. When I tested this, the fabric edge node replied to all ARP requests with the MAC address of the anycast gateway, not with the MAC address of the destination device. This didn't reduce the number of ARP entries on the hosts themselves though (they just all resolve to the same MAC address).

This sounds like a good alternative.

andy!doesnt!like!uucp · ‎10-22-2023

"IP phones themselves tend to be failry short so will flush entries more frequently. The only exception might be if you have VoIP gateways on the same subnet as the IP phones, as these will certianly have a larger quanity of ARP entries, however, ideally these would be placed outside of the fabric with the CUCM for example."
No idea about CUCM, but I'm a bit sceptical that placing it in the same subnet as the IP phones is proper design; rather, you would put it in a protected subnet somewhere in INFRA-VN. And the VoIP GWs would for sure go in a separate subnet (again somewhere in GRT/INFRA_VN) with a maximum of several gateways and the default GW as entry/exit point.