3750X 3 stack having issues with dropped packets

wsalomon75
Level 1

Hello all,

I have a stack of three Catalyst 3750Xs that is having issues communicating with a Catalyst 9300. The site is complaining of slowness, and after running diagnostics (pings from the 9300 to the core drop 0 packets, but pings from the 3750X to the 9300 drop a lot), I have determined the issue has to be on the link between the 3750X stack and the 9300. I have done the following:

1. Swapped out SFP and cable on both switches

2. Swapped the port on both switches

3. Changed the global MTU on the 3750X to 1510 and restarted the stack.

Here's the output for a ping to the 9300:

X#ping 10.20.22.250 re 2000 si 1504
Type escape sequence to abort.
Sending 2000, 1504-byte ICMP Echos to 10.20.22.250, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!.!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!.
Success rate is 98 percent (1975/2000), round-trip min/avg/max = 1/6/67 ms

I used a 1504-byte packet size because, as I understand it, encapsulation adds an extra 4 bytes to the frame, making it 1504. I don't know how else to troubleshoot this, except that the stack is pretty full and maybe there is simply too much traffic going through a 1G uplink.

Any thoughts?


"Can you explain what I'm looking at in this output . . ."

Sure, it tells us two major things.

First it shows what the ToS/DSCP (packets) and CoS (frame) marking counts are for both port ingress and egress.

Second it shows how many packets/frames were enqueued or dropped within each of the 4 egress queues broken down by groups of packets/frames targeted for a certain drop level treatment in each queue.

The first thing to understand with 3750s is that, again with QoS enabled, they support 4 egress queues per port, but sometimes these are referenced as queues 0 to 3 and sometimes as queues 1 to 4 (so you need to know which numbering system is being used, to ensure you're dealing with the same actual queue).

Second, classification into egress queue drop groups can be confusing at first (and possibly for a bit longer).

For example, starting with a ToS/DSCP value of zero/BE, we can "map" such marked packets into our choice of one of the four egress queues (again, numbered either 0..3 or 1..4). Also remember there are both device defaults and AutoQoS defaults, but you can override either.

Also, within each egress queue, you "map" such marked packets into a particular drop group, distinguished by threshold1, threshold2, or threshold3.  Incidentally, as I recall, thresholds 1 and 2 cannot be larger than threshold3, but I don't recall whether threshold1 must also be <= threshold2.  Usually device defaults, AutoQoS defaults, and Cisco recommendations have threshold1 less than threshold2, which in turn is less than threshold3.

So, when a packet marked with ToS/DSCP zero/BE egresses a port, it's counted in the DSCP outgoing bucket for DSCP zero.

If there was port congestion, and the foregoing packet had to be QoS enqueued, it's also counted in the enqueued stats, under the queue and corresponding threshold it was "mapped" to.

If the mapped queue/threshold was full, then the packet/frame is dropped, and that too is counted under the corresponding "mapped" queue/threshold.

The stat, often, of most interest is for drops, and what queue/threshold, in particular, is dropping packets.  (BTW, unless only one specific DSCP or CoS is mapped to a specific queue/threshold, we cannot determine which specifically marked packets were dropped.)

However, enqueued stats do reveal congestion, but they don't directly indicate how "bad" the congestion was.  (Like does the queue usually contain only a couple of packets/frames or hundreds of packets/frames.)
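To make the counting above a bit more concrete, here is a toy model in Python. It is only a sketch: the DSCP-to-queue/threshold map and the fill limits are made-up values (real ones come from device defaults, AutoQoS, or your own config), and it assumes only packets that are accepted get counted as outgoing.

```python
from collections import Counter

# Hypothetical DSCP -> (queue, threshold) map, queues numbered 1..4.
DSCP_MAP = {0: (2, 1), 46: (1, 3)}

# Hypothetical fill limit, in packets, for each (queue, threshold).
LIMITS = {(2, 1): 2, (1, 3): 4}

dscp_out = Counter()   # outgoing marking counts, keyed by DSCP
enqueued = Counter()   # packets accepted, keyed by (queue, threshold)
dropped = Counter()    # packets dropped, keyed by (queue, threshold)
depth = Counter()      # current occupancy, keyed by queue


def egress(dscp):
    """Offer one packet to a congested port; return True if it was enqueued."""
    key = DSCP_MAP[dscp]
    queue = key[0]
    if depth[queue] >= LIMITS[key]:
        # The queue is already at this packet's threshold: count a drop.
        dropped[key] += 1
        return False
    depth[queue] += 1
    enqueued[key] += 1
    dscp_out[dscp] += 1  # assumption: only accepted packets count as outgoing
    return True


# Three best-effort packets and one EF packet arrive while nothing drains.
for _ in range(3):
    egress(0)
egress(46)
print("enqueued:", dict(enqueued))
print("dropped: ", dict(dropped))
```

Note the drop counter only says that something mapped to queue 2, threshold 1 was dropped. If several DSCP values shared that pair, you could not tell which marking lost the packet.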

Also BTW, an interface dropping ping requests/replies (remember, the problem can be in either or both directions) will show up as unsuccessful pings, but an "extensively" delayed ping request/reply, i.e. a time-out, also shows, I believe, as an unsuccessful ping.

I think it unlikely a packet/frame would be enqueued long enough to trip a "drop" but then again, in your OP we see:

"Success rate is 98 percent (1975/2000), round-trip min/avg/max = 1/6/67 ms"

Your overall average is 6 ms, but your minimum time was 1 ms.  Further, your maximum (not yet considered a time-out) was 67 ms!
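If you want to compare several ping runs, a small script can pull the numbers out of that summary line. This is just a sketch in Python; the regular expression assumes the exact summary format quoted above.

```python
import re

SUMMARY = ("Success rate is 98 percent (1975/2000), "
           "round-trip min/avg/max = 1/6/67 ms")

PATTERN = re.compile(
    r"\((\d+)/(\d+)\), round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms"
)


def parse_ping_summary(line):
    """Extract sent/lost counts and RTT figures from a ping summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised ping summary line")
    received, sent, rtt_min, rtt_avg, rtt_max = map(int, match.groups())
    lost = sent - received
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent,
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
    }


stats = parse_ping_summary(SUMMARY)
print(stats)
```

For the run in the OP this gives 25 lost pings out of 2000, i.e. a 1.25% loss.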

A couple of things you might try, to see if there's a major difference in your ping stats:

Try pinging with a max size packet (i.e. 64k, or the largest size the device's ping will accept).  This will often increase the chances of a "lost" or delayed ping.

And/or ping using a different DSCP value, for example, using minimal sized packets marked with DSCP EF (decimal value 46, or ToS decimal value 184).
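In case the relationship between those two numbers isn't obvious: DSCP occupies the upper six bits of the ToS byte, so the ToS value is the DSCP value shifted left by two bits. A quick Python sketch (the function names are just for illustration):

```python
def dscp_to_tos(dscp):
    """Return the ToS byte value that carries the given DSCP (0..63)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2  # the two low-order bits are left at zero


def tos_to_dscp(tos):
    """Return the DSCP carried in a ToS byte value (0..255)."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS must be in the range 0..255")
    return tos >> 2


print(dscp_to_tos(46))   # EF
print(tos_to_dscp(184))
```

Which of the two values you enter depends on whether the ping you're using asks for a DSCP or for a ToS byte.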

Again, it's unlikely a queued packet would be queued long enough, without being dropped, to time out, but if basic Ethernet flow control is active somewhere along the path, that might push the delay boundary.  It can also cause sporadic "slowness".  Generally, unless you're using a DCB kind of Ethernet flow control, it should be disabled.

Show interface x/x

Please share output 

GigabitEthernet3/1/2 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is d48c.b597.6ab2 (bia d48c.b597.6ab2)
Description: UPLINK > 9300
MTU 1510 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not set
Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseLX SFP
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:43, output 00:00:38, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 40
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 889000 bits/sec, 250 packets/sec
5 minute output rate 248000 bits/sec, 50 packets/sec
40795750 packets input, 28781132061 bytes, 0 no buffer
Received 15455837 broadcasts (9846156 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 9846156 multicast, 0 pause input
0 input packets with dribble condition detected
12516448 packets output, 3309542480 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

Output drops total only 40, which is very few, so this is not the issue here.

Do

Show interface | include output drop

Let's see whether other ports have drops, not just the port the server connects to.
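If the stack has many ports, it may be easier to save the full "show interface" output to a file and pick out the drop counters with a script. Here is a rough Python sketch; it assumes each interface block starts with a "... line protocol is ..." line and contains a "Total output drops:" field, as in the output posted above.

```python
import re

SAMPLE = """\
GigabitEthernet3/1/2 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is d48c.b597.6ab2 (bia d48c.b597.6ab2)
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 40
"""


def output_drops(show_interface_text):
    """Map each interface name to its 'Total output drops' counter."""
    drops = {}
    current = None
    for line in show_interface_text.splitlines():
        header = re.match(r"\s*(\S+) is .*line protocol is", line)
        if header:
            current = header.group(1)
            continue
        counter = re.search(r"Total output drops: (\d+)", line)
        if counter and current is not None:
            drops[current] = int(counter.group(1))
    return drops


print(output_drops(SAMPLE))
```

Sorting the result by drop count would show quickly whether any port stands out.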

"Why I think like this' see the input rate it almost 1000 mbps"

Where did you see that?  What I saw was:

reliability 255/255, txload 1/255, rxload 1/255
.
.
.
5 minute input rate 889000 bits/sec, 250 packets/sec
5 minute output rate 248000 bits/sec, 50 packets/sec

I.e. barely a 1 Mbps ingress rate.
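The arithmetic, as a small Python sketch using the BW and 5 minute rate figures from the posted output:

```python
BW_KBIT = 1_000_000   # "BW 1000000 Kbit/sec"
INPUT_BPS = 889_000   # "5 minute input rate 889000 bits/sec"
OUTPUT_BPS = 248_000  # "5 minute output rate 248000 bits/sec"


def utilization(rate_bps, bw_kbit):
    """Return the fraction of link bandwidth a given bit rate uses."""
    return rate_bps / (bw_kbit * 1000)


rx_mbps = INPUT_BPS / 1_000_000
rx_util = utilization(INPUT_BPS, BW_KBIT)
tx_util = utilization(OUTPUT_BPS, BW_KBIT)

print(f"input:  {rx_mbps:.3f} Mbps, {rx_util:.4%} of the link")
print(f"output: {OUTPUT_BPS / 1_000_000:.3f} Mbps, {tx_util:.4%} of the link")
```

That works out to under 0.1% of a 1G link on ingress, which fits the txload/rxload of 1/255 shown above.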

wsalomon75
Level 1

So the issue with the phones ended up being in the ACL on our new core. Just wanted to pop in and reply to that in the event somebody else googles this.

Thank you for that update!

Goes a long way explaining why the 3750 stack's stats didn't show an obvious problem there.
