
Sporadic failed 802.1X authentications on Win10 PCs

NetAdminCisco
Level 1

Hello Community,

I took charge of the local Cisco infrastructure network administration two years ago, after finishing my apprenticeship.

Users have reported that their PCs sometimes have network connection problems. Our enterprise runs a NAC-based solution: we use ISE with C9300, 2960S, and 2960X switches, each running a different IOS/IOS XE version.

I took the matter seriously, and together with our ServiceDesk we investigated the supplicant side.

We examined the following:

  1. Drivers for the Ethernet and Realtek adapters: we kept updating the Dell drivers to the newest versions
  2. Supplicant certificates: we have a PKI server, and the certificates installed on the clients are sometimes renewed via gpupdate /force
  3. Other factors, such as BitLocker, Intel vPro, or the docking station

We investigated 11 random clients, each a different model. Updating the network adapter drivers alone did not fix the problem: authentication worked for 4 to 8 weeks, and then the issues reappeared.

The failed 802.1X authentications occur:

  1. Randomly on every switch model
  2. On a different Dell PC model each time

So in the end we could not say that the clients were the problem. We also have clients with older drivers that never had any 802.1X authentication issues, unlike some of those with newer drivers. I then decided to debug the authentication process and the EAPOL packets. We use EAP-TLS (EAP type 13) authentication.

I debugged a successful authentication and a failed one, using the commands:

  • debug radius
  • debug eap packets
  • debug dot1x all

In the debug dot1x all output I saw that when an EAPOL packet was sent to the supplicant, the client did not respond with any dot1x key data. On other occasions, I saw clients with failed dot1x authentication in the authentication session database. I restarted the port, and afterwards only the MAC of the Alcatel VoIP phone was authenticated; the client's MAC was no longer listed, as if it had been cached on the interface during the live authentication session.

Sometimes the user's laptop ended up in the reject group on ISE after 3 failed authentication attempts; after 1 hour it retried and succeeded on the first try. We also had cases of MAC caching through the VoIP phone, but the R100 from Alcatel solved the issue in 90% of those cases.

My last assumption is that the configured timeouts and automatic re-authentication settings on the interface are causing the issue.

This is our NAC configuration:

switchport mode access
switchport voice vlan 546
no logging event link-status
authentication host-mode multi-domain
authentication order dot1x mab
authentication priority dot1x mab
authentication port-control auto
authentication periodic
authentication timer reauthenticate server
no snmp trap link-status
mab
dot1x pae authenticator
dot1x timeout quiet-period 300
dot1x timeout tx-period 3
dot1x timeout ratelimit-period 300
spanning-tree portfast
spanning-tree bpduguard enable
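As a rough check on how the dot1x timers in this configuration interact, the sketch below parses the interface lines and estimates how long the switch waits for an EAPOL response before falling back to MAB, using the commonly cited formula (max-reauth-req + 1) × tx-period. It assumes `dot1x max-reauth-req` is at its default of 2 because the posted configuration does not set it, and it assumes the default values listed in the code; verify them on your IOS/IOS XE release. The helper names are hypothetical.

```python
import re

# Assumed IOS/IOS XE defaults, used when a command is absent from the
# interface config. Verify these against your release.
DEFAULTS = {"tx-period": 30, "max-reauth-req": 2, "quiet-period": 60}

PATTERNS = {
    "tx-period": re.compile(r"^dot1x timeout tx-period (\d+)$"),
    "quiet-period": re.compile(r"^dot1x timeout quiet-period (\d+)$"),
    "max-reauth-req": re.compile(r"^dot1x max-reauth-req (\d+)$"),
}


def parse_dot1x_timers(config: str) -> dict:
    """Hypothetical helper: extract dot1x timer values from interface config lines."""
    timers = dict(DEFAULTS)
    for line in config.splitlines():
        line = line.strip()
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                timers[name] = int(match.group(1))
    return timers


def seconds_until_mab(timers: dict) -> int:
    """Estimated wait before 802.1X times out and MAB is tried:
    (max-reauth-req + 1) * tx-period."""
    return (timers["max-reauth-req"] + 1) * timers["tx-period"]


if __name__ == "__main__":
    # The dot1x lines from the posted interface configuration.
    interface_config = """
    dot1x pae authenticator
    dot1x timeout quiet-period 300
    dot1x timeout tx-period 3
    dot1x timeout ratelimit-period 300
    """
    timers = parse_dot1x_timers(interface_config)
    print(timers)
    print("Seconds until MAB fallback:", seconds_until_mab(timers))
```

With the posted tx-period of 3 and the assumed default of 2 retries, this estimates 9 seconds before MAB is tried, compared with 90 seconds if both values were at their assumed defaults.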

What is your opinion? Where should I dig deeper?

 


Arne Bier
VIP

Sounds like a very frustrating issue, especially when it's sporadic. 

Can you list the error messages shown in ISE when these workstations fail 802.1X? For example, click the details in Live Logs and dig out the failure reason. That is the starting point for finding the cause.

Are you doing dynamic VLAN assignment (i.e. does ISE return a VLAN name/ID on successful authentication)? I don't see a switchport access vlan XX in your IOS config. The default is VLAN 1, and best practice is to avoid VLAN 1, but that should not have any impact on your issue.

If these workstations do work but fail occasionally, then we should be able to rule out the Windows client certificate, at least. Most of the time it's an endpoint issue, not a switch or ISE issue; there are more variables on the Windows PC that can cause problems.

Have you looked at the Windows Event Viewer for clues? The events are hidden under

Applications and Services Logs > Microsoft > Windows > Wired-AutoConfig > Operational
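If you want to pull these events from many workstations without clicking through Event Viewer, a query with the built-in `wevtutil` tool can be scripted. The sketch below is a hypothetical helper; it assumes the Event Viewer path above corresponds to the channel name `Microsoft-Windows-Wired-AutoConfig/Operational` (confirm with `wevtutil el`), and it only runs the query on Windows.

```python
import platform
import subprocess

# Assumption: this channel name matches the Event Viewer path
# "Microsoft > Windows > Wired-AutoConfig > Operational".
CHANNEL = "Microsoft-Windows-Wired-AutoConfig/Operational"


def build_wevtutil_query(count: int = 50) -> list:
    """Build a wevtutil command returning the newest `count` events as text."""
    if count < 1:
        raise ValueError("count must be positive")
    # /c: number of events, /rd:true newest first, /f:text readable output
    return ["wevtutil", "qe", CHANNEL, f"/c:{count}", "/rd:true", "/f:text"]


def read_wired_autoconfig_events(count: int = 50) -> str:
    """Run the query on Windows; on other platforms return an empty string."""
    if platform.system() != "Windows":
        return ""
    result = subprocess.run(
        build_wevtutil_query(count), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(build_wevtutil_query(20)))
    print(read_wired_autoconfig_events(20))
```

Reading the newest events first makes it easier to line up a failure on the workstation with the matching entry in the ISE Live Logs.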

Also, do you know what state these workstations were in when they failed? Were they in sleep mode, then woken up, and then failed? Or did the user restart the workstation and it failed immediately?

 

 

NetAdminCisco
Level 1

Exactly. Our RADIUS server returns the VLAN, and it is dynamically assigned to the interface.

This week I tested the state of a laptop. We are seeing many different, individual issues:

  1. Expired certificate
  2. Missing AD Group for a client
  3. Outdated drivers

But many of the affected laptops don't have any of the common issues listed in my first post. I tested your idea about the client state and found that many users put their laptops in sleep mode. The TX traffic slowly dropped until it fell under 10 bits. But, and now comes the interesting part: while the laptop is in sleep mode, its MAC address remains authenticated via dot1x until the next re-authentication.

I disabled and re-enabled the interface to force a new link-up, and of course communication dropped from Layer 3 down to Layer 1. But ISE still tried to authenticate the MAC address and, voilà, the MAC address ended up in the reject group.

This matters because we have a user who, every morning when he arrives at the office, finds that his laptop cannot get a wired LAN connection. After 1 hour it works again. Our clients are placed in the reject group for 1 hour every time they fail to authenticate. After re-enabling the interface, I saw that even though the link was only up at Layer 1 and there was no active authentication session, ISE still tried to authenticate the MAC address. No wonder the MAC failed authentication and landed in the reject group.

The laptop is a Dell connected to a Dell docking station. When the Ethernet cable is connected directly to the laptop, the issue doesn't occur (it only did when vPro was involved, and vPro has now been deactivated on most clients). When the laptop is connected via USB-C to the docking station, the docking station is connected to the switch interface, and the laptop is in sleep mode, there is no authentication session on the interface, but the MAC is still visible on ISE and fails to authenticate. It's as if the dock keeps communicating with the interface even though the laptop is asleep.
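To reason about the morning scenario, the sketch below models the reject behavior described in this thread: three failed attempts place the MAC in the reject group for one hour, and any attempt inside that window is refused even if the laptop would now authenticate correctly. This is a simplified, hypothetical model, not how ISE implements rejected endpoints; the threshold, the duration, and the overnight timings in the usage example are assumptions taken from or invented to match the values reported above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values reported in this thread; assumptions, not ISE defaults.
FAILURE_THRESHOLD = 3
REJECT_SECONDS = 3600


@dataclass
class EndpointState:
    """Hypothetical tracker for one MAC address."""
    failures: int = 0
    rejected_until: Optional[float] = None
    history: List[str] = field(default_factory=list)

    def is_rejected(self, now: float) -> bool:
        return self.rejected_until is not None and now < self.rejected_until

    def attempt(self, now: float, success: bool) -> str:
        """Record an authentication attempt at time `now` (in seconds)."""
        if self.is_rejected(now):
            outcome = "rejected"  # refused regardless of the credentials
        elif success:
            self.failures = 0
            self.rejected_until = None
            outcome = "authorized"
        else:
            self.failures += 1
            outcome = "failed"
            if self.failures >= FAILURE_THRESHOLD:
                self.rejected_until = now + REJECT_SECONDS
                self.failures = 0
                outcome = "failed, now rejected"
        self.history.append(f"t={now:>6.0f}s {outcome}")
        return outcome


if __name__ == "__main__":
    mac = EndpointState()
    # Hypothetical overnight timings: the dock keeps the link up while the
    # laptop sleeps, so three attempts fail.
    for t in (0, 300, 600):
        mac.attempt(t, success=False)
    # The user arrives 30 minutes later; the laptop answers correctly,
    # but the MAC is still inside the reject window.
    mac.attempt(2400, success=True)
    # Once the hour has elapsed, the same attempt succeeds.
    mac.attempt(4300, success=True)
    print("\n".join(mac.history))
```

Under these assumptions, the user sees a dead port until one hour after the last overnight failure, which matches the "works again after 1h" pattern.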

Arne Bier
VIP

I have been trying to get my head around the various power-saving modes of Windows 10/11, and it's quite a complex thing these days. The system sleep states are described as S0 through S5, where S0 means the system is awake and running, S4 is hibernate, and S5 means it's powered off (soft off). There is a type of "sleep" in Windows these days that is technically S0 (Modern Standby, also called S0 low-power idle), which means the PC is still able to communicate on the network: it listens for its MAC address and then sends data, but to the user the laptop looks like it's asleep. I don't know how to enable or disable this feature. I have been exploring the powercfg command in Windows, and I would recommend running the Sleep Study; it's amazingly detailed, an HTML report of what the PC has been up to. You must run it as an administrator:

powercfg /sleepstudy

The laptop most likely has to keep the USB-C interfaces active even while in sleep mode. You can check this with the command:

powercfg -devicequery wake_armed

If you want to see what "thing" last caused the PC to wake from its sleep, use this command:

powercfg -lastwake
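For the ServiceDesk, the three commands above can be run in one pass so their output is kept together with the Sleep Study report. The sketch below is a hypothetical helper; it only calls `powercfg` on Windows, it assumes it is started from an elevated (administrator) prompt as the Sleep Study requires, and it uses the `/output` option of `powercfg /sleepstudy` to choose where the HTML report is written.

```python
import platform
import subprocess


def build_powercfg_commands(report_path: str) -> list:
    """The three powercfg commands from this thread, as argument lists."""
    return [
        ["powercfg", "/sleepstudy", "/output", report_path],  # HTML report
        ["powercfg", "-devicequery", "wake_armed"],  # devices allowed to wake the PC
        ["powercfg", "-lastwake"],  # what last woke the PC
    ]


def collect(report_path: str = "sleepstudy-report.html") -> dict:
    """Run each command on Windows and return its text output, keyed by command."""
    results = {}
    if platform.system() != "Windows":
        return results
    for cmd in build_powercfg_commands(report_path):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr when stdout is empty, e.g. when not run as administrator.
        results[" ".join(cmd)] = proc.stdout or proc.stderr
    return results


if __name__ == "__main__":
    for command, output in collect().items():
        print(f"== {command}\n{output}")
```

Running it on an affected laptop right after a failed morning logon would show whether the docking station's network adapter is wake-armed and what woke the PC.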