
Introduction

This document briefly introduces the meaning of the Queue Link Error, how to deal with it, and its impact on an ISE Deployment.

 

What is the Queue Link / Queue Link Error ?

Since ISE 2.6, the ISE RabbitMQ Container has been called ISE Messaging Service (a Message Broker Container that runs on Docker). The ISE Messaging Service is started on each ISE Node and is used to exchange information between Nodes (via TLS, using a Certificate issued by ISE's Internal CA). The Queue Link is the connection between the ISE Messaging Services of these Nodes, and a Queue Link Error means that this connection failed or could not be established !!!

This Alarm is expected while you are performing Deployment operations such as:

  • registering a Node to the Deployment
  • manually syncing a Node from the PPAN
  • a Node being in out-of-sync state
  • restarting the Application Service of a Node
  • changing the Domain Name or Hostname of your PAN/PSN
  • restoring a Backup on a New Deployment
  • promoting the Old PPAN to New PPAN after an upgrade
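When an alarm shows up, the first question is whether it lines up with one of the operations listed above. The following minimal Python sketch correlates an alarm timestamp with a list of operations that you record yourself (for example, from your change log or the Operations Audit report). The 30-minute window, the function name and the sample entries are assumptions for illustration, not ISE defaults.

```python
from datetime import datetime, timedelta

# Assumption: an alarm is "expected" if it fires within this window after a
# known Deployment operation started. 30 minutes is an arbitrary example value.
EXPECTED_WINDOW = timedelta(minutes=30)


def matching_operation(alarm_time, operations, window=EXPECTED_WINDOW):
    """Return the description of the first operation (by start time) whose
    window [start, start + window] contains alarm_time, or None."""
    for start, description in sorted(operations):
        if start <= alarm_time <= start + window:
            return description
    return None


# Hypothetical operations, recorded by hand from a change log.
operations = [
    (datetime(2022, 6, 5, 2, 40), "Manual sync of a PSN from the PPAN"),
    (datetime(2022, 6, 5, 9, 0), "Hostname change on a PSN"),
]

print(matching_operation(datetime(2022, 6, 5, 2, 55), operations))  # Manual sync of a PSN from the PPAN
print(matching_operation(datetime(2022, 6, 5, 5, 0), operations))   # None
```

An alarm with no matching operation is not necessarily a real problem. It only means the "expected" explanation does not apply, and the Causes below are worth checking.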

You can check the state of the ISE Messaging Service via the ISE CLI with the following command:

ise/admin# show application status ise

ISE PROCESS NAME STATE PROCESS ID
-----------------------------------------------------
Database Listener running 15215
Database Server running 131 PROCESSES
Application Server running 27711
...
Docker Daemon running 16843
TC-NAC Service disabled
...
ISE Messaging Service running 43944
Segmentation Policy Service disabled
SSE Connector disabled
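If you need to check many Nodes, parsing this output can be easier than reading it. The following minimal Python sketch assumes that the output is saved as text and that each row ends with a state word, optionally followed by a PID. The set of state words (running, disabled, not running, initializing) is an assumption and may not be exhaustive.

```python
import re

# Abridged sample of "show application status ise" output.
SAMPLE = """\
ISE PROCESS NAME STATE PROCESS ID
-----------------------------------------------------
Database Listener running 15215
Database Server running 131 PROCESSES
Docker Daemon running 16843
TC-NAC Service disabled
ISE Messaging Service running 43944
SSE Connector disabled
"""

# Assumption: each row is "<name> <state> [<pid or note>]". The state list
# below is an assumption, not an exhaustive list of ISE process states.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<state>running|disabled|not running|initializing)\b"
    r"(?:\s+(?P<extra>.*))?$"
)


def parse_status(text):
    """Map each process name to its state; header and separator rows are skipped."""
    status = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            status[m.group("name")] = m.group("state")
    return status


status = parse_status(SAMPLE)
if status.get("ISE Messaging Service") != "running":
    print("ISE Messaging Service is not running on this Node")
else:
    print("ISE Messaging Service is running")
```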

IMPORTANT: " ... the Process Down alarm is no longer triggered when ISE Messaging Service fails on a Node. When ISE Messaging Service fails on a Node, ALL the Syslogs and the Process Down alarm will be lost until the Messaging Service is brought back up on that Node... " (from Cisco ISE 3.1, Maintain and Monitor) !!!

 

Troubleshooting - Queue Link Error

At ISE > Home you may see the Queue Link Error record in the Alarms dashboard:

[Image: Alarms Dashboard]

 

Click the Queue Link Error record to open its detailed description:

 

[Image: Alarms Queue Link Error]

 

The Suggested Actions are:

"Please check and restore connectivity between the Nodes.

Ensure that the Nodes and the ISE Messaging Service are up and running.

Ensure that ISE Messaging Service ports are not blocked by Firewall.

Please note that these Alarms could occur between Nodes, when the Nodes are being registered to Deployment or manually-synced from PPAN or when the Nodes are in out-of-sync state or when the Nodes are getting restarted."
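To follow the "ports are not blocked by Firewall" suggestion from a host on the same network path, a plain TCP connect test is often enough for a first check. The following is a minimal Python sketch. It only proves TCP reachability from wherever you run it, not between two ISE Nodes, and it does not test the TLS/AMQP layer. The function name and the 3-second timeout are arbitrary choices.

```python
import socket


def tcp_port_open(host, port=8671, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This proves TCP reachability from the machine running the script only.
    It does not test the TLS/AMQP layer, nor the path between two ISE Nodes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures are all
        # OSError subclasses in Python 3.
        return False
```

Usage, with a hypothetical hostname: tcp_port_open("ise-psn01.example.com"). A False result only tells you to investigate further (Firewall, routing, DNS or the service itself). It does not tell you which of these is at fault.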

 

You can also find this information at Operations > Reports > Reports > Audit > Operations Audit, filtering by:

  • Object Type = System-Management
  • Requested = The federation link was down or Event Unknown CA

[Image: Operations Audit]

 

Note: ISE Messaging Service uses port TCP/8671 !!! You can confirm the listener with the following ISE CLI command:

ise/admin# show ports
...
Process : docker-proxy (43916)
tcp: :::8671
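When comparing this output across Nodes, a small parser can flag any Node where nothing listens on TCP/8671. The following minimal Python sketch assumes the layout shown above: a "Process : <name> (<pid>)" line followed by "tcp:" or "udp:" lines with address:port entries. All other lines are ignored.

```python
import re

# Abridged sample of "show ports" output, as shown above.
SAMPLE = """\
Process : docker-proxy (43916)
tcp: :::8671
"""

PROCESS = re.compile(r"^\s*Process\s*:\s*(?P<name>\S+)\s*\((?P<pid>\d+)\)")
PROTO = re.compile(r"^\s*(tcp|udp)\w*\s*:")


def listeners_for_port(text, port=8671):
    """Return the process names that list `port` on a tcp/udp line.

    Assumes each "Process : <name> (<pid>)" line is followed by protocol lines
    holding address:port entries, like the sample above.
    """
    port_re = re.compile(rf":{port}(?!\d)")  # ":8671" but not ":86710"
    found = []
    current = None
    for line in text.splitlines():
        m = PROCESS.match(line)
        if m:
            current = m.group("name")
        elif current and PROTO.match(line) and port_re.search(line):
            if current not in found:
                found.append(current)
    return found


print(listeners_for_port(SAMPLE))  # ['docker-proxy']
```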

 

Note: TCP/8671 is used by ALL Nodes (PAN, MnT and PSN) for Inter-Node Communication. In the following log excerpts, taken on a PAN and on a PSN, each Node logs federation link connections that come from the other:

isePAN/admin# show logging application ise-messaging/rabbit-ise-connection.log
...
2022-11-22 18:47:39.224 [error] <0.7017.0>@rabbit_reader:log_hard_error:785 Error on AMQP connection <0.7017.0> (<PSN IP Addr>:40486 -> 169.254.x.y:5671 - Federation link (upstream: E-Mesh-FOR-<PAN Hostname>:8671-TO-<PSN Hostname>, policy: Policy-FullMesh), vhost: '/', user: 'rabbitmq', state: running), channel 0:
...
isePSN/admin# show logging application ise-messaging/rabbit-ise-connection.log
2022-11-22 18:47:39.201 [warning] <0.3335.1078>@rabbit_reader:log_connection_exception_with_severity:447 closing AMQP connection <0.3335.1078> (<PAN IP Addr>:50533 -> 169.254.x.y:5671 - Federation link (upstream: E-Endpoints-FOR-<PSN Hostname>:8671-TO-<PAN Hostname>, policy: Policy-Endpoints), vhost: '/', user: 'rabbitmq'):
...
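To see which pair of Nodes a message refers to, you can extract the federation link fields from each line. The following minimal Python sketch assumes the line layout of the excerpts above. The IP addresses and hostnames in the sample are hypothetical stand-ins for the <...> placeholders, and the counting helper is just one way to spot a link that keeps failing.

```python
import re
from collections import Counter

# Assumed layout, based on the rabbit-ise-connection.log excerpts above.
FEDERATION = re.compile(
    r"^(?P<ts>\S+ \S+) \[(?P<level>\w+)\].*?"
    r"\((?P<src>\S+) -> (?P<dst>\S+) - Federation link "
    r"\(upstream: (?P<exchange>.+?)-FOR-(?P<for_node>[^:]+):(?P<port>\d+)"
    r"-TO-(?P<to_node>[^,]+), policy: (?P<policy>[^)]+)\)"
)


def parse_federation_line(line):
    """Return the federation-link fields of one log line as a dict, or None."""
    m = FEDERATION.search(line)
    return m.groupdict() if m else None


def count_by_link(lines, levels=("error", "warning")):
    """Count lines at the given levels per (FOR node, TO node) pair."""
    counts = Counter()
    for line in lines:
        rec = parse_federation_line(line)
        if rec and rec["level"] in levels:
            counts[(rec["for_node"], rec["to_node"])] += 1
    return counts


# Hypothetical addresses and hostnames replace the <...> placeholders.
SAMPLE = (
    "2022-11-22 18:47:39.224 [error] <0.7017.0>@rabbit_reader:log_hard_error:785 "
    "Error on AMQP connection <0.7017.0> (10.10.10.3:40486 -> 169.254.2.10:5671 - "
    "Federation link (upstream: E-Mesh-FOR-isepan.example.com:8671-TO-isepsn.example.com, "
    "policy: Policy-FullMesh), vhost: '/', user: 'rabbitmq', state: running), channel 0:"
)

print(parse_federation_line(SAMPLE)["exchange"])  # E-Mesh
print(count_by_link([SAMPLE]))
```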

 

Examples of Causes that you may see in the Queue Link Error message:

  • Cause=basic_cancel
  • Cause=Timeout
  • Cause=Econnrefused
  • Cause={tls_alert;"handshake failure"}
  • Cause={tls_alert;"unknown Ca"}
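The sections below suggest where to start for each Cause. As a quick reference, the following Python sketch maps a Cause string to that starting point. The summary texts are condensed from this document. The normalization (case-insensitive, with a comma or semicolon after tls_alert) is an assumption meant to accept both the alarm spelling and the log spelling.

```python
# Summaries condensed from the Cause sections of this document.
GUIDANCE = {
    "basic_cancel": "Hostname change or Node Promotion left residual federation links.",
    "timeout": "Usually network congestion, no functional impact; check the TCP/8671 flow.",
    "econnrefused": "Considered cosmetic; the connection was refused for a period of time.",
    'tls_alert;"handshake failure"': "Check the ISE Messaging Service certificate chain.",
    'tls_alert;"unknown ca"': "Check Dedicated MnT, Third-Party certificates, then the certificate chain.",
}


def normalize_cause(cause):
    """Turn 'Cause={tls_alert;"unknown Ca"}' or '{tls_alert,"unknown ca"}'
    into the lower-case key format used by GUIDANCE (assumed normalization)."""
    key = cause.strip()
    if key.lower().startswith("cause="):
        key = key[len("cause="):]
    return key.strip("{}").replace(",", ";").lower()


def guidance_for(cause):
    """Return the condensed guidance for a Cause string, or a fallback hint."""
    return GUIDANCE.get(normalize_cause(cause), "Not listed here; see Other Causes.")


print(guidance_for("Cause=Timeout"))
```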

Note: it's also possible to see the Queue Link Error message via ISE CLI with the following command:

ise/admin# show logging application ise-messaging/rabbit-ise-federation.log
...
2022-06-05 02:55:04.776 [warning] <0.446.0>@rabbit_federation_link_util:log:283 Federation exchange 'E-Mesh' in vhost '/' did not connect to exchange 'E-Mesh' in vhost '/' on amqps://<Node IP Addr>:8671
{error,{tls_alert,"unknown ca"}}
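Because the error term is printed on the line after the warning, a simple scan can pair each failed exchange with its target Node and reason. The following is a minimal Python sketch that assumes this two-line layout. The sample uses 10.10.10.2 as a hypothetical stand-in for <Node IP Addr>.

```python
import re

NOT_CONNECTED = re.compile(
    r"Federation exchange '(?P<exchange>[^']+)'.*?did not connect.*?"
    r"on amqps://(?P<host>[^:/\s]+):(?P<port>\d+)"
)
TLS_ALERT = re.compile(r'\{tls_alert,"(?P<alert>[^"]+)"\}')


def failed_links(lines):
    """Return (exchange, host, port, reason) for each 'did not connect' warning.

    Assumption: the error term is on the line right after the warning, as in
    the excerpt above. reason is the TLS alert text if one is present,
    otherwise the raw next line, or None if the warning is the last line.
    """
    lines = list(lines)
    results = []
    for i, line in enumerate(lines):
        m = NOT_CONNECTED.search(line)
        if not m:
            continue
        reason = lines[i + 1].strip() if i + 1 < len(lines) else None
        if reason:
            alert = TLS_ALERT.search(reason)
            if alert:
                reason = alert.group("alert")
        results.append((m.group("exchange"), m.group("host"), int(m.group("port")), reason))
    return results


# 10.10.10.2 is a hypothetical stand-in for <Node IP Addr>.
SAMPLE = """\
2022-06-05 02:55:04.776 [warning] <0.446.0>@rabbit_federation_link_util:log:283 Federation exchange 'E-Mesh' in vhost '/' did not connect to exchange 'E-Mesh' in vhost '/' on amqps://10.10.10.2:8671
{error,{tls_alert,"unknown ca"}}
"""

print(failed_links(SAMPLE.splitlines()))  # [('E-Mesh', '10.10.10.2', 8671, 'unknown ca')]
```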

 

Cause=basic_cancel

This error usually shows up after the hostname of ISE Nodes is changed or after a Node Promotion (residual links from the Old PAN to the rest of the Deployment are sometimes not cleaned up on promotion) !!!

Please take a look at:

 

Cause=Timeout

This Alarm causes no functional impact. It is potentially caused by network congestion, at the time of the Alarm, between the Nodes the Alarm is triggered for ... these Alarms can be ignored !!!

Please take a look at:

Also double-check the TCP/8671 flow between the Nodes via the ISE CLI:

ise/admin# tech dumptcp 0 | inc 8671
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10.10.10.1.34190 > 10.10.10.2.8671: Flags [S], cksum 0x9544 (correct), seq 113726506, win 29200, options [mss 1460,sackOK,TS val 2259048610 ecr 0,nop,wscale 7], length 0
--
10.10.10.2.8671 > 10.10.10.1.34190: Flags [S.], cksum 0x5f32 (incorrect -> 0xf139), seq 1497652336, ack 113726507, win 28960, options [mss 1460,sackOK,TS val 2258331801 ecr 2259048610,nop,wscale 7], length 0
--
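In the capture above, the SYN ([S]) to port 8671 is answered by a SYN-ACK ([S.]), so the TCP handshake reaches the peer. The "incorrect" checksum on the outgoing SYN-ACK is commonly a side effect of checksum offloading on the capturing host rather than a real error. For a longer capture, the following minimal Python sketch classifies each SYN to TCP/8671 as answered, refused (RST) or without reply. It assumes IPv4 and the one-line-per-packet format shown above.

```python
import re

# "src.port > dst.port: Flags [..]" part of a tcpdump line (IPv4 only).
PKT = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def handshake_status(lines, port=8671):
    """Classify each SYN sent to `port` as 'answered' (SYN-ACK seen),
    'refused' (RST seen) or 'no reply' (nothing seen in this capture)."""
    syns, replies = {}, {}
    for line in lines:
        m = PKT.search(line)
        if not m:
            continue
        src = (m["src"], int(m["sport"]))
        dst = (m["dst"], int(m["dport"]))
        flags = m["flags"]
        if dst[1] == port and flags == "S":
            syns[(src, dst)] = "no reply"
        elif src[1] == port:
            key = (dst, src)  # swap so the reply matches its SYN's key
            if flags == "S.":
                replies[key] = "answered"
            elif "R" in flags:
                replies[key] = "refused"
    return {key: replies.get(key, state) for key, state in syns.items()}


# Same packets as above, abridged.
SAMPLE = """\
10.10.10.1.34190 > 10.10.10.2.8671: Flags [S], cksum 0x9544 (correct), seq 113726506, length 0
10.10.10.2.8671 > 10.10.10.1.34190: Flags [S.], cksum 0x5f32 (incorrect -> 0xf139), seq 1497652336, ack 113726507, length 0
"""

print(handshake_status(SAMPLE.splitlines()))  # {(('10.10.10.1', 34190), ('10.10.10.2', 8671)): 'answered'}
```

A "refused" result points toward the Econnrefused case below. A "no reply" result suggests that something on the path (for example, a Firewall) drops the traffic.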

 

Cause=Econnrefused

This Alarm is considered cosmetic. It is potentially caused by a period of time during which a Node cannot connect to its peer Node and the connection is refused (for example, while the ISE Messaging Service on the peer Node is restarting) !!!

Please take a look at:

 

Cause={tls_alert;"handshake failure"}

There are several possible reasons for this error, but most often it is due to a problem with the Certificate Chain used by the ISE Messaging Service. Please take a look at Generate Signing Requests (CSR) below.

 

Cause={tls_alert;"unknown Ca"}

There are several possible reasons for the error:

1. when the Dedicated MnT option is selected (at Administration > System > Deployment > select the MnT Node and check Dedicated MnT) ... take a look at:

2. when using a Third-Party signed certificate ... take a look at:

3. but most often it is due to a problem with the Certificate Chain used by the ISE Messaging Service. Please take a look at Generate Signing Requests (CSR) below.
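Peter Koltl's comment at the end of this document gives a useful rule for choosing between the fixes in the next section. If the chain from a Node's Endpoint Sub CA up to the internal Root CA is incomplete, replace the whole Root CA chain. If only the Messaging certificate's own chain is incomplete, reissue the Messaging certificate. The following Python sketch encodes that rule over a subject-to-issuer mapping that you would build by hand, for example from the Issued To / Issued By details shown in the GUI. The certificate names are illustrative only. The example also assumes that the Messaging certificate is issued by the Node's Endpoint Sub CA, so check the issuer in your own deployment.

```python
def chain_to_root(cert, issuer_of, root):
    """Follow issuer links from `cert`; return the certificates visited.

    Stops at `root`, at a certificate whose issuer is unknown, or at a loop.
    """
    path = [cert]
    while path[-1] != root:
        issuer = issuer_of.get(path[-1])
        if issuer is None or issuer in path:
            break
        path.append(issuer)
    return path


def reaches_root(cert, issuer_of, root):
    return chain_to_root(cert, issuer_of, root)[-1] == root


def suggested_fix(messaging_cert, endpoint_sub_ca, issuer_of, root):
    """Decision rule paraphrased from the Peter Koltl comment (see Comments)."""
    if not reaches_root(endpoint_sub_ca, issuer_of, root):
        return "Replace ISE Root CA Certification Chain"
    if not reaches_root(messaging_cert, issuer_of, root):
        return "Generate ISE Messaging Service Certificate"
    return "Chain looks complete; check the other causes"


# Illustrative names only; build the mapping from your own deployment.
issuer_of = {
    "Messaging - ise02": "Endpoint Sub CA - ise02",
    "Endpoint Sub CA - ise02": "Node CA - ise02",
    "Node CA - ise02": "Root CA - ise01",
}
print(suggested_fix("Messaging - ise02", "Endpoint Sub CA - ise02", issuer_of, "Root CA - ise01"))
```

The Endpoint Sub CA is checked first because, if the 3-level tree itself is broken, reissuing only the Messaging certificate would not complete its chain.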

 

Generate Signing Requests (CSR)

1st, double-check that the Certificate Authority is enabled at Administration > System > Certificates > Certificate Authority > Internal CA Settings: if you see Disable Certificate Authority, then it's enabled !!!

[Image: Internal CA Settings]

 

2nd, if the problem is limited to the ISE Messaging Service of particular Node(s), go to Administration > System > Certificates > Certificate Management > Certificate Signing Requests and click Generate Certificate Signing Requests (CSR):
[Image: CSR]

 

Select ISE Messaging Service in Usage, select the Node(s) whose certificate you want to reissue, and click Generate ISE Messaging Service Certificate:
[Image: ISE Messaging Service]

 

The following message appears:

[Image: ISE Messaging Service Messaging]

 

IMPORTANT: during the generation of the ISE Messaging Service Certificate there is NO reboot or Deployment break, only the re-initialization of the ISE Messaging Service !!!

 

3rd, if the problem is not limited to the ISE Messaging Service of particular Node(s), you may need to replace the entire Chain of Internal CAs: at Administration > System > Certificates > Certificate Management > Certificate Signing Requests, click Generate Certificate Signing Requests (CSR), select ISE Root CA in Usage and click Replace ISE Root CA Certification Chain:

[Image: ISE Root CA]

 

The following message appears:
[Image: ISE Root CA Messaging]

 

IMPORTANT 1: when you replace the Cisco ISE Root CA chain, the Cisco ISE Messaging Service Certificate is also replaced. This is followed by the restart of the Cisco ISE Messaging Service with a downtime of about 2 minutes. During the replacement of the ISE Root CA there is NO reboot or Deployment break !!!

IMPORTANT 2: to avoid losing Syslogs during the downtime, temporarily stop using the Cisco ISE Messaging Service for Syslog delivery (at Administration > System > Logging > Log Settings, uncheck Use "ISE Messaging Service" for UDP Syslogs delivery to MnT).

[Image: ISE Messaging Settings]

 

IMPORTANT 3: if you notice Slow Replication on Secondary Nodes after regenerating the Root CA, re-registering the Secondary Node will fix the issue; please take a look at:

 

4th, at Administration > System > Certificates > Certificate Authority > Certificate Authority Certificates, delete and revoke the old Certificates.

IMPORTANT 1: when you regenerate the Internal CA Root Chain, ISE does not delete the Old One automatically. As long as ISE retains the Old Root Chain, it will trust Identity Certificates presented by Endpoints that were signed by that Chain (if there are any) !!!

IMPORTANT 2: deleting the Old Internal Certificates is an important step to avoid known bugs in which 200+ Internal Certificates on the PPAN cause Slow UI, Slow Replication and High CPU/Load; please take a look at:

[Image: CA Certificates]

 

Other Causes

Other causes of Queue Link Error:

Please take a look at:

 

Effect of Queue Link Error !!!

Depending on the usage situation, the Queue Link Error may not be harmful. Let's take a look at some examples of where the ISE Messaging Service is used.

 

ISE Messaging Settings

If there is a Queue Link problem that affects the log transfer from the PSN to the MnT:

  • Live Logs are not displayed
  • Reports are not displayed
  • Dashboard System Summary is not displayed

In addition to the measures above, you may be able to recover from this situation by unchecking Use "ISE Messaging Service" for UDP Syslogs delivery to MnT (enabled by default since ISE 2.6 P2). This switches to a log transfer that does not go through the Messaging Service. The setting is at Administration > System > Logging > Log Settings > ISE Messaging Settings:

[Image: ISE Messaging Settings]

 

Please take a look at:

 

Light Data Distribution (LDD)

This setting is at Administration > System > Settings > Light Data Distribution. It was initially called Light Session Directory (LSD), but it was renamed as functions were added and because of the abbreviation.
LDD is used to store User Session Information and replicate it across the PSNs in a Deployment, thereby eliminating the need to be dependent on the PAN or MnT Nodes for User Session details. In case of connectivity issues between the PSNs, for example, when a PSN is down, the Session Details are retrieved from the MnT Session Directory and stored for future use.
LDD uses Cisco ISE Messaging Services (that uses a Certificate signed by the Internal-CA Chain) for Inter-Node Communication.
If a function that relies on exchanging information between Nodes (such as LDD) has a problem, checking for a Queue Link Error Alarm is one way to start troubleshooting.
 
 
Hope this helps !!!
Comments
ggomezga
Cisco Employee

Nice my friend, always very well documented. 

 

Best regards,

Gustavo

Thanks @ggomezga !!!

 

Best regards.

Peter Koltl
Level 7

Generating a new Messaging certificate is necessary if the existing Messaging certificate's chain is not complete up to the internal root CA certificate.

Regenerating the whole internal root CA chain is necessary if the chain is not complete from each node's Endpoint SubCA up to the internal root CA certificate. (i.e., the 3-level tree is not complete.)

Hi @Peter Koltl ,

 thanks for adding !!!

Carl King
Level 1

I'd like to know what the overall impact is at the time the root CA cert is replaced.

Do all nodes need to be rebooted, if so what order?

How does this affect the deployment. Does it break?

Hi @Carl King ,

 thanks for adding ... I add the following:

"IMPORTANT: during the replacement of the ISE Root CA there is NO reboot or Deployment break !!!"

Best regards

Arne Bier
VIP

@Marcelo Morais  - mate!  This document is just perfectly written for operations engineers!   Nice work!   We need more docs like this

nmitev
Cisco Employee

This article not only is extremely helpful but beautifully formatted. Couldn't agree more with Arne's statement above.

Hi @Arne Bier and @nmitev thanks for the kind words !!! 

For the last few months I've been working on a case of Slow Replication, it will probably be my next document, I hope it helps other people not to go through the "suffering" I have been going through in this case  : )

Thomas Schmitt
Level 1

Many Thanks for this overview, but I miss one thing - in a distributed deployment with PPAN, PAN, PMNT, MNT and multiple PSNs, which nodes communicate with each other via tcp port 8671 and who initiates the respective connection (for firewall policies)?

@Marcelo Morais can you please add this information to your overview?

Hi @Thomas Schmitt ,

 thanks for the suggestion. I have included a Note about "Inter-Node Communication of ALL Nodes using TCP/8671"

Best regards

jsteffensen
Level 1

Hi Marcelo
Thank you for this helpful article. ISE Messaging service is now a little more understandable.
Do you by any chance also have any information about the error "broker forced connection closure with reason 'shutdown'"?

[error] <0.2109.0>@rabbit_reader:log_hard_error:798 Error on AMQP connection <0.2109.0> (10.1.1.1:53559 -> 169.254.2.10:5671 - Federation link (upstream: E-Mesh-FOR-ise03.domain.local:8671-TO-ise02.domain.local, policy: Policy-E-Mesh), vhost: '/', user: 'rabbitmq', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

Best Regards

Hi @jsteffensen ,

 thanks !!!

 About AMQP, port 5671 and broker forced connection closure with reason 'shutdown' ...

1st. the show tech-support command has a "Displaying RABBITMQ status output..." with the following:

{listeners,
...
{'amqp/ssl',5671,"::"},

Note: the show ports | inc 5671 command has an empty result !!!

2nd also double check any errors at "Displaying RABBITMQ status output..."

3rd there is a better explanation of "RabbitMQ Broker Fails to Start" at the following link: Message System Troubleshooting. (search for Message System Troubleshooting and check the RabbitMQ Broker Fails to Start).

About the error on ISE ...

1st what is your ISE version/patch ?

2nd are you able to check this error at ise-messaging/rabbit-ise-federation.log and also at ISE GUI > Home > Alarms ?

Regards

yastop
Level 1

Many Thanks for this overview

More than welcome @yastop !!!
