
Error joining member to IronPort cluster

SupportAC
Level 1

Hi,

We have an IronPort cluster. Last week we restarted one of its members. Since that member came back up, we have not been able to rejoin it to the cluster. We see this error when we try to join the cluster:

Failed to join the cluster.

Error was: 'Unexpected EOF on connect'

 

Error connecting to cluster machine relay1.xx.xx.es (Serial #: xxxxxxxxxx-xxxxxxx) at IP xx.xx.xxx.xxx - Connection failure - ('_coro.pyx coro._coro.coro.__yield (coro/_coro.c:5321)|348', "<type 'exceptions.EOFError'>", '', '[cluster/connection_pool.py _create_cluster_connection|518] [cluster/cluster_command_client.py connect|123] [_coro.pyx coro._coro.sched.with_timeout (coro/_coro.c:11765)|1099] [cluster/cluster_command_client.py _ssh_connect|95] [transport/client.py connect|28] [transport/client.py _connect|101] [transport/transport.py receive_message|330] [_coro.pyx coro._coro._yield (coro/_coro.c:13527)|1277] [_coro.pyx coro._coro.coro.__yield (coro/_coro.c:5321)|348]')

 

Last message occurred 5 times between

Version: 11.0.1-027

Why are we receiving this error when joining the member to the cluster? Is there any way to solve it?

 

3 Replies

Libin Varghese
Cisco Employee

Could you run "clusterconfig" and then "communication" to confirm which interface and port are being used for cluster communication?

You would then want to confirm using telnet that the first ESA can connect to the second ESA over that port and vice versa.
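
Where telnet is not available, the same reachability check can be scripted. The following is a minimal sketch in Python, not a Cisco tool: `read_ssh_banner` is a hypothetical helper that opens a TCP connection to the given host and port and returns the first line the server sends, which for an SSH listener is its version banner. It is meant to be run from a workstation, so it only tests the path from that workstation to the appliance; the ESA-to-ESA path still has to be tested from the appliances themselves.

```python
import socket
import sys


def read_ssh_banner(host, port=22, timeout=5.0):
    """Connect to host:port and return the first line the server sends.

    An SSH server announces itself with a line such as "SSH-2.0-...".
    Raises OSError (including timeouts) if the connection fails.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Read until the end of the first line; the SSH identification
        # string is limited to 255 bytes, so stop there at the latest.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255 - len(data))
            if not chunk:
                break  # server closed the connection
            data += chunk
    first_line = data.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    # Hosts to test are passed on the command line (IPs or hostnames).
    for target in sys.argv[1:]:
        try:
            print(target, "->", read_ssh_banner(target))
        except OSError as exc:
            print(target, "-> connection failed:", exc)
```

A returned banner only shows that the TCP port is reachable and an SSH service answers on it. It does not prove that the cluster SSH handshake completes, which is the stage the 'Unexpected EOF on connect' error points at.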

 

Also, why was the appliance rebooted? Were there any configuration changes before the issue began?

Regards,

Libin Varghese

Hi Libin,

Thanks a lot for your reply.

The clusterconfig command can only be run on one of the two appliances (the one that is still in the cluster). The first thing it asks is whether the machines should communicate by IP address or by hostname. It is currently set to IP address because I changed it while testing; originally it was set to hostname. The command then returns:

***********************************************************
[]> communication

Should all machines in the cluster communicate with each other by hostname or
by IP address?
1. Communicate by IP address.
2. Communicate by hostname.
[1]>

All machines in the cluster will communicate with each other by IP address.

Group Main_Group:
  1. Machine relay1.xx.xx: using IP address 132.x.x.x port 22
***********************************************************

I ran telnet to port 22 from each of the two appliances to the other, both by IP address and by hostname, and every time it answered with the "SSH-2.0-OpenSSH ..." banner.
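
Because the cluster originally communicated by hostname, it can also be worth checking that each appliance's hostname resolves to the address that clusterconfig reports. The sketch below is only an illustration in Python: `resolve_ipv4` and `resolves_to` are hypothetical helpers, and they use the resolver of the machine that runs the script, which may differ from the DNS servers configured on the appliances.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def resolves_to(hostname, expected_ip):
    """True if hostname resolves to expected_ip, False otherwise."""
    try:
        return expected_ip in resolve_ipv4(hostname)
    except socket.gaierror:
        # The name could not be resolved at all.
        return False
```

For example, `resolves_to(hostname, ip)` would be called with a member's hostname and the IP address listed for it in the clusterconfig output. A `False` result suggests a stale or wrong DNS record, which could matter if the cluster is switched back to communicating by hostname.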

 

Last Friday we shut down ironport2 because an electrical check had to be carried out in the datacenter that hosts it, so we stopped all the virtual machines in that datacenter.

I do not remember the last configuration change we made, but it was a long time ago. We have probably not changed the configuration since the beginning of last year, when the appliances were migrated from physical equipment to virtual machines.

We did take the opportunity this Christmas to upgrade the version. I do not remember what they were running before, but I believe it was 10.0, and we upgraded to 11.0.1-027.
Once the upgrade was done, the appliances were rejoined to the cluster. I remember we had to restart twice, but in the end both were connected to the cluster.

After that we did not touch the appliances until last week, when we restarted one.

Any idea?

As clusterconfig currently lists just one appliance, it appears that the second appliance was removed from the cluster at some point.

Confirm that all cluster requirements are met as per the article below, and try the steps to join an existing cluster using SSH.

https://www.cisco.com/c/en/us/support/docs/security/email-security-appliance/200885-ESA-Cluster-Requirements-and-Setup.html

Regards,

Libin Varghese