Hi
I have an issue with an ifgrp configured as LACP on a FAS8020 controller in cluster mode. Each controller has e0c and e0d configured as LACP, however, one of my ifgrp's which is on node1 is showing as disabled, although when i check both e0c and e0d ports individually, they both show status as UP. Both connections link back to separate Juniper switches of which both ports show in the UP state but status as ATTACHED. The ifgrp that is working on controller node 2 is showing port and ifgrp status UP and on the juniper switches as status COLLECTING DISTRIBUTING.
Everything was working fine up until a few days ago when the all my lifs that where on the ifgrp on controller node 1 that is no longer working failed over one night.
We've disabled and enabled the ports from the switches and also from the netapp controller for e0c, e0d and the ifgrp. Our networks guys have checked their config and logs and that all seems to be fine. I've pulled down the LACP log from controller node 1, part extract is below.
So let me break this down, both configurations are identical and have been working fine for a while until a few days ago.
Controller node 1 -
ifgrp a0a : comprises of e0c and e0d in LACP mode. Was working perfectly, but is now in disabled state. Both e0c and e0d are in up state
Controller node 2 -
ifgrp a0b : comprises of e0c and e0d in LACP mode. a0b status is enabled. Both e0c and e0d are in up state
Extract from LACP log controller node 1
---
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0c)
2016-07-29 03:23:42: ERROR: lag_link_up_to_down (e0d)
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: View of partner incorrect:e0c moving select to unselected
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
---
2016-07-29 02:32:22: ERROR: Prev_stored (Actor- Local, Partner- Remote device):
LACP state information: e0c
Actor - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
Partner - SysPri:0 , SysID: 0:0:0:0:0:0, key: 0, PortPri: 0, PortID:0
Actv:0 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
---
2016-07-29 02:32:22: ERROR: Rx_machine (Actor- Remote device, Partner- Local):
LAG_PKT: e0c setting partner port sync to FALSE
Actor - SysPri:127 , SysID: , key: 247, PortPri: 127, PortID:52
Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0
Partner - SysPri:1 , SysID: , key: 1, PortPri: 0, PortID:2
Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1
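To make the flag lines above easier to compare, here is a rough Python sketch that turns each "Actv:1 ,Timeout:0, ..." line back into the single LACP state byte. It assumes the flag lines always use the exact "Name:0/1" layout shown in the extract, and that the eight flags map to the actor/partner state bits in the order defined by IEEE 802.1AX (Activity as bit 0 through Expired as bit 7). The function names are my own and are not part of ONTAP or Junos.

```python
import re

# Flags in the LACP actor/partner state octet (IEEE 802.1AX), bit 0 first.
# Left: label used in the ONTAP LACP log; right: the standard's name.
STATE_BITS = [
    ("Actv", "LACP_Activity"),
    ("Timeout", "LACP_Timeout"),
    ("Agg", "Aggregation"),
    ("Sync", "Synchronization"),
    ("Coll", "Collecting"),
    ("Dist", "Distributing"),
    ("Default", "Defaulted"),
    ("Expired", "Expired"),
]

# Matches "Name:0" / "Name:1" pairs, tolerating the stray spaces in the log.
# The trailing \b stops multi-digit values such as "PortPri: 127" matching.
FLAG_RE = re.compile(r"(\w+)\s*:\s*([01])\b")


def parse_state(line):
    """Return the eight state flags from one 'Actv:1 ,Timeout:1, ...' log line."""
    found = {name: int(bit) for name, bit in FLAG_RE.findall(line)}
    # A KeyError here means the line is not in the expected format.
    return {short: found[short] for short, _ in STATE_BITS}


def state_octet(flags):
    """Pack the flags into the one-byte state field carried in an LACPDU."""
    value = 0
    for position, (short, _) in enumerate(STATE_BITS):
        value |= flags[short] << position
    return value


def set_flags(flags):
    """Standard names of the flags that are set, in bit order."""
    return [name for short, name in STATE_BITS if flags[short]]


def collecting_distributing(flags):
    """True when the port is in sync and both collecting and distributing."""
    return bool(flags["Sync"] and flags["Coll"] and flags["Dist"])


if __name__ == "__main__":
    samples = {
        "Rx_machine actor (remote device)":
            "Actv:1 ,Timeout:1, Agg:1, Sync:0, Coll:0, Dist:0, Default:0, Expired:0",
        "Rx_machine partner (local, as seen by remote)":
            "Actv:1 ,Timeout:0, Agg:1, Sync:0, Coll:0, Dist:0, Default:1, Expired:1",
    }
    for label, line in samples.items():
        flags = parse_state(line)
        print(f"{label}: 0x{state_octet(flags):02X} {set_flags(flags)} "
              f"collecting/distributing={collecting_distributing(flags)}")
```

Run against the Rx_machine lines, neither side has Synchronization, Collecting or Distributing set, and the local side as seen by the switch also has Defaulted and Expired set (state byte 0xC5). That is consistent with the switch reporting ATTACHED rather than COLLECTING DISTRIBUTING, though it doesn't by itself say why the ports stopped syncing.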
Please can someone point me in the right direction as to where the problem may be and a possible fix. Your input is greatly appreciated.
Cheers