Hi NetApp,
We received an advisory email from NetApp telling us to upgrade our ONTAP version before this issue impacts our customers. We were told to upgrade ASAP.
Looking at the bug report and the KB article, it appears that NMI PCI errors occur on the CNA [UTA2] card due to non-correctable ECC errors, resulting in a reboot. Basically, the node is failed over to prevent loss of data and to maintain data integrity, and is then failed back.
BUGID: https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1026931
For a dedicated NetApp cluster in a small environment this is not an issue, but for a managed services company with more than 50 clusters it's easier said than done. We first need to make sure everything connected to NetApp is compatible using the Matrix site, and only then proceed with upgrading DR first and PROD next. With so many clusters, it may well take some time.
Concern: My concern is the insufficient explanation of this bug in both the KB article and the bug report itself.
CNA [UTA2] - Can be used in two personality modes:
1. FC only
2. CNA (FCoE) - Protocols allowed: FC, iSCSI, CIFS & NFS
CNA [UTA2] - Provides hardware offload support for iSCSI and FCoE. I believe there is no offloading for CIFS/NFS; data is just passed on as with any other Ethernet NIC.
My questions are:
1. Does this bug affect customers using CNA personality mode for CIFS/NFS only? If yes, how does it impact them?
2. Looking at the advisory, it appears the solution is to upgrade ONTAP. Does that mean there is nothing wrong with the hardware or firmware of the CNA device? Will the newer ONTAP presumably detect the non-correctable ECC errors early and reset them before the node panics?
3. The workaround is, I must say, very confusing to read. It says to change any unused CNA-mode ports to FC mode. What do you mean by that? Does it mean that ports in CNA mode are still impacted even when they are offline? And what about the ports that are in CNA mode at the moment and serving data to customers? I thought a workaround is always for the current situation, not for something that is unused.
Those are the 3 key questions for now. I would also really appreciate it if you could let us know whether there are any particular logs in the NetApp logs directory that might show errors indicating that we are closing in on the bug mentioned.
We have a large NetApp customer base, so we would really appreciate it if someone from NetApp could help us answer these queries.
Many thanks,
-Ashwin