
New disk firmware and recovery messages


Recently I installed the new NA02 firmware on my X423_HCOBE900A10 disks to fix bugs 888889 and 1239356. Now, as described in the KB articles, I see informational "Recovered error" messages from some disks.

 

Tue Jun 25 08:33:13 +04 [R53-na01-A:disk.ioRecoveredError.retry:info]: Recovered error on disk 3a.00.12: op 0x28:68cb85b0:0008 sector 255 SCSI:recovered error - Disk used internal retry algorithm to obtain data (1 b 95 95) (12) [NETAPP   X423_HCOBE900A10 NA02] S/N [KXH1SAUF]
Tue Jun 25 08:33:13 +04 [R53-na01-A:disk.ioFailed:error]: I/O operation failed despite several retries.

I know this is normal and that the disks heal themselves with the new firmware, but I have some questions.

1. Will these messages stop appearing after some time?

2. Is it possible that the weekend aggregate scrub task will break my array if some error threshold is reached?
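
For illustration, a minimal Python sketch (assumption: the console/messages log has been saved to a text file; the file name is a placeholder) that counts the recovered-error events per disk, to check whether they taper off over time:

import re
from collections import Counter

# Rough sketch: count disk.ioRecoveredError.retry events per disk in a saved
# messages log, to see whether the recovered errors taper off after the NA02
# firmware update. LOG_FILE is a placeholder.
LOG_FILE = "messages.txt"

# Matches lines like:
#   ... [R53-na01-A:disk.ioRecoveredError.retry:info]: Recovered error on disk 3a.00.12: ...
pattern = re.compile(r"disk\.ioRecoveredError\.retry.*Recovered error on disk (\S+):")

counts = Counter()
with open(LOG_FILE) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1

for disk, count in counts.most_common():
    print(f"{disk}\t{count} recovered errors")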


Copy CIFS snapshots to a non-NetApp system


Hi all,

 

We have an old FAS3140 that we need to decommission; it's out of support. It contains about 10 TB of SnapVault snapshots from a CIFS volume that will expire in a year. We want to move these snapshots to other storage, but we do not have access to another NetApp system, only other NAS devices with SMB/NFS. Is that possible somehow?

 

Since it's possible to browse the snapshots via CIFS, I could just copy all the snapshot folders to a NAS, but that would surely break the snapshots, since the data would no longer be stored on WAFL. Right?
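
If plain copies are acceptable, a minimal Python sketch (assumptions: the default ~snapshot directory is visible on the share, and both the share and the target NAS are reachable as UNC paths; the paths are placeholders) that copies each snapshot folder to its own directory on the other NAS:

import shutil
from pathlib import Path

# Rough sketch, not a supported NetApp tool: copy every CIFS-visible snapshot
# directory to a plain folder on another NAS. The copies become ordinary files
# and are no longer WAFL snapshots. Both UNC paths are placeholders.
SNAPSHOT_ROOT = Path(r"\\fas3140\cifs_share\~snapshot")
TARGET_ROOT = Path(r"\\other-nas\archive\fas3140_snapshots")

for snapshot_dir in sorted(SNAPSHOT_ROOT.iterdir()):
    if not snapshot_dir.is_dir():
        continue
    destination = TARGET_ROOT / snapshot_dir.name
    if destination.exists():
        continue  # already copied on an earlier run
    print(f"Copying {snapshot_dir.name} ...")
    shutil.copytree(snapshot_dir, destination)

Note that each snapshot would be copied in full, so files shared between snapshots would be duplicated on the target.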

 

Please advise.

 

Thank you!

 

 

 

 

 


Slow CIFS performance after SnapMirror break and subsequent resync


We have been seeing an issue that affects our systems only when we break our SnapMirrors for DR purposes and then fail back afterward. After the DR operation concludes and we have resynced to our original relationship, CIFS performance is much slower. We particularly see this when accessing Office files. We suspected this for a while and confirmed it by creating a "clean" test volume, then running the very same data analysis job against the test volume and an existing production volume. The test volume was just as fast as we expected; the production volume was about 6x slower to run the job. We can definitely trace the beginning of the issue to the failover/failback. Does anyone have any ideas? FAS2552, ONTAP 9.5.

vol move problem


Hi,

I have a 6-node cluster consisting of FAS8020, FAS8040 and AFF300 nodes, running cDOT 9.5P4 with a total of 3 SVMs;

the latest, SVM3, has 2 aggregates built from AFF300 disks.

 

Moving any volume from SVM2 to the aggregates of the new SVM3 is possible,

but when I try to move any volume from SVM1, the AFF300 aggregates are not shown as possible destinations.

 

Am I missing something?

Thanks in advance for any tips,

gavan

 

Changing from active/active to active/passive in 7-Mode


Hello,

 

I have a NetApp FAS2240-4 running Release 8.2.4P4 7-Mode. It has two controller boards, one disk shelf, and a total of 48 disks (24x2TB in the FAS and 24x1TB in the shelf). It is out of support, so I think I cannot upgrade ONTAP. I used to have the disks configured like this:

 

[attached image: 2.gif]

Now I would like to use a configuration like this:

[attached image: 1.gif]

I booted into maintenance mode and changed the owner of all disks to Node 1. But now Node 2 does not boot, because it does not have any disks. The status of the aggregates looks like this:

aggr status -v
Aggr State Status Options
aggr0 online raid_dp, aggr root, diskroot, nosnap=off, raidtype=raid_dp,
64-bit raidsize=11, ignore_inconsistent=off,
snapmirrored=off, resyncsnaptime=60,
fs_size_fixed=off, lost_write_protect=on,
ha_policy=cfo, hybrid_enabled=off,
percent_snapshot_space=0%,
free_space_realloc=off

Volumes: vol0

Plex /aggr0/plex0: online, normal, active
RAID group /aggr0/plex0/rg0: normal, block checksums
RAID group /aggr0/plex0/rg1: normal, block checksums

aggr1 offline raid_dp, aggr diskroot, raidtype=raid_dp, raidsize=11,
foreign resyncsnaptime=60, lost_write_protect=off,
64-bit ha_policy=cfo, hybrid_enabled=off,
percent_snapshot_space=0%
Volumes: <none>

Plex /aggr1/plex0: online, normal, active
RAID group /aggr1/plex0/rg0: normal, block checksums
RAID group /aggr1/plex0/rg1: normal, block checksums

And the volume:

vol status -v
Volume State Status Options
vol0 online raid_dp, flex root, diskroot, nosnap=off, nosnapdir=off,
64-bit minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off,
create_ucode=on, convert_ucode=on,
maxdirsize=45875, schedsnapname=ordinal,
fs_size_fixed=off, guarantee=volume,
svo_enable=off, svo_checksum=off,
svo_allow_rman=off, svo_reject_errors=off,
no_i2p=off, fractional_reserve=100, extent=off,
try_first=volume_grow, read_realloc=off,
snapshot_clone_dependency=off,
dlog_hole_reserve=off, nbu_archival_snap=off
Volume UUID: 370ddab1-6719-11e3-8d9a-123478563412
Containing aggregate: 'aggr0'

Plex /aggr0/plex0: online, normal, active
RAID group /aggr0/plex0/rg0: normal, block checksums
RAID group /aggr0/plex0/rg1: normal, block checksums

Snapshot autodelete settings for vol0:
state=off
commitment=try
trigger=volume
target_free_space=20%
delete_order=oldest_first
defer_delete=user_created
prefix=(not specified)
destroy_list=none
Volume autosize settings:
mode=off
Hybrid Cache:
Eligibility=read-write

 

What steps am I missing to set up the configuration in the picture above? If possible, I do not want to start from scratch.

 

Thank you,

 

Andreas

FAS 2440 cDOT 9 does not show all volumes in aggregate


Hello;

On our single-node cluster FAS 2440, "storage aggregate show" reports that only 16.44TB is available on the aggregate and that it contains 6 volumes, but I can only see 3 of them. Where are these volumes and what do they contain?

What do I need to do to free space in the aggregate?

Thanks+Greetings, Thomas


Aggregate            Size     Available  Used%  State   #Vols  Nodes        RAID Status
-------------------- -------- ---------- ------ ------- ------ ------------ ---------------------------------
aggrSATA_cluster3_a1 42.19TB  16.44TB    61%    online  6      cluster3-01  mixed_raid_type, growing, hybrid
aggrSSD              1005GB   991.3GB    1%     online  1      cluster3-01  raid_dp, normal
aggrSYS_cluster3_a1  757.9GB  36.72GB    95%    online  1      cluster3-01  raid_dp, normal
3 entries were displayed.

 

The command "storage aggregate show-space" gives me:

 

Aggregate : aggrSATA_cluster3_a1

Feature Used Used%
-------------------------------- ---------- ------
Volume Footprints 25.76TB 61%
Aggregate Metadata 0B 0%
Snapshot Reserve 0B 0%

Total Used 25.76TB 61%

Total Physical Used 26.11TB 62%

 

 

widelink to namespace ?? !!


Hello folks, how are you?

 

I'd like to migrate a very big infrastructure of widelinks to a namespace in cDOT. Does anyone know of a tool to simplify this?

Basically the customer did the transition from 7-Mode to cDOT but kept the widelinks in cDOT. Now he is asking whether it is possible to migrate every widelink to the namespace.

 

Removing hmac-ripemd160 algorithm prior to 9.3P12 upgrade


Prior to upgrading ONTAP from 9.1P8 to 9.3P12, we were told to remove the hmac-ripemd160 algorithms from each vserver using the "security ssh remove" command. I was able to do that on each vserver last weekend in preparation for this weekend's code upgrade. Today I noticed they are back. Does anyone have any idea why they would show up again after being removed?


NetApp API


Hi ,

 

I am a beginner with the NetApp API. I have it installed on a Linux box but no idea how to start using the API to connect to my storage and get the information I need.

Is there any step-by-step guide to using the API for a beginner?
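
For illustration, if the cluster runs ONTAP 9.6 or later, a minimal Python sketch using the requests library against the built-in REST API (cluster address and credentials are placeholders; for older releases the ZAPI-based Manageability SDK is the usual route):

import requests
from requests.auth import HTTPBasicAuth

# Rough sketch: list volumes through the ONTAP REST API (available in ONTAP
# 9.6 and later). Cluster address and credentials are placeholders;
# verify=False skips TLS certificate validation and is only for a quick test.
CLUSTER = "https://cluster-mgmt.example.com"
AUTH = HTTPBasicAuth("admin", "password")

response = requests.get(
    f"{CLUSTER}/api/storage/volumes",
    auth=AUTH,
    params={"fields": "name,size,svm.name"},
    verify=False,
)
response.raise_for_status()

for volume in response.json()["records"]:
    print(volume["svm"]["name"], volume["name"], volume["size"])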

 

Thanks

ONTAP 9.5 and SnapDrive


We are currently running clusters on 9.1 and 9.3, and I have been asked to look at upgrading to 9.5 in the near future. Looking at the IMT, there is no supported version of SnapDrive for 9.5. This is going to block any upgrade, as we have many servers running SnapDrive. I did see that SD 7.1.5P2 has a bug fix for resizing disks on 9.5, so is this an oversight in the IMT, or is there an alternative that we are supposed to be moving to?

How to create Root aggregate across external shelf during initialization


Hi everyone,
First time posting here.

On a new FAS2720 (9.5P5) with a DS212C, I would like to configure ADP using a total of 24 NL-SAS disks.
So I executed steps 9a and 9b from the boot menu, but a kernel panic occurred after running step 9b.
When the kernel panic occurred, the following message was displayed.

------------------------------------------------------------------------------------------------------------------------------------
Jun 27 17:48:57 [localhost: raid.autoPart.start: notice]: System has started auto-partitioning 6 disks.
Jun 27 17:48:58 [localhost: raid.autoPart.done: notice]: Successfully auto-partitioned 6 of 6 disks.
Unable to create root aggregate: 5 disks specified, but at least 7 disks are required for raid_tec
------------------------------------------------------------------------------------------------------------------------------------

According to the HWU, the FAS2720 can be configured with ADP across 24 disks.
Is this due to a bug in the product, or is there a problem with the procedure?

As a side note, I think this problem occurs only when an external shelf is connected, because the same procedure completes without issue when the DS212C is not connected.

 

Could you please help?
Many thanks

DATA ONTAP 9.3P2 to 9.6RC2


I have a NetApp FAS2650 with Data ONTAP 9.3P2, and I want to upgrade to 9.6RC2.
Is there a compatibility issue or anything else to watch out for?

 

Thanks in advance

Performance impact with setting option auth-sys-extended-groups to "enabled" for NFS 16 group limit


ONTAP 9.3P2, RHEL 7 clients accessing research data. We're hitting the NFS 16-group limit and are about to enable the auth-sys-extended-groups option on the serving SVM.
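
For context, a minimal Python sketch (assumptions: run on one of the RHEL 7 clients, and that the client resolves users and groups through the same nameservice the NFS mounts use) to list which users actually exceed the 16-group limit:

import grp
import os
import pwd

# Rough sketch: list users whose group count exceeds the 16-group AUTH_SYS
# limit. Note that pwd.getpwall() may not enumerate directory users if
# enumeration is disabled in sssd.
LIMIT = 16

for user in pwd.getpwall():
    gids = os.getgrouplist(user.pw_name, user.pw_gid)
    if len(gids) <= LIMIT:
        continue
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            names.append(str(gid))  # group not resolvable by name
    print(f"{user.pw_name}: {len(gids)} groups -> {', '.join(names)}")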

We've been asked to investigate any performance impact, which seems fairly difficult to quantify with so many variables involved. If anyone has experience with this workaround, I'd like to hear their thoughts.

Questions:

1) When bypassing the RPC group list and doing nameservice lookups, is the vserver NFS credential cache still involved?

    Will there be any caching of the increased lookups?

2) I know "it depends" applies here, but in general: if workload A performed at a given level before enabling the option, does its performance afterwards appear greatly impacted, slightly impacted, or about the same? I'm not looking for absolute numbers, since every environment is different, but any anecdotal information would be helpful.

3) Has anyone monitored their LDAP/AD servers while making this change to get a sense of impact?

 

Any information would be appreciated.

Is it possible to replace user lookup in /etc/passwd with LDAP (7-Mode)?


We urgently needed some storage, so I decided to reinstall an old NetApp FAS2240-4 with ONTAP 8.2.5 in 7-Mode. I do not have any newer licenses, and there is no support for this FAS anymore.

I have configured the FAS for access with CIFS and NFS in an Active Directory domain. I created a volume with NTFS security style, and I can mount and access it from Windows and NFS, as long as the Unix user is created in the /etc/passwd file on the filer. Since we have Unix information added to our Active Directory, I thought I could use LDAP to retrieve the information that I now have to manually add to /etc/passwd, but I cannot get it to work. When I set

options ldap.enable on

and remove the user information from /etc/passwd, I get an error like this:

auth.trace.authenticateUser.loginTraceMsg:info]: AUTH: Error in passwd look up of uid 75080 during login from 10.1.3.34

We have the attributes uid and uidNumber in AD, which hold the username (identical to the sAMAccountName) and the integer ID of the user. Is it possible to replace the lookup of user IDs in /etc/passwd with LDAP?

Here is my configuration:

ldap.ADdomain office.example.com
ldap.base dc=example,dc=com
ldap.enable on
ldap.port 3268
All other values are the default values.
rdfile /etc/nsswitch.conf

hosts: files       nis     dns
passwd: files    nis    ldap
netgroup: files    nis  ldap
group: files    nis     ldap
shadow: files      nis

/etc/usermap.cfg is empty. All Unix usernames are identical to the Windows/AD user names.
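
As a diagnostic idea, a minimal Python sketch using the third-party ldap3 library (an assumption, not part of ONTAP) that checks whether the global catalog on port 3268 returns the uid/uidNumber attributes for the UID from the error above; the bind user and password are placeholders:

from ldap3 import Connection, Server, SUBTREE

# Rough diagnostic sketch using the third-party ldap3 library (not part of
# ONTAP): check that the global catalog on port 3268 returns the uid and
# uidNumber attributes the filer needs. Bind user and password are
# placeholders; base DN and port mirror the ldap.* options above.
server = Server("office.example.com", port=3268)
conn = Connection(server, user="binduser@office.example.com", password="secret", auto_bind=True)

conn.search(
    search_base="dc=example,dc=com",
    search_filter="(uidNumber=75080)",  # the uid from the auth.trace error above
    search_scope=SUBTREE,
    attributes=["uid", "uidNumber", "sAMAccountName"],
)

for entry in conn.entries:
    print(entry.entry_dn)
    print(entry.uid, entry.uidNumber, entry.sAMAccountName)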

Thank you for your help,

Andreas

ClusterDeployFailed ONTAP Select

$
0
0

I want to setup a single node cluster to test ONTAP Select.

I've created a standalone ESXi 6.7 U2 host and registered the VM via the OVA file I downloaded from the NetApp site.

For the ESXi server I created a new port group and connected the VM's NIC to it.

So I now have a management network (VMkernel) and a port group. Both are connected to the same vSwitch.

 

When the Deploy VM is up and running, I can connect to it via a browser. ONTAP Select Deploy guides me through the steps to get a single node running.

Every time the creation of this new node is almost finished, it gives me an error at Post Deploy Setup.

 

It starts with:

 

IPPingStatus: Ping succeeded on IPs. Waiting for the following IPs [192.168.139.61]

(the same IPPingStatus line repeats several more times)

ClusterDeployFailed: NodeNotPingable: Not all nodes in the cluster "CLUSTERNAME" responding to pings on node management IP. Check the networking configuration of the host, including VLAN IDs, gateway, etc.


SnapMirror Resync Procedure


I have initialized a SnapMirror XDP relationship (ONTAP 9.3P11 at the destination, 9.1P16 at the source) and broken the relationship in order to copy (robocopy) ALL data from the LUNs within the volume to a new NAS volume for CIFS - so I'm migrating from LUNs to CIFS here.

 

The copy process is about to complete, so I intend to resync the mirror, update it, break it, and run robocopy again to sync the copy.

 

Is this a valid procedure, or am I missing something critical here?

 

Additional info:

 

There are 7 LUNs in the volume which provide 11 VMDKs to a single Windows Server, spanned as a single 100TB disk. So I am emptying that bad design into a NAS CIFS share via a FlexGroup volume for future growth.

Disk partitioning on AFF-A220 with additional shelf


Hello,

 

We bought an AFF-A220 with 24 x 960 GB disks (X371_S1643960ATE), connected to a shelf with 24 disks of the same model.

This AFF has been added to an existing 6-node cluster (AFF-A200, FAS2554) running 9.5P5.

 

On the controller shelf, the disks are partitioned with the root-data1-data2 scheme, which is expected.

We created 2 aggregates, one that spans the P1 partitions, the other the P2 partitions:

 

storage1::*> disk partition show 5.0.0.*
                          Usable  Container     Container
Partition                 Size    Type          Name              Owner
------------------------- ------- ------------- ----------------- -----------------
5.0.0.P1                  435.3GB aggregate     /ssd4/plex0/rg0   storage1-02
5.0.0.P2                  435.3GB aggregate     /ssd3/plex0/rg0   storage1-01
5.0.0.P3                  23.39GB aggregate     /aggr0_storage1_01_0/plex0/rg0
                                                                  storage1-01

 

(similar output for 22 other disks, the last one is a spare disk)

 

The problem is with the extension shelf: the disks were not partitioned, and it seems the only way to have them partitioned is to add them to the root aggregate (as seen in the output of "storage aggregate add-disks" with the "-simulate" flag).
We would like to avoid having the root aggregates use disks from the extension shelf.

If we try to add the disks to one of the data aggregates, the whole disks would be used, without partitioning.

 

As a workaround, we were able to manually partition the disks by using "disk partition -n 2 <disk_ref>" in node shell, which gives us this configuration:

 

storage1::*> disk partition show 5.1.0.*
                          Usable  Container     Container
Partition                 Size    Type          Name              Owner
------------------------- ------- ------------- ----------------- -----------------
5.1.0.P1                  447.0GB aggregate     /ssd6/plex0/rg0   storage1-02
5.1.0.P2                  447.0GB aggregate     /ssd5/plex0/rg0   storage1-01
2 entries were displayed.

Then we are able to create two additional data aggregates that span the P1 and P2 partitions.


For us, this is an ideal setup, since we don't have a P3 partition here for the root aggregate (so we don't lose about 24 GB per disk), and the root aggregates remain on the controller shelf.

 

The question is: is this a supported setup? If not, what would be a clean way to get data partitioning on the extension shelf?

 

Thanks in advance!

ONTAP Select 9.6 - Cluster Deploy Failed


Hi, I am trying to deploy an ONTAP Select 9.6 cluster via Deploy utility 2.12 on a VMware 6.7 Update 2 cluster, using a Distributed Switch on a pair of uplinks with trunking enabled. I believe network communication is good, and the reported reason for the failure is "Failed to add the Cserver record in RDB. The certificate has expired."

 

Does anyone have any idea what the possible reason is?

 

thanks

regards

 

Recover NetApp base config (aggrs, vols, SVMs, igroups, namespaces, etc.)


Hello all - We are looking to recover the NetApp base config (aggrs, vols, SVMs, igroups, namespaces, etc.) without any user data.

 

The idea is that in a cyber-attack scenario where no disk data can be trusted (including any SnapLock-protected data), the NetApp disks are wiped, the NetApp base config (without user data) is restored, and then user data is restored from tape using NDMP.

The reason for a tape restore, rather than a typical DR solution where the whole config/data is replicated offsite, is to have an air gap between the backed-up config/user data and production, to ensure no penetration of malicious code in a cyber-attack scenario.

 

We have tried to restore the NetApp cluster base config using config backup restore, but it didn't yield the required results. The config restore didn't restore aggrs/vols/SVMs/igroups/namespaces etc.

 

Has anyone done this before, or does anyone have any ideas about this?

 

 

XCP Stats Output


Is there a way/syntax for XCP scan to provide more detailed output of which files/folders have not been modified in more than 1 year?

 

When I run "xcp scan -stats \\SMBPath" it returns a very high-level report which includes:

 

== Modified ==
 >1 year  >1 month  1-31 days  1-24 hrs  <1 hour  <15 mins  future  invalid
    1049       383          3

 

Is there a way to determine WHICH files fall into the 1049 listed above? If I want to archive those files, how would I know where they are in order to do so?
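
For illustration, a minimal Python sketch (not XCP; the UNC path is a placeholder) that walks the same SMB path and prints files whose modification time is more than a year old, i.e. the files behind the ">1 year" bucket:

import os
import time
from pathlib import Path

# Rough sketch (not XCP): walk the SMB path and print files whose modification
# time is more than a year old. The UNC path is a placeholder.
ROOT = Path(r"\\filer\share")
CUTOFF = time.time() - 365 * 24 * 3600

for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        full_path = Path(dirpath) / name
        try:
            mtime = full_path.stat().st_mtime
        except OSError:
            continue  # skip files we cannot stat
        if mtime < CUTOFF:
            print(time.strftime("%Y-%m-%d", time.localtime(mtime)), full_path)

I believe newer XCP releases also have a -match filter option; the XCP user guide documents the exact expression syntax.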

 

 

 

 
