Channel: ONTAP Discussions topics

Disk copy temporarily suspended and will resume automatically...


I'm in the process of emptying a disk shelf on an AFF8080 in order to move to a newer A700 system...

The AFF8080 is a two-node system with partitioned 3.8T SSD disks... pretty standard Root-Data-Data partitioning.

One RG on each node sharing three DS224-12 shelves...

We have emptied one of the two aggregates and we are now in the process of copying the partitions around in order to empty one of the three shelves...

It all worked fine for two of the three RAID groups, but the last RG seems to stall on us...

We basically run a command like:

disk partition replace -action start -partition 4.1.10.P2 -replacement 1.10.4.P2

The copy starts, which we can see with "storage aggregate show-status -aggregate DATA02".

And it does indeed show us:

shared 4.1.2 0 SSD - 1.74TB 3.49TB (replacing, copy in progress)
shared 4.0.0 0 SSD - 1.74TB 3.49TB (copy 0% completed)

So far so good...

But... it never gets past 0%... in fact, in the event log we can see the following:

event log show

5/27/2020 17:22:31 NETAPP01-02 NOTICE raid.rg.diskcopy.aborted: /DATA02/plex0/rg2: disk copy from 0d.01.2P2 to 4a.00.0P2 aborted at disk block 5248 after 53:38.94. Reason: Disk copy temporarily suspended and will resume automatically..

And we have a lot of these notices and none of them gets past block 5248... it's been like this for almost an hour now... (53 mins.)
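A quick way to confirm that the copy really is stuck at the same block, rather than slowly creeping forward, is to pull the block number out of every raid.rg.diskcopy.aborted notice. This is only a rough sketch: it assumes the "event log show" output has been saved as plain text with one event per line in the format shown above, and the function names abort_blocks and is_stuck are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the body of a raid.rg.diskcopy.aborted notice as it appears in the
# event log line quoted above (assumed format: one event per line).
ABORT_RE = re.compile(
    r"raid\.rg\.diskcopy\.aborted: \S+: disk copy from (?P<src>\S+) "
    r"to (?P<dst>\S+) aborted at disk block (?P<block>\d+)"
)


def abort_blocks(lines):
    """Group the 'aborted at disk block N' values by (source, target) pair."""
    blocks = defaultdict(list)
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            key = (match.group("src"), match.group("dst"))
            blocks[key].append(int(match.group("block")))
    return dict(blocks)


def is_stuck(blocks):
    """True when there are several abort notices and all name the same block."""
    return len(blocks) > 1 and len(set(blocks)) == 1
```

Feeding it the saved log lines and calling is_stuck on each list shows which source/target pairs keep aborting at one and the same block, which is what we see here with block 5248.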

There is a bit of load on the aggregate...

NETAPP01::*> statistics aggregate show

NETAPP01 : 5/27/2020 17:24:15

                            *Total  Read  Write       Read      Write  Latency
Aggregate  Node                Ops   Ops    Ops      (Bps)      (Bps)     (us)
---------  ---------------  ------  ----  -----  ---------  ---------  -------
DATA02     NETAPP01-02       23307  9001   8351  190295040  178327552      284

NETAPP01::*> node run -node NETAPP01-02 -command sysstat -u 1
 CPU   Total    Net   kB/s     Disk     kB/s   Tape   kB/s  Cache  Cache     CP  CP_Ty                     Disk
       ops/s     in    out     read    write   read  write    age    hit   time  [T--H--F--N--B--O--#--:]  util
 79%   16839    186    397    81384    41384      0      0     9s    99%     0%  0--0--0--0--0--0--0--0      3%
 77%   16378    237    475    92520    40840      0      0     9s    98%     0%  0--0--0--0--0--0--0--0      3%
 76%   17495    593    813    80588    39768      0      0     9s    99%     0%  0--0--0--0--0--0--0--0      3%
 79%   16614    383   1152    94324    43460      0      0     9s    98%    11%  0--0--0--0--0--0--0--0      3%

As you can see there is quite a bit of CPU load on the system... but that's all because of this copy... even though it does not seem to do anything...

In the sysstat output below I have stopped the disk replace...

NETAPP01::*> node run -node NETAPP01-02 -command sysstat -u 1
 CPU   Total    Net   kB/s     Disk     kB/s   Tape   kB/s  Cache  Cache     CP  CP_Ty                     Disk
       ops/s     in    out     read    write   read  write    age    hit   time  [T--H--F--N--B--O--#--:]  util
 47%   18766     99     44   185051    54054      0      0    44s    95%   100%  0--0--0--0--0--0--0--1      4%
 30%   17002     41   4040   155132   114824      0      0    44s    99%   100%  0--0--0--0--0--0--0--1      2%
 38%   17862      8     11   135448    69704      0      0    24s    99%   100%  0--0--0--0--0--0--0--1      1%
 32%   14811     11     17   130132    61452      0      0    28s    98%   100%  0--0--0--0--0--0--0--1      1%
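To put a number on the difference between the two sysstat runs, the CPU column can be averaged with a few lines of Python. This is just a sketch: it assumes each sysstat sample sits on its own line with the CPU percentage as the first field, the helper name mean_cpu is made up, and the rows are the four samples from each run above.

```python
import re

# A sysstat data row starts with the CPU percentage, e.g. " 79%  16839 ...".
CPU_ROW = re.compile(r"^\s*(\d+)%\s")


def mean_cpu(sysstat_lines):
    """Average the leading CPU% field over all data rows; None if no rows."""
    values = []
    for line in sysstat_lines:
        match = CPU_ROW.match(line)
        if match:
            values.append(int(match.group(1)))
    return sum(values) / len(values) if values else None


# Samples copied from the two sysstat runs in this post.
during_copy = [
    "79% 16839 186 397 81384 41384 0 0 9s 99% 0% 0--0--0--0--0--0--0--0 3%",
    "77% 16378 237 475 92520 40840 0 0 9s 98% 0% 0--0--0--0--0--0--0--0 3%",
    "76% 17495 593 813 80588 39768 0 0 9s 99% 0% 0--0--0--0--0--0--0--0 3%",
    "79% 16614 383 1152 94324 43460 0 0 9s 98% 11% 0--0--0--0--0--0--0--0 3%",
]
copy_stopped = [
    "47% 18766 99 44 185051 54054 0 0 44s 95% 100% 0--0--0--0--0--0--0--1 4%",
    "30% 17002 41 4040 155132 114824 0 0 44s 99% 100% 0--0--0--0--0--0--0--1 2%",
    "38% 17862 8 11 135448 69704 0 0 24s 99% 100% 0--0--0--0--0--0--0--1 1%",
    "32% 14811 11 17 130132 61452 0 0 28s 98% 100% 0--0--0--0--0--0--0--1 1%",
]

print(mean_cpu(during_copy))   # 77.75
print(mean_cpu(copy_stopped))  # 36.75
```

Four one-second samples per run is a very small window, so this only gives a rough feel for the difference, roughly 78% with the copy running against roughly 37% with it stopped.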

I have managed to replace 15 out of 24... but the last 9 just won't start and hang as described above...

The raid.resync.perf_impact is set to medium, and I'm not too keen on raising it to high...

There do not seem to be any other errors on the system...

I'm just trying the community before opening a case, maybe someone has the golden key to this? 😉

/Heino

 

