Background: We recently deployed two Dell servers as ESX hosts for a small office in our company. Each host has 96GB of memory, 16GB of which is reserved for an ONTAP Select node (a two-node HA pair, small size).
The hosts run 28 VMs, but collectively those VMs use very little memory. The Dell iDRAC memory stats consistently show roughly 10 - 20% memory utilization. Even so, the Host summary tab in vCenter constantly raises "Host Memory Usage" alerts and shows the hosts as nearly out of memory. This office wants to add a few more VMs, but I was concerned about the memory allocation. After consulting with a VMware expert, we were told my analysis was correct and that, since iDRAC showed little memory actually in use, we should be able to overcommit and add the additional VMs.
Four VMs were added and powered on. This caused an outage: the hosts froze and the VMs were inaccessible until we powered the new VMs off. On a second attempt, we powered the new VMs on one at a time, and a similar but less severe problem occurred after the second VM was powered on.
We have cases open with NetApp and VMware but have made little headway so far. Does anyone have insight into this issue? I can't understand why we would need more physical memory, yet the symptoms seem to indicate that we do. The only VMs with reservations are the ONTAP Select nodes.
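
For reference, here is the back-of-envelope arithmetic behind my confusion, as a rough Python sketch. It only restates the figures above (96GB per host, a 16GB reservation, 10 - 20% utilization per iDRAC); the function names are just mine for illustration, and it assumes the iDRAC percentage is relative to the full 96GB, which I have not confirmed.

```python
# Per-host figures taken from the description above; nothing here is measured
# by the script itself.
HOST_MEMORY_GB = 96
ONTAP_RESERVATION_GB = 16
# Assumption: the iDRAC percentage is relative to total installed memory.
IDRAC_UTILIZATION_RANGE = (0.10, 0.20)


def unreserved_memory_gb(host_gb, reserved_gb):
    """Memory left on one host after the ONTAP Select reservation."""
    return host_gb - reserved_gb


def idrac_in_use_gb(host_gb, utilization_range):
    """Convert the iDRAC utilization percentages into GB (low, high)."""
    low, high = utilization_range
    return round(host_gb * low, 1), round(host_gb * high, 1)


if __name__ == "__main__":
    free_gb = unreserved_memory_gb(HOST_MEMORY_GB, ONTAP_RESERVATION_GB)
    low_gb, high_gb = idrac_in_use_gb(HOST_MEMORY_GB, IDRAC_UTILIZATION_RANGE)
    print(f"Unreserved memory per host: {free_gb} GB")
    print(f"In use per iDRAC: {low_gb:.1f} to {high_gb:.1f} GB")
```

By these numbers, each host should have far more unreserved memory than iDRAC reports in use, which is why the vCenter alerts and the freeze do not add up for me.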