The VMware Storage and Availability team is looking for customer and community feedback regarding some storage technologies and use cases. Please take a few minutes to fill out the brief survey listed in the link below.

Storage Technology & Use case

Thank you for your help and support.

For future updates on Virtual SAN (VSAN), Virtual Volumes (VVols), and other Software-defined Storage technologies as well as vSphere + OpenStack be sure to follow me on Twitter: @PunchingClouds

In my previous Virtual SAN operations article, “VMware Virtual SAN Operations: Disk Group Management,” I covered the configuration and management of Virtual SAN disk groups and described the recommended operating procedures for managing them.

In this article, I will take a similar approach and cover the recommended operating procedures for replacing flash and magnetic disk devices. In Virtual SAN, drives can be replaced for two reasons: failures and upgrades. Regardless of the reason, whenever a disk device needs to be replaced, it is important to follow the correct decommissioning procedures.

Replacing a Failed Flash Device

The failure of a flash device renders an entire disk group inaccessible (i.e. in the “Degraded” state) to the cluster, along with its data and storage capacity. One important point to highlight here is that a single flash device failure doesn’t necessarily mean that the running virtual machines will incur outages. As long as the virtual machines are configured with a VM Storage Policy with “Number of Failures to Tolerate” greater than zero, the virtual machine objects and components will remain accessible. If there is available storage capacity within the cluster, the data resynchronization operation is triggered in a matter of seconds. The time required for this operation depends on the amount of data that needs to be resynchronized.
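The progress of that resynchronization can be followed from RVC. As a sketch, assuming a hypothetical vCenter hostname and cluster path:

```shell
# Connect to vCenter with the Ruby vSphere Console (RVC); the
# hostname and cluster path below are placeholders.
rvc administrator@vcenter.example.com

# Show the bytes remaining to resync per object in the cluster.
vsan.resync_dashboard ~/computers/VSAN-Cluster
```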

When a flash device failure occurs, before physically removing the device from a host, you must decommission the device from Virtual SAN. The decommission process performs a number of operations to discard disk group memberships, delete partitions, and remove stale data from all disks. Follow either of the disk device decommission procedures defined below.

Flash Device Decommission Procedure from the vSphere Web Client

  1. Log on to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the cluster object
  3. Go to the Manage tab and select Disk Management under the Virtual SAN section
  4. Select the disk group with the failed flash device
  5. Select the failed flash device and click the Delete button

Note: In the event the disk claim rule setting in Virtual SAN is set to “Automatic,” the disk delete option won’t be available in the UI. Change the disk claim rule to “Manual” in order to have access to the disk delete option.

Flash Device Decommission Procedure from the CLI (ESXCLI) (Pass-through Mode)

  1. Log on to the host with the failed flash device via SSH
  2. Identify the device ID of the failed flash device:
    • esxcli vsan storage list
  3. Delete the failed flash device from the disk group:
    • esxcli vsan storage remove -s <SSD-device-ID>

Note: Deleting a failed flash device will result in the removal of the disk group and all of its members.

  4. Remove the failed flash device from the host
  5. Add a new flash device to the host and wait for the vSphere hypervisor to detect it, or perform a device rescan.

Note: This step is only applicable when storage controllers are configured in pass-through mode and support the hardware hot-plug feature.
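Under the same assumptions (pass-through controllers), the CLI procedure can be sketched end to end; the device identifier shown is only a placeholder:

```shell
# List the devices claimed by Virtual SAN; "Is SSD: true" in the
# output identifies the flash device of each disk group.
esxcli vsan storage list

# Remove the failed flash device. This deletes the entire disk
# group, so double-check the identifier (placeholder shown here).
esxcli vsan storage remove -s naa.55cd2e40XXXXXXXX

# After physically replacing the drive, rescan for new devices.
esxcli storage core adapter rescan --all
```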

Upgrading a Flash Device

Before upgrading the flash device, you should ensure there is enough storage capacity available within the cluster to accommodate all of the data currently stored in the disk group, because you will need to migrate that data off the disk group.

To migrate the data before decommissioning the device, place the host in maintenance mode and choose the suitable data migration option for the environment. Once all the data is migrated from the disk group, follow the flash device decommission procedures before removing the drive from the host.
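Entering maintenance mode with data migration can also be done from the CLI. This is a sketch; the evacuation mode shown is one of the available Virtual SAN data-migration options:

```shell
# Place the host in maintenance mode and evacuate all Virtual SAN
# data from its disk groups before decommissioning the flash device.
# Other modes (e.g. ensureObjectAccessibility) migrate less data.
esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData
```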

Replacing a Failed Magnetic Disk Device

Each magnetic disk contributes storage capacity to its disk group and to the overall Virtual SAN datastore. As with flash, magnetic disk devices can be replaced because of failures or upgrades. The impact imposed by the failure of a magnetic disk is smaller than the impact of a flash device failure. The virtual machines remain online and operational for the same reasons described above in the flash device failure section, and the resynchronization operation is significantly less intensive. Again, however, the time required depends on the amount of data to be resynchronized.

As with flash devices, before removing a failed magnetic device from a host, decommission the device from Virtual SAN first. This allows Virtual SAN to perform the required disk group and device maintenance operations, and allows the subsystem components to update the cluster capacity and configuration settings.

vSphere Web Client Procedure (Pass-through Mode)

  1. Log in to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster
  3. Go to the Manage tab and select Disk Management under the Virtual SAN section
  4. Select the disk group with the failed magnetic device
  5. Select the failed magnetic device and click the Delete button

Note: It is possible to perform decommissioning operations from ESXCLI in batch mode if required. The use of ESXCLI does introduce a level of complexity that should be avoided unless thoroughly understood. It is recommended to perform these types of operations using the vSphere Web Client until enough familiarity is gained with them.

Magnetic Device Decommission Procedure from the CLI (ESXCLI) (Pass-through Mode)

  1. Log in to the host with the failed magnetic device via SSH
  2. Identify the device ID of the failed magnetic device:
    • esxcli vsan storage list
  3. Delete the magnetic device from the disk group:
    • esxcli vsan storage remove -d <HDD-device-ID>
  4. Add a new magnetic device to the host and wait for the vSphere hypervisor to detect it, or perform a device rescan.
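As a sketch of the steps above with a hypothetical device name (the naa identifier is a placeholder):

```shell
# Identify the failed capacity disk in the listing; "Is SSD: false"
# marks the magnetic devices of each disk group.
esxcli vsan storage list

# Remove only that disk. Unlike removing the flash device, the
# disk group and its remaining members stay intact.
esxcli vsan storage remove -d naa.5000c500XXXXXXXX

# Rescan so the replacement disk is detected.
esxcli storage core adapter rescan --all
```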

Upgrading a Magnetic Disk Device

Before upgrading any of the magnetic devices, ensure there is enough usable storage capacity available within the cluster to accommodate the data from the device that is being upgraded. The data migration can be initiated by placing the host in maintenance mode and choosing a suitable data migration option for the environment. Once all the data is offloaded from the disks, proceed with the magnetic disk device decommission procedures.

In this particular scenario, it is imperative to decommission the magnetic disk device before physically removing it from the host. If the disk is removed without performing the decommissioning procedure, data cached from that disk will end up permanently occupying the cache layer. This could reduce the available amount of cache and eventually impact the performance of the system.

Note: The disk device replacement procedures discussed in this article are entirely based on storage controllers configured in pass-through mode. In the event the storage controllers are configured in RAID0 mode, follow the manufacturer's instructions for adding and removing disk devices.

– Enjoy


This new blog series will focus on Virtual SAN day-to-day operational tasks and their recommended operating procedures. I will start the series by covering one of the most important aspects of Virtual SAN: the management of disk groups.

Managing Disk Groups

Disk groups are logical management constructs designed to aggregate and manage locally attached flash devices and magnetic disks on ESXi hosts. When disk groups are created, the flash devices are used to create a performance (caching) layer, while the magnetic disks are used to create the persistent storage layer and provide storage capacity.

Creating Disk Groups

Disk groups are created individually on every host that is a member of a Virtual SAN enabled cluster. Creating a disk group requires at least one flash device and one magnetic disk. A disk group supports a maximum of one flash device and up to seven magnetic disks.

Disk groups can be created through the vSphere Web Client as well as the command line interface utilities such as esxcli after the Virtual SAN feature has been enabled in a cluster. The vSphere Web Client presents the simplest method for small environments, while command line utilities such as esxcli can provide automation capabilities for large environments.

The recommended procedures for creating disks groups are described below.

Creating disk groups from the vSphere Web Client 

  1. Log in to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster
  3. Go to the Manage tab and select Disk Management under the Virtual SAN section
  4. Click the Claim Disks icon
  5. Select a flash device and as many magnetic disks as needed
  6. Click OK

Creating disk groups from the CLI (ESXCLI)

  1. Log in to a host via SSH
  2. Identify the device IDs of all disks that will be used to create the disk group:
    esxcli storage core device list
  3. Add a flash device and one or more magnetic disks to create the disk group:
    esxcli vsan storage add -s <flash-device-ID> -d <magnetic-device-ID>

Note: The flash devices and magnetic disks must not contain any existing logical partitions. In the event they do, all partitions must be deleted before Virtual SAN can utilize them. For instructions on how to delete partitions, please see http://blogs.vmware.com/vsphere/2014/05/virtual-san-troubleshooting-automatic-add-disk-storage-mode-fails-part-1.html.
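The CLI steps above can be sketched as follows; the naa device names are placeholders, and partedUtil is used here only to verify that the candidate disks are partition-free before they are claimed:

```shell
# Inspect the candidate disks and note their device identifiers.
esxcli storage core device list

# Check that a disk has no existing partitions before claiming it
# (any partition entries listed must be deleted first).
partedUtil getptbl /vmfs/devices/disks/naa.5000c500XXXXXXXX

# Create the disk group: one flash device (-s) plus one or more
# magnetic disks (-d may be repeated for additional capacity disks).
esxcli vsan storage add -s naa.55cd2e40XXXXXXXX -d naa.5000c500XXXXXXXX
```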

Deleting Disk Groups

Disk groups may be deleted for several reasons, ranging from hardware decommissioning to device failures and device upgrades. It is important to point out that deleting a disk group not only deletes that logical construct, it also permanently deletes the membership between the disks as well as all of their stored content.

Disk groups can be deleted through the vSphere Web Client as well as the command line interface utilities such as esxcli. The vSphere Web Client presents the simplest method for small environments, while command line utilities such as esxcli can provide automation capabilities for large environments.

The recommended procedures for deleting disks groups are described below.

Deleting disk groups from the vSphere Web Client

  1. Log in to the vSphere Web Client
  2. Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster
  3. Go to the Manage tab and select General under the Virtual SAN section
  4. Click the Edit button on the Virtual SAN is Turned On field
  5. Change the Add Disks to Storage setting from Automatic to Manual
  6. Select Disk Management under the Virtual SAN section
  7. Select the disk group to delete
  8. Click the Remove Disk Group icon to delete the disk group
  9. Click Yes in the disk group pop-up window to delete the disk group

Note: While it is not a requirement, consider placing the hosts in maintenance mode before deleting a disk group. Placing a host in maintenance mode provides an opportunity to migrate data if necessary. 

Deleting Disk Groups procedure from the CLI (RVC)

  1. Log in to vCenter via RVC
  2. From the RVC command line, navigate to the datacenter tree structure:
    cd vcenter_server/datacenter
  3. Use the vsan.host_wipe_vsan_disks command to delete the disks on a particular host:
    vsan.host_wipe_vsan_disks ~/computers/cluster/hosts/hostname-or-ip/ -f
  4. Verify from the command output that the disk group was deleted successfully.


In the next post I will cover the recommended procedures for replacing failed devices.

– Enjoy


I recently published an article describing the procedure to configure and enable the VMware Virtual SAN Observer tool to work without the requirement of internet access. This operating mode is known as offline mode.

VMware Virtual SAN Observer Offline Mode article

The Virtual SAN Observer tool provides three different options for monitoring statistics. The offline monitoring option is probably the most utilized of the three, as it offers the most flexibility for archiving and data transportability. A brief description of the Virtual SAN Observer monitoring modes is listed below.

Virtual SAN Observer Monitoring Modes

  • Live Monitoring – This mode displays the performance statistics in real time as they are generated by the system.
  • Offline Monitoring – This mode provides the ability to create a tar.gz package containing an HTML bundle, which can be utilized for archiving purposes or future inspection.
  • Full raw statistics bundle – This mode provides the ability to collect all the statistics into a large JSON file for deeper analysis.
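As a sketch, the first two modes map to different invocations of the RVC vsan.observer command; the cluster path and output directory below are placeholders:

```shell
# Live monitoring: serve real-time statistics from an embedded
# web server while the command runs.
vsan.observer ~/computers/VSAN-Cluster --run-webserver --force

# Offline monitoring: collect statistics and generate an HTML
# bundle that can be archived and inspected later.
vsan.observer ~/computers/VSAN-Cluster --generate-html-bundle /tmp
```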


Virtual SAN Observer

The VMware Virtual SAN Observer is currently the best monitoring and troubleshooting tool available for Virtual SAN. The tool is utilized for monitoring performance statistics for Virtual SAN live or offline. The Virtual SAN Observer UI depends on a number of JavaScript and CSS libraries (JQuery, d3, angular, bootstrap, font-awesome) in order to successfully display the performance statistics and their information.

These library files are accessed and loaded at runtime when the Virtual SAN Observer page is rendered. The tool requires access to the libraries mentioned above in order to work correctly, which means that the vCenter Server requires access to the internet. This requirement can present a challenge in secured environments, where giving applications access to the internet is often impractical or simply not allowed.

Many vSphere administrators have encountered this issue, particularly those supporting secured environments. To overcome it, the vCenter Server Appliance can be modified so that it accesses the required files and libraries locally.

Note: It is a recommended practice to always deploy an out-of-band vCenter Server Appliance for the purpose of using Virtual SAN Observer.

In order to configure the Virtual SAN Observer to work without internet connectivity (offline mode), the files listed below need to be modified. The HTML files are located under the vCenter Server Appliance “/opt/vmware/rvc/lib/rvc/observer/” directory.
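As an illustrative sketch only (the subdirectory name, library versions, and download URLs are assumptions, not taken from the article), localizing the libraries could look like this:

```shell
# Run on the vCenter Server Appliance. Download the JavaScript/CSS
# libraries the Observer pages reference so they can be served
# locally instead of from the internet. URLs are placeholders.
cd /opt/vmware/rvc/lib/rvc/observer/
mkdir -p externallibs
wget -P externallibs https://code.jquery.com/jquery-1.11.1.min.js
wget -P externallibs https://d3js.org/d3.v3.min.js

# Then edit the HTML files in this directory so that each <script>
# and <link> tag points at the local externallibs/ copies rather
# than the internet URLs.
```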