The initial release of Auto Deploy in vSphere 5.0 was focused on, and limited to, stateless provisioning use cases. Now, in vSphere 5.1, Auto Deploy has been updated to support Stateful installs and an enhanced version of Stateless provisioning called Stateless Caching.
Auto Deploy Stateless provisioning allows diskless hosts to automatically install and configure ESXi by downloading images via a PXE boot process. The installation’s unique configuration items are applied as part of the build process, handled by Host Profiles. The ESXi image and its configuration are then retained in local RAM. This solution initially presented some concerns regarding its overall availability. The failure of any Auto Deploy dependent service, such as TFTP, PXE, or DHCP, puts the solution at risk, as neither new hosts nor rebooted hosts will be able to download their respective images. While those are valid concerns, they can be addressed with the right architecture design and implementation, one that focuses on mitigating those risks. It’s safe to say the initial release of Auto Deploy presented a few limitations, but the new release delivers significant improvements in those areas.
In vSphere 5.1 two new provisioning options have been added to Auto Deploy: Stateless Caching and Stateful Installs. Stateless Caching works much like the Stateless deployment option found in the previous version of vSphere, but it addresses the availability concerns described above. Stateless Caching introduces the ability to save (cache) the ESXi image to an assigned storage device (local disk, SAN, USB). This option could have an impact on the overall cost and initial manageability of an environment if procuring storage media is required. Each host must be configured with a specific boot order and with a minimum of 1 GB of capacity on a supported storage media device. The boot order for Auto Deploy Stateless Caching should be set up as follows:
- Network boot device first
- Hard Drive or Removable Device (USB)
Stateless Caching Boot Order
Now, how does this work? How and when is the image cached?
The caching of the ESXi image takes place after the download and configuration are completed and the image is running in local RAM; the image is then copied to the local storage device. This procedure acts as the failback mechanism for hosts in the event any of the Auto Deploy dependent services fail or are unavailable for image downloads. The Auto Deploy Stateless Caching configuration has to be defined within the Host Profile settings. The figure below illustrates the options available for Stateless Caching under the advanced configuration settings of a Host Profile.
Host Profiles Stateless Caching Setting
Under normal operations hosts will run in stateless mode, but in the event of a service or component failure the boot process will fall back to the boot image cached on the storage device. The “Arguments for first disk” and “Check to overwrite any VMFS volumes on the selected disk” options are only available if a storage device other than USB is selected, because VMFS volumes cannot be created on USB storage media.
While Stateless Caching addresses some of the availability concerns surrounding Stateless provisioning, it’s important to understand that configuring hosts with Stateless Caching doesn’t guarantee they will be fully functional and able to contact vCenter Server and other systems. One of the primary benefits of Stateless Caching is that hosts are brought online to facilitate troubleshooting and help resolve problems that prevent a successful PXE boot.
Auto Deploy remains a PowerShell (PowerCLI) centric tool in vSphere 5.1. A Technical Preview GUI plug-in is available for the C# client (thick client) in vSphere 5.0, but that version is not currently compatible with vSphere 5.1, so you may need to keep your PowerCLI skills up to date until the GUI is available. I hope VMware releases a version for the new vSphere Web Client soon.
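For those brushing up on their PowerCLI, a typical Auto Deploy rule looks like the sketch below. The vCenter name, depot path, profile name, and IP range are all placeholders for illustration; adjust them to your environment.

```
# Connect to vCenter (server name is a placeholder)
Connect-VIServer -Server vcenter01.example.local

# Point Image Builder at an offline bundle and select an image profile
# (depot path and profile name pattern are examples)
Add-EsxSoftwareDepot C:\depot\ESXi510-depot.zip
$profile = Get-EsxImageProfile -Name "ESXi-5.1.0-*-standard"

# Create a rule that assigns the image profile to hosts in a given
# IP range, then activate it in the working rule set
New-DeployRule -Name "ProdHosts" -Item $profile -Pattern "ipv4=192.168.50.100-192.168.50.200"
Add-DeployRule -DeployRule "ProdHosts"
```

Rules can also match on other attributes (vendor, MAC address, etc.), and the same rule mechanism assigns Host Profiles and cluster locations in addition to image profiles.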
I’m a big fan of Auto Deploy; it’s extremely handy, and it’s now even more relevant in the era of the Software-Defined Datacenter.
So it’s been a while since I’ve been able to post something in the clouds. I’ve been very busy and need to find the time to update and move PunchingClouds to a HOT new location, but I wanted to quickly put something out for those of you who used to, or still, come here from time to time. I recently ran into an issue with Auto Deploy, one of the new features of the vSphere 5 platform.
Auto Deploy seems to be picking up a great deal of interest out in the field, and there have been a great many blog posts from Duncan Epping (one half of the VMware Virtualization bad ass rock star duo Frank and Duncan!), Eric Sloof, Gabe’s Virtual World, and the rest of the community highlighting Auto Deploy’s configuration and capabilities. To be honest, I think there is obviously a place in the market for Auto Deploy, but there are a few things that need a bit of work and fine tuning, which I’m sure the folks at VMware will get to as they continue the push for their cloud automation strategy. Overall, Auto Deploy is a very nice and useful feature to have, as long as there is a requirement in the infrastructure for it.
So recently, while working on a plan and design of a brand spanking new vSphere 5 architecture (Green Fields baby!!!!! how often do you get that?), I ran into an issue with Auto Deploy and the hardware that was to be used: Cisco UCS. One of the key technical design decisions of this particular design was the implementation of stateless builds of ESXi 5 on Cisco UCS. The problem was not really a show stopper for us, but we had to make sure the customer would approve the workaround before we could proceed with it for production. This was more a matter of supportability from the customer’s perspective than anything else.
So here is what happened and how the problem was discovered:
All of the Auto Deploy required components were deployed correctly in the required, supported fashion (TFTP, DHCP, dedicated IP scope, etc.), and just like a great majority of the companies in the world, most of their infrastructure services are based on Microsoft solutions (AD, DNS, DHCP, NAP, etc.). After configuring Auto Deploy and activating an initial boot image for the Cisco UCS blades, the blades were not able to PXE boot and connect to the Auto Deploy server (vCenter in this case); they simply timed out. After some research, troubleshooting, and discussions with a colleague and lead architect on the project, who I will refer to here as the Manchach, we decided to take a different approach and try something new. The homeboy happens to be a very knowledgeable individual on the networking side of the house, amongst many other things such as enterprise architectures, security, and other areas… enough about him. The whole point is that I listened, and we decided to build a Linux based DHCP server and see what the outcome would be. We identified that the Cisco UCS blades were picking up their reserved IP addresses from the Linux based DHCP server but not from the Microsoft DHCP server, which effectively proved to be the reason the Cisco UCS blades were failing to PXE boot and connect to the Auto Deploy server. Auto Deploy was a key feature of the design of this architecture. I have, on a very few occasions in the past, experienced issues with Microsoft services interacting with other technologies that aren’t based on MS-DOS 6.22 : ) just kidding. But seriously, we wanted to see if that behavior would repeat itself, and guess what… it didn’t. So here was the issue: the Cisco UCS blades were not able to pick up an address from the Microsoft based DHCP service, but they were able to pick one up from the Linux based DHCP service (we used Ubuntu).
We proceeded with the creation of the new DHCP IP scope, adding reservations for all hosts by mapping MAC addresses to IPs, and things worked fine with this approach. After some more research and digging we learned from Cisco that the issue would be corrected in the new Cisco UCS version 2.0.
So here is the deal for those of you working with Cisco UCS version 1.4: be aware of this problem. If you’re considering the implementation of Auto Deploy for stateless builds and your current DHCP solution is based on the Microsoft DHCP service, know that it won’t work unless you use a Linux based DHCP solution or move to Cisco UCS version 2.0 (not confirmed by me yet). The workaround is somewhat simple, if the introduction of a Linux based DHCP server into the environment is possible.
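For reference, the workaround boils down to an ISC dhcpd configuration with per-blade reservations and the PXE options pointing at the Auto Deploy TFTP server. This is a minimal sketch; the subnet, addresses, and host names below are examples, not values from our environment.

```
# /etc/dhcp/dhcpd.conf -- minimal sketch for Auto Deploy PXE booting
subnet 192.168.50.0 netmask 255.255.255.0 {
  range 192.168.50.100 192.168.50.200;
  option routers 192.168.50.1;
  next-server 192.168.50.10;                # TFTP server holding the boot files
  filename "undionly.kpxe.vmw-hardwired";   # gPXE binary from the Auto Deploy TFTP bundle
}

# One reservation per UCS blade, mapping MAC address to IP
host ucs-blade-01 {
  hardware ethernet 00:25:b5:00:00:1a;
  fixed-address 192.168.50.101;
}
```

The `next-server` and `filename` directives correspond to DHCP options 66 and 67, which Auto Deploy relies on to hand the blade its gPXE boot image.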
We ended up upgrading the customer’s Cisco UCS environment to version 2.0, but have not been able to verify whether the problem was rectified, since the customer approved the deployment of the Linux based DHCP server in the management subnet.
I hope this is of use to someone. See you all on a black diamond near you!!!! More to come soon.