Cisco UCS and Auto Deploy DHCP Issues
So it’s been a while since I’ve been able to post something in the clouds. I’ve been very busy and still need to find the time to update PunchingClouds and move it to a HOT new location, but I wanted to quickly put something out for those of you who used to, or still, come here from time to time. I recently ran into an issue with Auto Deploy, one of the new features of the vSphere 5 platform.
Auto Deploy seems to be picking up a great deal of interest out in the field, and there have been plenty of blog posts from Duncan Epping (one half of the VMware Virtualization bad ass rock star! duo Frank and Duncan), Eric Sloof, Gabe’s Virtual World, and the rest of the community highlighting Auto Deploy’s configuration and capabilities. To be honest, I think there is an obvious place in the market for Auto Deploy, but there are a few things that need a bit of work and fine tuning, which I’m sure the folks at VMware will get to as they continue to push their cloud automation strategy. Overall, Auto Deploy is a very nice and useful feature to have, as long as the infrastructure actually has a requirement for it.
So recently, while working on the plan and design of a brand spanking new vSphere 5 architecture (Green Fields baby!!!!! how often do you get that?), I ran into an issue with Auto Deploy and the hardware that was to be used: Cisco UCS. One of the key technical design decisions of this particular design was the implementation of stateless builds of ESXi 5 on Cisco UCS. The problem was not really a show stopper for us, but we had to make sure the customer would approve the workaround before we could proceed with it in production. From the customer’s perspective this was more a matter of supportability than anything else.
So here is what happened and the discovery of the problem:
All of the components Auto Deploy requires were deployed correctly in the supported fashion (TFTP, DHCP, a dedicated IP scope, etc.), and just like the great majority of companies in the world, most of this customer’s infrastructure services are based on Microsoft solutions (AD, DNS, DHCP, NAP, etc.). After configuring Auto Deploy and activating an initial boot image for the Cisco UCS blades, the blades were not able to PXE boot and connect to the Auto Deploy server (vCenter in this case); they simply timed out. After some research, troubleshooting and discussions with a colleague and lead architect on the project, who I will refer to here as the Manchach, we decided to take a different approach and try something new. The homeboy happens to be very knowledgeable on the networking side of the house, amongst many other areas such as enterprise architecture and security… enough about him. The whole point is that I listened, and we decided to build a Linux-based DHCP server and see what the outcome would be. We found that the Cisco UCS blades picked up their reserved IP addresses from the Linux-based DHCP server but not from the Microsoft DHCP server, and that proved to be the reason the blades failed to PXE boot and connect to the Auto Deploy server. Auto Deploy was a key feature in the design of this architecture, so this mattered. On a few occasions in the past I have seen Microsoft services have trouble interacting with technologies that aren’t based on MS-DOS 6.22 : ) just kidding, but seriously, we wanted to see whether the behavior would repeat itself, and guess what… it didn’t. So here was the issue: the Cisco UCS blades were not able to pick up an address from the Microsoft-based DHCP service, but they were able to pick one up from the Linux-based DHCP service (we used Ubuntu).
We proceeded to create the new DHCP IP scope and add reservations for all hosts by mapping MAC addresses to IPs, and things worked fine with this approach. After some more research and digging, we learned from Cisco that the issue would be corrected in the new Cisco UCS version 2.0.
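If you need to recreate the scope and reservations on the Linux side, they can be generated rather than typed by hand for every blade. Here is a minimal Python sketch of that idea. It is not the exact setup we used: it assumes ISC dhcpd (the isc-dhcp-server package on Ubuntu) since I haven’t named a specific daemon here, the hostnames, MAC addresses and IPs are hypothetical, and the boot file name `undionly.kpxe.vmw-hardwired` is the Auto Deploy TFTP boot file from the vSphere 5 documentation, so check it against your own TFTP server.

```python
import ipaddress

# Assumed Auto Deploy boot file served over TFTP (per the vSphere 5 docs);
# verify it against your own TFTP server before relying on it.
AUTO_DEPLOY_BOOT_FILE = "undionly.kpxe.vmw-hardwired"


def render_dhcpd_conf(subnet, tftp_server, reservations,
                      boot_file=AUTO_DEPLOY_BOOT_FILE):
    """Render an ISC dhcpd subnet block with one fixed-address host
    declaration per blade.

    reservations maps a hostname to a (mac, ip) tuple.
    """
    net = ipaddress.ip_network(subnet)
    lines = [
        f"subnet {net.network_address} netmask {net.netmask} {{",
        f"  next-server {tftp_server};",  # points PXE clients at the TFTP server
        f'  filename "{boot_file}";',     # boot file PXE clients request via TFTP
    ]
    for host, (mac, ip) in sorted(reservations.items()):
        # Refuse reservations that fall outside the dedicated Auto Deploy scope.
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{host}: {ip} is outside {net}")
        lines += [
            f"  host {host} {{",
            f"    hardware ethernet {mac.lower()};",  # blade vNIC MAC
            f"    fixed-address {ip};",               # reserved IP
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical blades, MACs and addresses, for illustration only.
    blades = {
        "esx01": ("00:25:B5:00:00:0A", "10.10.50.11"),
        "esx02": ("00:25:B5:00:00:0B", "10.10.50.12"),
    }
    print(render_dhcpd_conf("10.10.50.0/24", "10.10.50.5", blades))
```

The output is only a fragment: on Ubuntu it would go into /etc/dhcp/dhcpd.conf, and things like option routers, DNS servers and lease times are deliberately left out. The subnet check is there because a reservation that lands outside the dedicated scope is an easy mistake to make when you are mapping dozens of blades.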
So here is the deal for those of you working with Cisco UCS version 1.4: be aware of this problem. If you’re considering Auto Deploy for stateless builds and your current DHCP solution is the Microsoft DHCP service, know that it won’t work unless you use a Linux-based DHCP solution or move to Cisco UCS version 2.0 (not yet confirmed by me). The workaround is fairly simple if introducing a Linux-based DHCP server into the environment is possible.
We ended up upgrading the customer to Cisco UCS 2.0, but we have not been able to verify whether the problem was fixed, since the customer approved the deployment of the Linux-based DHCP server in the management subnet.
I hope this is of use to someone. See you all on a black diamond near you!!!! More to come soon.