PowerCLI: Unfreeze/Resume Your Virtual Machines

NOTE: I published this article a bit earlier, but because it covered a not-yet-released feature I took the post down again. Since this feature has been available for quite a while now, I decided to re-publish it. Enjoy!

Last night I got a message from our monitoring system that one of our NFS volumes on our NetApp SAN was almost full. At that particular moment I was driving home, and by the time I got home the NFS volume was completely full, which resulted in the Virtual Machines freezing up.

After resizing the NFS volume the machines do not automatically ‘unfreeze’ again; they were waiting for me to answer the question in the vSphere Client with either “Retry” or “Abort”. Of course I could have just answered “Retry” on each machine to get them going again, but I thought it would be nicer to solve this with the almighty PowerCLI, since there were about 30 Virtual Machines involved.

Get-Cluster "Your Cluster" | Get-VM | Sort | Get-VMQuestion | `
Set-VMQuestion -Option "Retry" -Confirm:$false

This will answer the pending question on every waiting machine in your cluster with the “Retry” option (in this example I used “Your Cluster” as the cluster name).
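
By the way, if you first want to see which machines are actually waiting for an answer before blasting “Retry” at all of them, you can run the same pipeline without the Set-VMQuestion part:

Get-Cluster "Your Cluster" | Get-VM | Get-VMQuestion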

I hope you won’t need it often, but when you do it will come in handy ;)

VMware vCenter Server Heartbeat

Introduction:

First of all I would like to say I’ve been very busy lately, so I haven’t had much time to spend on my blog. Too bad, because I really enjoy publishing articles like this. Anyway, in the past few months I’ve made several attempts to convince my manager to acquire a license for vCenter Server Heartbeat, in which I eventually succeeded.

Since the implementation of our vSphere infrastructure our environment has grown enormously, and there are now about 15 datacenters connected to our vCenter Server, which really makes a ‘single node’ vCenter Server a single point of failure.

Why use vCenter Heartbeat?

There are several reasons why an organization should consider using a Heartbeat setup for their VMware vSphere infrastructure:

  • Continuity of your Clusters (DRS)
    Without your vCenter Server, DRS in your clusters won’t be able to function
  • Manageability
    With several datacenters in your vCenter Server, each with their own administrators, it’s really a must that they can keep managing their virtual infrastructure
  • Backup
    If your backup software uses vCenter to index your virtual infrastructure as part of its backup process, it’s essential that your vCenter Server is reachable.

Installation Methods

There are several methods of deployment of your Heartbeat setup: V2V (Virtual to Virtual), V2P (Virtual to Physical) or P2P (Physical to Physical).

In our case we chose the P2P (Physical to Physical) solution, because I really insisted that the secondary server be just as powerful as the primary, in case a failover situation would last longer than just a few hours.

vCenter Server Hardware

VMware insists that, when using the Physical to Physical solution, both of your vCenter Servers (primary and secondary) have similar hardware. In our case our vCenter Servers consist of the following:

Brand/model: HP ProLiant DL360 G6
Processors: 2x Intel Xeon E5540 2.53GHz
Memory: 12GB DDR3 Registered (6GB per CPU)
Disk(s): 2x HP 72GB 15K Dual Port (RAID-1)

My design for vCenter Server Heartbeat

[Image: design for vCenter Server Heartbeat]

In our infrastructure we have several datacenters spread all over the country, connected to each other by a fiber (WAN) network. Two locations (sites) are really close to each other and are also connected by a separate fiber connection.

What makes this nearby location ideal is that it uses the same physical network as our main site, and most of the VLANs designated for servers are available on both sites. This makes the location/site ideal as a backup location.

When installing vCenter Heartbeat you are able to select whether you are installing on a WAN or a LAN infrastructure. Since this really is a LAN situation we chose the LAN setup, primarily because it allows us to use a single public IP address for our vCenter Server, which simplifies management.

For this setup I’m using our default server VLAN for the public IP address, and I’ve created a separate VLAN for the channel (heartbeat/synchronization) communication of vCenter Heartbeat.

I’ve simplified our infrastructure in the design image so its main and only focus is our vCenter Server Heartbeat setup. In this case the public IP address is 10.15.1.17, which is in a different subnet/VLAN than our Heartbeat channel. On the channel, the primary vCenter Server has 10.15.210.11 and the secondary has 10.15.210.12.

The channel is used by Heartbeat to synchronize the registry and filesystem of the Heartbeat nodes and to communicate between the primary and the secondary. This channel is also used to check whether the other node is still alive and, if it’s not, to initiate a failover.

Packet Filter (Neverfail)

As you might have seen in the design above, both the primary and the secondary server have the same name and public IP address. There is a really simple explanation for this: when you choose to install Heartbeat with a LAN setup it assumes both vCenter Servers will have the same IP address and name. This is because, during the setup on the primary node, the installer creates a System State backup which you restore on the secondary node afterwards. From that moment on the secondary node is equal to the primary node in every way.

To prevent both the primary and the secondary server from having their public adapters active on the network at the same time, VMware has implemented a “Neverfail Packet Filter Driver” which is installed on the public network adapters during the installation of Heartbeat.
The idea of the Packet Filter is really simple: Heartbeat disables the Packet Filter on the node that is currently active. During a failover the Heartbeat software enables the Packet Filter on the node that becomes inactive and disables it on the node that will be the active node from that moment on.

vCenter Databases

There are several ways to host your vCenter databases; in our infrastructure we chose to host our databases on a separate dedicated database server. This database server is currently running Microsoft SQL Server 2005 Standard, and we are going to migrate the databases to a Microsoft SQL Server 2008 cluster soon, for the simple reason that if you make your front-end redundant it seems just as logical to do the same for your back-end.

Installing/implementing vCenter Server Heartbeat

The first question that comes to mind is whether you can do this in a running environment. Luckily VMware really thought of this: yes, it is very well possible to implement Heartbeat in your running environment. If this wasn’t possible it really would be a hassle to implement.

The documentation of Heartbeat comes in two documents: a quick setup guide and a reference guide. In my opinion these documents contain most of the steps you need to take to install Heartbeat, so there is no need for me to describe each step here. The quick guide really misses out on some detail though, so I would suggest using the Reference Guide.

In the global steps below for implementing Heartbeat I will indicate which action is required on which server by starting the line with “Primary”, “Secondary” or “Both”:

  • Secondary: make sure the hardware (CPU, Memory, Disks) is similar to your primary server
  • Secondary: install the exact same operating system as you have on your primary server. In our case this is Windows Server 2008 x64 Standard. Give it a temporary IP (DHCP if you will) and a bogus hostname. This will be overwritten when Heartbeat sets up your secondary node with your primary node’s data.
  • Both: make sure ILO is configured on both the primary and the secondary server, so you can reach the servers during some steps of the installation
  • Both: make sure the Windows Update level on both servers is the same.
  • Both: in case you are using Windows Server 2008 you need to install some features on the server before you can start the installation: Backup, Backup-Features and Backup-Tools (a quick way to do this is sketched right after this list).
  • Both: very important, installing Heartbeat will NOT work properly when you have NIC teaming enabled. If you want to use NIC teaming, set it up after the whole Heartbeat setup is finished!
  • Primary: install Heartbeat and follow the steps on the screen. Make sure you have some storage space available on the network to store the temporary backup files used to transfer the identity of your primary server to your secondary. Important: make sure your secondary server is able to reach this location.
  • Secondary: once you have followed all of the steps on your primary node, the setup will tell you to continue on your secondary node.
  • Secondary: don’t forget to manually change your channel IP to the proper IP you have reserved for the secondary node after the Heartbeat setup has rebooted your server!
  • Both: this applies to both servers, but has to be configured only once in your Heartbeat configuration. You will need to provide a service account to the vCenter Service Plugin of Heartbeat that has enough access on your vCenter Server to monitor whether it’s up or down. If you are running vCenter Server under a service account, like I do, I would suggest using that account for this purpose.
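
For the Windows Server 2008 backup features mentioned in the list above, a quick way to add them is with ServerManagerCmd (just a sketch; the feature IDs are the ones from the list, run it from an elevated prompt):

# Install the Windows Server Backup features on Windows Server 2008
# (run on both the primary and the secondary node)
ServerManagerCmd.exe -install Backup-Features -allSubFeatures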

Final words

Some of you are eager to try this baby out. Well, that’s possible: you can use vCenter Server Heartbeat as a 60-day trial. The only catch is that you need to be able to download it from the VMware download site; I’m pretty sure that if you contact customer service they will provide you with a link to do so. The great thing about running the trial is that if you don’t like it, or your trial expires without you entering a permanent license, you are still able to uninstall it in a nice and easy way and continue with your single vCenter Server.

As I am really curious about your experiences with vCenter Heartbeat: please reply or comment on this post to share your installation experiences and issues.

PowerCLI: Migrate VMs to another VLAN/Portgroup

Scenario:

Suppose you have several Virtual Machines running in a VLAN that you decide should be migrated to a new VLAN because of infrastructural changes in your network. Mind you, this is only handy if the machines you are about to migrate use DHCP to get their network addresses.

Approach:

First of all you need to make sure all the ESX servers in your cluster are able to use that VLAN, so you need to have your switchports/trunks tagged with it.

Secondly you need to add the new VLAN (portgroup) to the complete cluster. In this example I’m using vSwitch0 as the Virtual Switch name; make sure you change it to whatever applies to you if yours is different.

Get-Cluster "Your Cluster" | Get-VMHost | Get-VirtualSwitch -Name "vSwitch0"`
| New-VirtualPortGroup -Name "New VLAN" -VLanId 123

This will add the “New VLAN” to all of the ESX Servers in “Your Cluster” with VLAN ID 123 on vSwitch0.
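
If you want to double-check that the portgroup really ended up on every host before you start moving machines, a quick check could look something like this (a sketch of my own, using the portgroup name from the example above):

# Print one line per host so a host that is missing the portgroup stands out
Get-Cluster "Your Cluster" | Get-VMHost | ForEach-Object {
    $pg = $_ | Get-VirtualPortGroup | Where { $_.Name -eq "New VLAN" }
    if ($pg) { "$($_.Name): OK (VLAN $($pg.VLanId))" } else { "$($_.Name): missing" }
}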

Now that all the ESX servers have access to the new VLAN to place their Virtual Machines in, it’s time to migrate the VMs to the new VLAN. There are two ways you can do this:

The first is to reconfigure all the network adapters of all VMs from the old to the new VLAN/portgroup; the second is to do exactly the same, but to briefly disconnect and reconnect the network adapters of the Virtual Machines to trigger a new DHCP request for your VMs.

First method:
Get-Cluster "Your Cluster" | Get-VM | Get-NetworkAdapter | `
Where { $_.NetworkName -eq "Old VLAN" } | `
Set-NetworkAdapter -NetworkName "New VLAN"
Second method:
Get-Cluster "Your Cluster" | Get-VM | Get-NetworkAdapter | `
Where { $_.NetworkName -eq "Old VLAN" } | `
Set-NetworkAdapter -NetworkName "New VLAN" -Connected:$false | `
Set-NetworkAdapter -Connected:$True
Note:

Then again, if you do use static addresses for your Virtual Machines you will need to reconfigure them within the operating systems afterwards. Although that might be scriptable, I haven’t found a way to do this yet.
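
If you do want to experiment with scripting that last part, one possible direction (an untested sketch, not something I have running) is PowerCLI’s Invoke-VMScript cmdlet, which runs a command inside the guest through VMware Tools. The VM name, guest credentials, interface name and addresses below are all placeholders, and it assumes a Windows guest:

# Untested sketch: set a static IP inside a Windows guest with netsh via VMware Tools
Get-VM "Your VM" | Invoke-VMScript -ScriptType Bat `
  -GuestUser "Administrator" -GuestPassword "YourPassword" `
  -ScriptText 'netsh interface ip set address name="Local Area Connection" static 10.15.2.10 255.255.255.0 10.15.2.1 1'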

Vizioncore Releases vRanger v4.1 DPP

Vizioncore finally released the new version of vRanger. The reason I’m really happy with that is that the issue where vRanger 4.0 DPP crashed Virtual Center at startup has been resolved!

Everyone who had or still has this problem is advised to upgrade their vRanger version to get a properly working environment again and to stop worrying about whether Virtual Center will crash at startup.

The new version is 4.1 DPP and can be found on the Vizioncore Download Site.

Other fixes, changes and new features can be found in the Release Notes of vRanger.

How to: Install FastSCP 3.0.1.266 on Windows 7 (x86/x64)

I was planning on installing FastSCP on my Windows 7 workstation, so I started the installation file (veeam_fastscp_3.0.1.266.exe), but then this error message suddenly showed up:

[Screenshot: the error message shown by the FastSCP installer]

Too bad, and when you press OK in this screen it stops the installation.

No worries, I found a workaround to install it relatively easily (supported or not).

When you start the installation file once again, you will see the following window preceding the error message:

[Screenshot: the installer window showing files being extracted]

This window shows that the original installation executable extracts its installation files to a temporary directory.

So when the error message pops up again, just don’t press OK and leave it there for a moment.

In the meantime, open Windows Explorer, browse to the folder “C:\Users\YourUserName\AppData\Local\Temp\” and look for a directory named something like “IXP000.TMP”.

[Screenshot: the IXP000.TMP folder inside the Temp directory]

In this directory you will find the extracted installation files; start the FastSCPSetup executable from there to initiate the installation, and FastSCP will install without problems.
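
If you don’t feel like clicking through Explorer, a small PowerShell one-liner can point you at the extracted folder(s) as well (assuming they live in your user’s Temp directory as described above):

# Find the temporary extraction folder(s) created by the FastSCP installer
Get-ChildItem $env:TEMP | Where { $_.PSIsContainer -and $_.Name -like "IXP*.TMP" } | `
Select FullName, LastWriteTime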

NOTE: If functionality within FastSCP is missing or not working properly after installing it this way, you will still have to wait for a proper Windows 7 version. Remember: this is a somewhat dirty workaround. It works for me, but it is not supported and might not work for you.

VMware: Bad performance on HP ProLiant DL380 G6 with ESXi 3.5 U4

At first I was really happy with my new HP ProLiant DL380 G6, because it is a really nice machine. This machine was going to be an ESX server, so I added a few disks, a second CPU and plenty of memory. The new specifications:

  • CPU(s): 2x Intel Xeon Quad E5520 (2.27GHz)
  • Memory: 28GB (2x 2GB, 6x 4GB) DDR3
  • Disk(s): 2x 36GB, 3x 300GB

Enough to make the machine run really fast, you would think. Once I had installed ESXi 3.5 U4 on the machine I started to install the environment, consisting of 4 Windows servers and 5 client systems (XP and Vista).

While installing I noticed that the machine appeared to be really slow when it comes to disk I/O: during the installation of two 2008 servers the extraction process of the disk image took over an hour (I installed both 2008 servers simultaneously, which normally would be no problem at all). When the installation finished I saw the “Disk Performance” statistics (in the vSphere Client) peaking no higher than 7-8MB/s, which was really frustrating. The esxtop utility on the command line of the ESX server gave exactly the same results.

After some searching, Arne Fokkema sent me a link to a thread on the VMware Communities where some people had similar problems. One response came from someone who solved it by upgrading the firmware of the DL380’s Smart Array P410i controller to version 2.00, and by adding the Battery Pack for Smart Array to enable the Posted-Write Cache.

So here are the steps to take to solve this issue:

  • Order the Battery Pack for Smart Array P212/P410/P411 (P/N: 462969-B21). Since ordering this can take some time it’s the smartest thing to do first ;)
  • Upgrade your Smart Array P410i controller’s firmware to version 2.00, which is not really easy if you’re not used to doing this sort of thing. You can do it by:
    • Downloading the latest HP ProLiant Firmware CD, which is version 8.60 at this moment and can be found here: http://bit.ly/1Luqts
    • Downloading the Smart Array P410i v2.00 firmware (both the CP011340.scexe and CP011340.md5 file), which can be found here: http://bit.ly/mDiqE
    • Opening the Firmware 8.60 ISO file with your ISO editor (PowerISO for example), adding the two files of the Smart Array firmware update to the compaq\swpackages directory and saving the ISO file.
    • Burning the ISO, or mounting it through ILO on your server, booting from it and letting it run all the firmware updates. It will automatically offer to update the Smart Array controller firmware to version 2.00, which is exactly what you need.
  • Furthermore, after receiving your Battery Pack order, place it in the server and connect it to the memory of the Smart Array controller. If you don’t know how, consult the ProLiant DL380 G6 manuals.

NOTE: Using the ‘altered’ HP ProLiant Firmware CD is necessary because normally you would run firmware updates like these from the server’s command line or within Windows using the HP Smart Update tool, but ESXi 3.5 U4 lacks the command-line tools to do this.

After doing all of this you will still have to wait a few hours until the Battery Pack is fully charged, because the Posted-Write Cache will not work until then. After that your server will run on steroids and will reach 70MB/s and above.

I hope I have saved you all lots of trouble finding the cause of the lack of disk performance on your brand-new server ;)

Vizioncore vRanger 4.0 Pro Crashes vCenter 2.5 Update 3, 4 and 5

Update (09/25/2009):

Vizioncore’s new version of vRanger (4.1.0 build 11581) solves this issue. If you have experienced these issues you are advised to upgrade as soon as Vizioncore officially releases this new version!

Update (08/28/2009):
Vizioncore announced that the new version, in which I expect this problem to be fixed, will be released mid-September.

Original Twitter-message:

Vizioncore vRanger Pro 4.1 DPP to be released in Mid-Sept. More info @ VMworld 2009 and VirtualVizion http://tinyurl.com/vc-vvizion

For a while now I have been having problems with a crashing Virtual Center. The version the problems started with was 2.5 Update 4. The problem was caused by vRanger Pro 4.0 DPP. Please keep reading for the complete story on this problem.

Problem occurrence and symptoms:

In short, this is how the problems first occurred: our DBA installed SP3 on our SQL 2005 server and rebooted the SQL server afterwards, which normally wouldn’t be much of a problem, but in this case our Virtual Center wouldn’t start anymore. The service started and crashed again after 5 seconds. The vpxd.log showed a Win32_Exception, some debug data, and that was it.

Attempts to solve the problem:

At first the problems appeared to have been caused by the update on the SQL Database Server, so these are the steps I’ve taken in my attempts to solve the problem.

  • Reinstalled VC against our production database: same problem.
  • Reinstalled VC against our production database on another SQL Server with SQL Server 2005 SP2 on it: same problem.
  • Reinstalled VC with a clean database and reconfigured our entire environment, but after a reboot of the VC: same problem, except this time we installed VC 2.5 Update 5 instead of Update 4.
  • Cleanly installed a new server with a fresh Windows 2003 install and all recent updates, and installed VC 2.5 Update 5 again with a clean database, but the same problem occurred after rebooting the VC server.

We repeated the last step probably about two times, and it really made me desperate for a solution, because we have a pretty big environment.

The actual problem:

After I desperately created a Support Request at VMware, they asked me to disable vRanger if we had it running in our environment, because it was known to cause problems with Virtual Center.

Something that had already raised questions with me before this incident was that the new vRanger constantly kept an open connection to Virtual Center, and as soon as you killed that session from Virtual Center the vRanger service immediately reconnected. So it appeared to me that during the startup of the Virtual Center service, vRanger constantly tried to connect and initiate something, or tried to retrieve data from it, which caused the Virtual Center service to crash right after startup.

Resolution:

After disabling the vRanger service (which, by the way, wasn’t installed on our Virtual Center server itself) we held our breath and restarted our Virtual Center server, and surprisingly the service started flawlessly.
To make sure it really was vRanger we re-enabled the vRanger service and restarted the Virtual Center service and it crashed immediately.

After disabling the vRanger service VC functioned flawlessly again.

Final note:

The road to the solution took around two weeks, and it was horrible, but finally the solution is known. By the way, previous versions of vRanger do NOT cause these problems!

Update (helpful reaction from Vizioncore, see comments below for original message):

We are working very diligently to get this fix in the hands of our users, we have a fix that already out of our development process and is now in the hands of our QA team. You can also reference the following Knowledgebase article here http://www.vizioncore.com/support/knowledgebase/index.php and search for KB 00000296 about this issue. This KB will also be updated shortly for a root cause once the issue is fully cleared by development and QA, for those that are interested in the technical details about the issue. Thanks again for everyone’s support! And we hope to get everyone taken care of very soon!

PowerCLI: Reset CD-drives using PowerShell

As most of you know and have probably experienced from time to time: when a Virtual Machine’s CD drive is connected to an ISO file on one of your datastores, or even to the physical drive of your ESX host, migrating that Virtual Machine with VMotion will not work.

Normally this isn’t really a problem, except when you put an ESX host in maintenance mode: Virtual Center will simply not tell you why the maintenance mode process is hanging, or it even times out after 15 minutes for no obvious reason. Most of the time it’s a Virtual Machine that has a CD drive connected to an ISO file. A waste of time if you ask me.

So to prevent this from happening I’ve written a simple PowerShell oneliner/script to disconnect these CD drives from the ISO files or from the physical drives and set them to client devices, which is fine for VMotion:

Get-VM -Location (Get-VMHost "your.esx.host") | Get-CDDrive | `
Where { $_.IsoPath.Length -gt 0 -or $_.HostDevice.Length -gt 0 } | `
Set-CDDrive -NoMedia -Confirm:$False

Instead of executing this just on one host you can also execute this for your entire cluster:

Get-VM -Location (Get-Cluster "Your Cluster Name") | Get-CDDrive | `
Where { $_.IsoPath.Length -gt 0 -or $_.HostDevice.Length -gt 0 } | `
Set-CDDrive -NoMedia -Confirm:$False

Or of course, by datacenter:

Get-VM -Location (Get-Datacenter "Your Datacenter Name") | Get-CDDrive | `
Where { $_.IsoPath.Length -gt 0 -or $_.HostDevice.Length -gt 0 } | `
Set-CDDrive -NoMedia -Confirm:$False

It’s as easy as that: now no Virtual Machine will interrupt your VMotions anymore and you can put your ESX hosts into maintenance mode without any problems ;)
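
A typical way to use this is right before you put a host into maintenance mode, for example (a quick sketch, with “your.esx.host” as a placeholder name):

# First disconnect any mounted CD media on the host's VMs...
Get-VM -Location (Get-VMHost "your.esx.host") | Get-CDDrive | `
Where { $_.IsoPath.Length -gt 0 -or $_.HostDevice.Length -gt 0 } | `
Set-CDDrive -NoMedia -Confirm:$False

# ...and then request maintenance mode for the host
Get-VMHost "your.esx.host" | Set-VMHost -State "Maintenance"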

Cheers!

Twitter suffering from DDoS-Attack

Just in case you haven’t noticed or heard yet: Twitter has been down for an hour and a half now, and while Twitter HQ was looking for the reason it was down, they discovered they were the target of a denial-of-service attack:

The status message can be read on the Twitter Status Site. I’ve added a little screenshot of the message from the Twitter HQ that can be read there:

[Screenshot: the status message from Twitter HQ]

I hope (whoever they are) that they stop the attack soon. It seems pretty pointless to me.

Update:
(9:46a): As we recover, users will experience some longer load times and slowness. This includes timeouts to API clients. We’re working to get back to 100% as quickly as we can.

Quick and simple VMware ESX Host Statistics

Just a small oneliner to display all your hosts, their overall status and their CPU and memory usage across all your datacenters (which can be handy if you have multiple datacenters).

Get-Datacenter | Sort | Get-VMHost | Sort | Get-View | `
Select Name, OverallStatus, `
@{N="CPU Usage (GHz)";E={[math]::round(
$_.Summary.QuickStats.OverallCpuUsage/1000,2)}}, `
@{N="Memory Usage (GB)";E={[math]::round(
$_.Summary.QuickStats.OverallMemoryUsage/1024,2)}}

And it will give you an output that looks like this:

[Screenshot: example output showing each host with its overall status, CPU and memory usage]

You may not find it very useful like this, but you can also add a Where statement to this line to filter on several things. For example, you might decide you only want to see the hosts that have a yellow or red overall status due to high memory or CPU usage:

Get-Datacenter | Sort | Get-VMHost | Sort | Get-View | `
Select Name, OverallStatus, `
@{N="CPU Usage (GHz)";E={[math]::round(
$_.Summary.QuickStats.OverallCpuUsage/1000,2)}}, `
@{N="Memory Usage (GB)";E={[math]::round(
$_.Summary.QuickStats.OverallMemoryUsage/1024,2)}} | `
Where { $_.OverallStatus -ne "green" }

Which will give you something like this:

[Screenshot: filtered output showing only the hosts that are not “green”]

Of course these oneliners are cool and handy to use, but you can also build a script around them to monitor your servers. I will post a script like that soon to show you different interpretations of this oneliner.
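
To give you a very rough idea of the direction such a script could take (just a sketch; the SMTP server and mail addresses are placeholders), you could for example mail yourself the hosts that are not reporting “green”:

# Sketch: mail a report of all hosts whose overall status is not "green"
$report = Get-Datacenter | Get-VMHost | Get-View | `
    Where { $_.OverallStatus -ne "green" } | `
    Select Name, OverallStatus | Out-String

if ($report.Trim().Length -gt 0) {
    Send-MailMessage -SmtpServer "smtp.yourdomain.local" `
        -From "esx-report@yourdomain.local" -To "you@yourdomain.local" `
        -Subject "ESX hosts needing attention" -Body $report
}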