How to troubleshoot RDS & XenApp CPU bottlenecks

It might be surprising, but even with today’s modern server hardware, super-fast CPUs and all the great virtualization features, the most common bottleneck you see in RDS and XenApp environments is still the good old CPU bottleneck.

By Guest Author - Tue, October 8, 2013 - 2 comments

This post was sponsored by Smart-X. Author: Yoni Avital, CTO Smart-X

Memory is quite cheap today, and you can easily give each user 2 GB of RAM even when running 100 concurrent users on a single server (seeing that 196GB of RAM is a common configuration these days). I/O is usually not an issue in RDS/XenApp environments either, especially since SSD drives became affordable and all the fuss about VDI storage issues raised awareness of IOPS and their importance when virtualizing Windows desktops. So, we are left with CPU bottlenecks and peaks, especially in environments where end users can browse the Internet directly from the RDS/XenApp session.

One of the problems with a CPU bottleneck on an RDS server is that a 100% CPU peak immediately affects all users on the server and renders the server unresponsive. Tools like AppSense Performance Manager and RES Workspace Manager can optimize CPU usage and prevent 100% CPU peaks, and you should definitely consider them if you are facing constant CPU issues. The aim of this article, however, is to show you how to troubleshoot a CPU peak and find its root cause.

Traditionally, most XenApp and RDS monitoring tools are able to identify when a server is crossing a CPU threshold and send an alert to the system admin for further investigation. Unfortunately, these tools don’t usually show which process or processes caused the high CPU peak and by the time the sys admin logs on to the RDS server and opens Task Manager, the offending process might no longer be seen. This is where ControlUp comes into the picture. ControlUp is a real-time management console specifically designed for RDS and XenApp environments, containing three important features that can help troubleshoot CPU peaks and find the root cause of the issue:

The first feature, the ControlUp real-time performance view, allows you to quickly pinpoint the process or application screen that causes the high CPU usage. Consider the following example:

The sys admin receives an alert about high CPU usage on server CUXEN65TS14 (a XenApp 6.5 server). The admin then switches to the ControlUp Computers View and sees the following screen:

ControlUp Computers View

We can quickly see the server is currently utilizing 90% CPU. In order to investigate this issue, we can simply double click this server and drill down to the Sessions view:

Sessions View

Here we can see that user “Administrator” is consuming a lot of CPU cycles. Let’s double click this session in order to drill down to the Processes View:

Processes View

Now we can see that iexplore.exe is the culprit. However, this information is usually insufficient: Internet Explorer processes can be used to launch internal web applications as well as to browse public Internet sites. To complete the investigation, let’s right click on iexplore.exe and launch ControlUp’s “Get Session Screenshot” action to take a closer look at what’s actually happening inside the user’s session:

Inside the user’s session

This action will grab the end user screen in real-time and present it to the sys admin (based on your company privacy policy, you can configure this action to be user approved or completely disabled). Here is the “Get Session Screenshot” action result:

Get Session Screenshot - action result

So now we know the end user wanted to watch the Robocop 2014 official trailer, thus causing a major CPU peak on our RDS server.

We all know that Flash-based web sites are CPU hogs in RDS environments, and this is a great way to find exactly which sites are causing issues in your farm. The exact same logic applies to other applications, allowing you to correlate specific application screens with CPU peaks.
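The drill-down from computer to session to process can also be illustrated outside the console. The following is only a minimal Python sketch of that logic, not ControlUp's implementation: the sample data, the function names and the tuple layout are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-process samples for one server: (session user, process, CPU %).
# These values are illustrative only; they are not ControlUp output.
SAMPLES = [
    ("Administrator", "iexplore.exe", 71.0),
    ("Administrator", "explorer.exe", 1.5),
    ("jsmith", "winword.exe", 4.0),
    ("jsmith", "outlook.exe", 6.5),
    ("SYSTEM", "svchost.exe", 3.0),
]


def cpu_by_session(samples):
    """Sum CPU % per session user, roughly what a sessions-level view shows."""
    totals = defaultdict(float)
    for user, _process, cpu in samples:
        totals[user] += cpu
    return dict(totals)


def top_process(samples, user):
    """Return the (process, CPU %) pair using the most CPU in one session."""
    in_session = [(proc, cpu) for u, proc, cpu in samples if u == user]
    return max(in_session, key=lambda item: item[1])


def find_culprit(samples):
    """Drill down: pick the busiest session, then its busiest process."""
    totals = cpu_by_session(samples)
    user = max(totals, key=totals.get)
    process, cpu = top_process(samples, user)
    return user, process, cpu


if __name__ == "__main__":
    print(find_culprit(SAMPLES))
```

As in the walkthrough, the process name alone does not say which site or application screen is responsible, which is why the screenshot step is still needed.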

We will now cover two additional ControlUp features that can help find the root cause of CPU issues in your environment. This time, we’ll focus on analyzing CPU peaks that occurred in the past and were captured by ControlUp when you weren’t looking at the real-time display.

The first feature, ControlUp Reporter, is a free utility that allows you to analyze the CSV log files exported from ControlUp using the “Scheduled Export” feature. With “Scheduled Export”, you can export every ControlUp view to a file folder at a preconfigured interval:

Add Export Rule

Once your export rules are configured correctly, you can launch the ControlUp Reporter, point it to the relevant file folder and start creating historical reports. Let’s review two reports that can help analyze CPU issues:

ControlUp Reporter

The ‘CPU Usage by Computer’ report will produce a graph of the CPU usage over time:

CPU Usage by Computer

We can then use the ‘Computer Processes by Sample’ report to see the process list at a given point in time:

Computer Processes by Sample

As you can see, the CPU peak in this case was mainly caused by the HP InsightServer Agent. Interestingly enough, this is a virtual machine that was migrated using a P2V tool, and the customer simply forgot to remove the physical hardware agents from the virtual image… In this case, a simple uninstall of the unneeded software agent has saved the customer a substantial amount of precious CPU cycles.
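The same two-step analysis (find the peak over time, then list the processes at that sample) can be reproduced by hand on exported data. The sketch below is a rough Python illustration under stated assumptions: the column names `timestamp`, `process` and `cpu`, the sample values and the process name `hw_agent.exe` are hypothetical placeholders, and a real ControlUp export will most likely use a different layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout: one row per process per sample.
# Real ControlUp CSV columns may differ; adjust the field names to match.
EXPORT = """timestamp,process,cpu
2013-10-01 10:00,hw_agent.exe,5
2013-10-01 10:00,iexplore.exe,10
2013-10-01 10:05,hw_agent.exe,62
2013-10-01 10:05,iexplore.exe,12
2013-10-01 10:10,hw_agent.exe,8
2013-10-01 10:10,iexplore.exe,9
"""


def load_samples(text):
    """Group rows into {timestamp: {process: cpu_percent}}."""
    samples = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        samples[row["timestamp"]][row["process"]] = float(row["cpu"])
    return samples


def peak_sample(samples):
    """Return the timestamp whose summed process CPU is highest."""
    return max(samples, key=lambda ts: sum(samples[ts].values()))


def processes_at(samples, timestamp):
    """List (process, cpu_percent) for one sample, highest CPU first."""
    return sorted(samples[timestamp].items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    data = load_samples(EXPORT)
    peak = peak_sample(data)
    print(peak, processes_at(data, peak))
```

To run this against a real export, replace `io.StringIO(text)` with an opened file and map the field names to the columns that file actually contains.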

The second feature, to be included in ControlUp 2.5 (scheduled for beta release during Q4 2013), is called ControlUp Follow-up Actions. This feature will allow you to configure follow-up actions that run after an alert threshold is breached. For example, you can configure an alert to trigger whenever the CPU usage on an RDS server is higher than 90% for more than 60 seconds. Once this threshold is crossed, the new “Follow-up Action” feature can dump the relevant ControlUp views to disk:

Follow-up Action
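To make the trigger condition concrete, here is a minimal Python sketch of a "higher than 90% for more than 60 seconds" check. It is an illustration of the rule as described, not ControlUp's actual trigger logic; the function name, the sample format and the 15-second sampling interval are assumptions.

```python
def sustained_breaches(samples, threshold=90.0, min_seconds=60):
    """Return (start, end) for each run where CPU stays above the threshold
    for more than min_seconds.

    samples: list of (timestamp_in_seconds, cpu_percent), ordered by time.
    """
    breaches = []
    start = None  # timestamp of the first sample in the current run
    last = None   # timestamp of the latest sample in the current run
    for ts, cpu in samples:
        if cpu > threshold:
            if start is None:
                start = ts
            last = ts
        else:
            # The run ended; keep it only if it lasted long enough.
            if start is not None and last - start > min_seconds:
                breaches.append((start, last))
            start = None
    # Handle a run that is still open at the end of the data.
    if start is not None and last - start > min_seconds:
        breaches.append((start, last))
    return breaches


# Hypothetical samples taken every 15 seconds.
CPU_SAMPLES = [
    (0, 50), (15, 95), (30, 97), (45, 60),
    (60, 93), (75, 96), (90, 99), (105, 94),
    (120, 92), (135, 91), (150, 40),
]

if __name__ == "__main__":
    print(sustained_breaches(CPU_SAMPLES))
```

The short spike between seconds 15 and 30 is ignored, while the run from second 60 to second 135 counts as a breach. Requiring a sustained duration like this is what keeps brief, harmless peaks from triggering a dump.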

ControlUp views are saved in CSV format and can be opened using MS Excel. This allows you to do a post-mortem analysis of the issue and find exactly which processes caused the CPU peak.

To summarize, CPU peaks are still quite common in RDS and XenApp (soon XenDesktop 7) environments. Due to the nature of most CPU issues, it is important to understand which Internet sites and application screens are causing the CPU spikes and to be able to analyze these issues in real time. In this article we have shown how ControlUp can assist in troubleshooting and ultimately preventing CPU peaks, thus leading to a better user experience.

You can learn more about ControlUp in our product pages and blog.


2 Comments

  1. Zen Render says:

    Yes! Really nice to see someone talking about Control-Up. I stumbled across it two years or more ago, and used it to keep 80 workstations and another 50 rendernodes running smoothly, with a vast array of no-nonsense, graceful, 3-click solutions to a tonne of tasks. Want to compare registries on 30 machines, and then fix the six that are out of line? 30 seconds, tops. Find missing fonts on half the farm? Another 30 seconds. And all with the barest minimum of setup and without permanent client install on the machines! This utility is a must-have for anyone running lots of sessions, whether they’re on a Citrix host or a baremetal system. Also free for connecting to fewer than 50 machines at a time – that’s unbeatable.

  2. Yoni Avital says:

    Zen – Thanks for great feedback!
    We are adding some great stuff to the upcoming version (ControlUp 2.5) like the ability to integrate existing PowerShell scripts as built-in ControlUp actions, stay tuned to our blog (http://www.smart-x.com/company/smart-blog/) for more info
