Keeping all IT systems updated (patched) is not only a crucial part of a secure and operational environment but also a never-ending task for IT administrators. It is often quite a challenging and cumbersome process, especially in larger environments. The purpose of this article is to provide tips based on the experiences I have had in organizations of different sizes.

There are updates from hardware vendors, operating system vendors, antivirus vendors, and so on. In addition, there are line-of-business (LOB) application updates. Some are less important and can be applied once in a while; some are more important and should be applied on a regular (monthly) basis. Emergency patches should be applied out of band as soon as possible. If not managed properly, patching not only becomes a real pain but can also have a significant business impact, for example, when untested updates break crucial applications.

Document the patch management process

First, you should have your own documentation for patch management. Each organization is a bit different and has its own procedures, policies, and needs. The documentation should include all steps performed in your patch release, testing scenarios, contact information for responsible persons, and a rollback scenario.

Create and manage IT equipment inventory

You can't update something that you don't know about. I remember when WannaCry appeared and all systems had to be patched immediately. Our colleagues thought they had done the job, but the infection was still there. In the end, we found a Windows XP computer inside a PLC-controlled plastic injection machine that nobody in IT knew about.

The inventory should include all servers, workstations, storage devices, routers, and so on. A simple Excel sheet might serve this purpose well.
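
As a minimal sketch of how even a simple sheet can be checked by script, the following Python snippet reads a CSV export of the inventory and lists devices that have no recorded patch date or were last patched too long ago. The column names (`hostname`, `type`, `owner`, `last_patched`), the sample rows, and the 35-day threshold are assumptions for illustration, not a prescribed layout.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical inventory export; column names and rows are illustrative.
SAMPLE_INVENTORY = """hostname,type,owner,last_patched
srv-web01,server,web team,2024-05-14
srv-file01,server,infra team,2024-03-12
plc-xp01,embedded,production,
"""

def find_stale_devices(csv_text, today, max_age_days=35):
    """Return hostnames never patched or patched more than max_age_days ago."""
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        last = row["last_patched"].strip()
        if not last:
            stale.append(row["hostname"])  # no patch date recorded at all
            continue
        if today - date.fromisoformat(last) > timedelta(days=max_age_days):
            stale.append(row["hostname"])
    return stale

if __name__ == "__main__":
    print(find_stale_devices(SAMPLE_INVENTORY, today=date(2024, 6, 1)))
```

A device with an empty patch date is exactly the kind of forgotten machine described above, so the sketch reports it rather than skipping it.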

Categorize by risk and priority

IT systems have different priorities and pose different risks. A server exposed to the Internet means a higher risk than a server located in a secure network. A server running a production application has higher priority than a file share. You should know which systems are more or less critical and focus on them accordingly.

In addition, patches have different priorities. A critical patch for an ESXi host has higher priority than a standard Windows Server cumulative update. An emergency security patch (think of WannaCry again) has even higher priority and should be applied out of band.
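
One way to make such a categorization explicit is a simple score per system. The sketch below is only an illustration: the category names, the weights, and the idea of multiplying exposure by criticality are my assumptions, and your own risk assessment may look quite different.

```python
from dataclasses import dataclass

# Illustrative weights; real values depend on your own risk assessment.
EXPOSURE_SCORE = {"internet": 3, "internal": 2, "isolated": 1}
CRITICALITY_SCORE = {"production": 3, "supporting": 2, "file_share": 1}

@dataclass
class System:
    name: str
    exposure: str      # key of EXPOSURE_SCORE
    criticality: str   # key of CRITICALITY_SCORE

    def risk_score(self) -> int:
        """Combine exposure and criticality into a single number."""
        return EXPOSURE_SCORE[self.exposure] * CRITICALITY_SCORE[self.criticality]

def by_priority(systems):
    """Sort systems so the highest-scoring ones come first."""
    return sorted(systems, key=lambda s: s.risk_score(), reverse=True)

if __name__ == "__main__":
    # Hypothetical systems used only to show the ordering.
    fleet = [
        System("srv-file01", "internal", "file_share"),
        System("srv-web01", "internet", "production"),
        System("srv-db01", "internal", "production"),
    ]
    for system in by_priority(fleet):
        print(system.name, system.risk_score())
```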

Define the patch release cycle

Ideally, patches should first be implemented in a non-production (test lab, development) environment to verify that they will not break anything. There are countless cases where a patch has caused third-party application issues or system instability. The non-production environment should mirror your production environment as closely as possible.

The common patch release cycle for medium-to-large companies is:

  • Week 1 – Test and development environment
  • Week 2 – Pre-live environment
  • Week 3 – Production environment
  • Week 4 – Disaster recovery environment

Figure: Patch release cycle

Using such a cycle allows the system and application owners to evaluate possible impacts and solve any issues before applying the patches to production. Of course, if you don't have all these environments, you have to choose a different approach.
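
The four-week cycle above can be turned into concrete dates with a few lines of Python. This is a sketch under the assumption that each stage starts exactly one week after the previous one; the start date in the example is arbitrary.

```python
from datetime import date, timedelta

# Stage names taken from the cycle described above.
STAGES = ["Test and development", "Pre-live", "Production", "Disaster recovery"]

def release_schedule(cycle_start):
    """Map each stage to the start date of its week, one week apart."""
    return {
        stage: cycle_start + timedelta(weeks=offset)
        for offset, stage in enumerate(STAGES)
    }

if __name__ == "__main__":
    for stage, start in release_schedule(date(2024, 6, 3)).items():
        print(f"{start.isoformat()}  {stage}")
```

If you have fewer environments, shortening the `STAGES` list is enough to get a shorter cycle.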

I know this might be tricky for smaller companies with few servers and desktops, but having a virtual machine or spare desktop where patches can be tested is recommended. It is always better to spend some time and resources on such tests rather than breaking your systems.

Test and evaluate system stability

After each release cycle described above, the applications should be properly tested. Some issues might not be visible at first glance. For example, a .NET-based application may show unstable behavior after a .NET Framework update. The test process may be manual or automated using scripts and other tools.

In articles about this topic, you often find the claim that the security team has to test patch stability. Such a statement is far from reality, however. Companies that have their own dedicated IT security departments are, in most cases, very large companies with dozens of servers, systems, and applications. The security team is usually responsible for firewalls, antivirus, intrusion prevention systems (IPS), and so on, and it usually informs system owners (server admins) about vulnerabilities. It is impossible for IT security team members to know each and every application in the company and thus be able to evaluate whether a patch has any impact on it.

It is the responsibility of the system and application owners to perform such tests. Each application should have its own test scenario based on its needs. Make sure you have this included in your documentation, at least for mission-critical applications.
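
An automated test scenario can start very small. The sketch below only checks that the TCP ports an application depends on still accept connections after patching; it says nothing about whether the application behaves correctly, so treat it as a first smoke test rather than a full test scenario. The host names and ports are hypothetical.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_smoke_test(checks, probe=port_is_open):
    """Run each (name, host, port) check and return the names that failed."""
    return [name for name, host, port in checks if not probe(host, port)]

if __name__ == "__main__":
    # Hypothetical checks for one application; replace with your own.
    checks = [
        ("web frontend", "srv-web01.example.com", 443),
        ("database", "srv-db01.example.com", 1433),
    ]
    failed = run_smoke_test(checks)
    print("All checks passed" if not failed else f"Failed: {failed}")
```

The `probe` parameter exists so the scenario can be extended later, for example with an HTTP request or a test login, without changing the reporting logic.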

Backups of production systems

Backing up any important data and systems is definitely a must. Some guides will tell you to create a full system backup before applying updates. This is also a bit misleading. Backups should be run regularly, and any issues should be handled when they occur. Also, backups are usually handled by different teams, and system owners usually don't even have any access to backup systems. In all the corporations I have worked in, our system team never cared about backups before running updates, as it was simply not our responsibility.

Of course, this is different if you manage everything on your own. Anyway, if you know that a Friday-night backup of your databases was successful, I see no reason to run another full (or even system) backup on Sunday before running the updates.
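
If you do manage backups yourself, the check before patching can be as simple as comparing two timestamps. The following sketch assumes you can obtain the time of the last successful backup from your backup tool; the 72-hour limit is an arbitrary value chosen for the example.

```python
from datetime import datetime, timedelta

def backup_is_recent(last_success, patch_start, max_age=timedelta(hours=72)):
    """True if the last successful backup finished within max_age before patching."""
    return timedelta(0) <= patch_start - last_success <= max_age

if __name__ == "__main__":
    friday_backup = datetime(2024, 6, 7, 23, 0)   # Friday night
    sunday_patching = datetime(2024, 6, 9, 10, 0)  # Sunday morning
    print(backup_is_recent(friday_backup, sunday_patching))
```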

Configuration management

Any changes to the production environment should be properly documented. If you have a configuration management tool (like HPSM or ServiceNow), make sure to have a change ticket created for each round of updates. This can help you track any issues that may occur.

In the case of smaller companies without a ticketing system, include this information in your general IT equipment inventory document.
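
Without a ticketing system, a plain CSV change log already gives you something to look back at. The sketch below appends one record per change; the column names are hypothetical, and the in-memory stream merely stands in for a file opened in append mode.

```python
import csv
import io
from datetime import date

# Hypothetical column layout for a simple change log.
FIELDS = ["date", "hostname", "change", "result", "notes"]

def append_change_record(stream, record, write_header=False):
    """Write one change record as a CSV row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerow(record)  # missing fields are written as empty cells

if __name__ == "__main__":
    log = io.StringIO()  # stands in for a file opened in append mode
    append_change_record(
        log,
        {"date": date(2024, 6, 9), "hostname": "srv-web01",
         "change": "monthly updates", "result": "success"},
        write_header=True,
    )
    print(log.getvalue())
```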

Roll out patches to production

Once you have tested and validated everything, you are ready to roll out the patches to the production systems. This is usually done outside of business hours (weekends) to avoid downtime during working hours and to have enough time to verify that everything went well.

If you have a virtualized environment, it might also make sense to create a virtual machine snapshot before applying the updates. This is extremely useful if you have systems that require a very quick rollback if the update fails. If you do that, don't forget the rule of thumb: snapshots should only exist for a short time in the production environment. Make sure you delete them.
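
To avoid forgotten snapshots, it helps to list the ones that have outlived their purpose. The sketch below works on a plain list of snapshot records, however you export them from your hypervisor; the record fields (`vm`, `name`, `created`) and the 72-hour limit are assumptions for illustration.

```python
from datetime import datetime, timedelta

def overdue_snapshots(snapshots, now, max_age=timedelta(hours=72)):
    """Return (vm, name) pairs for snapshots older than max_age."""
    return [
        (snap["vm"], snap["name"])
        for snap in snapshots
        if now - snap["created"] > max_age
    ]

if __name__ == "__main__":
    # Hypothetical export of snapshot data.
    snapshots = [
        {"vm": "srv-web01", "name": "pre-patch", "created": datetime(2024, 6, 8, 22, 0)},
        {"vm": "srv-db01", "name": "pre-patch", "created": datetime(2024, 5, 11, 22, 0)},
    ]
    print(overdue_snapshots(snapshots, now=datetime(2024, 6, 10, 9, 0)))
```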

Verify and report the update status

When updates are applied to hundreds of servers, usually using patching tools, some servers may fail to apply the update. In such cases, the server requires manual intervention, such as another reboot. It is common practice to have a script that checks the server status (e.g., pending reboot).
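
A pending-reboot check on a Windows server might look like the sketch below. It tests for two registry keys that are commonly cited as pending-reboot indicators; the list is not exhaustive, and the check runs locally on the server using Python's standard `winreg` module. The function names are mine.

```python
import sys

# Registry keys under HKEY_LOCAL_MACHINE that are commonly cited as
# pending-reboot indicators; this list is not exhaustive.
REBOOT_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
]

def hklm_key_exists(path):
    """Return True if the given key exists under HKLM (Windows only)."""
    import winreg  # imported lazily so the module loads on other platforms
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except FileNotFoundError:
        return False

def reboot_pending(key_exists=hklm_key_exists):
    """Return True if any known pending-reboot key is present."""
    return any(key_exists(path) for path in REBOOT_KEYS)

if __name__ == "__main__":
    if sys.platform == "win32":
        print("Reboot pending:", reboot_pending())
    else:
        print("This check only applies to Windows hosts.")
```

Passing the lookup function as a parameter keeps the decision logic separate from the registry access, which makes it easy to try out on a workstation before running it against servers.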

Once verified, update your change ticket or IT equipment inventory with the results.

Final words

As you can see in this article, applying updates can be quite complicated. I wrote this post based on my experience updating hundreds of Windows Server and VMware vSphere systems. As already mentioned, IT networks vary, and the patch management process might differ accordingly.