In the first part of this article, we looked at the benefits of S2D, its building blocks and how the technology behind the scenes achieves the necessary fault tolerance. In this post, we look at networking, monitoring, managing and setting up S2D clusters.

Networking

One of the big benefits S2D has over "traditional" Storage Spaces is the simpler networking. In 2012/2012 R2, you needed two NICs dedicated to storage traffic between the SOFS and the Hyper-V nodes, a separate network between the cluster nodes, and yet another set of NICs for client and VM-to-VM traffic.

In a hyper-converged S2D solution, you can get by with just two NICs over which storage, cluster heartbeat and VM-to-VM/client traffic all flow. Switch Embedded Teaming (SET) lets you mix these networking loads together and use QoS or DCB to control bandwidth allocation. RDMA network interfaces are the best fit here: on average, they deliver around a 30% increase in throughput over standard Ethernet while reducing CPU usage by roughly 30%. This last point is important because in a hyper-converged cluster, you want as many CPU cycles as possible available to run your VM workloads.
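
As a rough sketch, the SET configuration on each node can look something like this in PowerShell (the switch, adapter and vNIC names are placeholders rather than the exact ones from my lab):

# Create a virtual switch with Switch Embedded Teaming across both physical RDMA NICs
New-VMSwitch -Name "S2DSwitch" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Add host vNICs for management and for storage/cluster traffic
Add-VMNetworkAdapter -ManagementOS -SwitchName "S2DSwitch" -Name "Management"
Add-VMNetworkAdapter -ManagementOS -SwitchName "S2DSwitch" -Name "SMB1"
Add-VMNetworkAdapter -ManagementOS -SwitchName "S2DSwitch" -Name "SMB2"

# Enable RDMA on the storage vNICs so SMB Direct can use them
Enable-NetAdapterRdma -Name "vEthernet (SMB1)","vEthernet (SMB2)"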

Setting up an S2D cluster

In my test lab, I have four physical computers running Windows Server 2016 Hyper-V, each with 32 GB of RAM. Two nodes have Chelsio 40 Gb RDMA NICs (2 x 40 Gb ports), and two nodes have Chelsio 10 Gb RDMA NICs (2 x 10 Gb ports). On the 40 Gb hosts, a spider (breakout) cable splits one 40 Gb port into four 10 Gb connections to a Dell 12-port 10 Gb switch, providing a total of 40 Gbps of bandwidth per host. The 10 Gb hosts have both ports connected, providing a total of 20 Gbps per node.

I prefer to use iWARP RDMA networking over RoCE/InfiniBand. It's easier because it avoids the complex DCB configuration on the switch, and because it's just TCP/IP, it works with ordinary switches. This means you don't need all-new infrastructure to implement iWARP, unlike RoCE. RDMA isn't limited to servers, either: Microsoft now supports it in Windows 10, enabling scenarios where a high-speed trading application or a video-editing workstation connects to back-end S2D storage over RDMA. As a side note, Chelsio is releasing a solution in which their NICs can act as switches, eliminating the need for an expensive 10 or 40 Gbps switch altogether.
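
If you want to verify that RDMA is actually available and in use, a couple of inbox cmdlets are handy (an optional check, shown here purely for illustration):

# Check that the physical NICs and host vNICs report RDMA as enabled
Get-NetAdapterRdma | Format-Table Name, Enabled

# On a running cluster, confirm SMB sees RDMA-capable interfaces and multichannel connections
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable
Get-SmbMultichannelConnection | Format-Table -AutoSize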

Each host has two SATA SSDs and two 2 TB SATA HDDs installed. I started by installing the OS on each node, followed by the latest driver for the Chelsio NIC. I then ran the Test-Cluster cmdlet to check whether the nodes met the requirements for a failover cluster.
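
The validation run looked something like this (the node names are placeholders):

# Validate the prospective nodes, including the S2D-specific tests
Test-Cluster -Node S2D-N1, S2D-N2, S2D-N3, S2D-N4 -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"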

Testing S2D cluster nodes before creating the cluster

I then used PowerShell to create a new cluster, starting with three nodes; I wanted to see how easy it would be to add a fourth node later. To check which drives were available for use, I ran Get-PhysicalDisk.
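
The commands were along these lines (the cluster name and IP address are placeholders):

# Create the cluster without adding any storage yet
New-Cluster -Name S2D-Cluster -Node S2D-N1, S2D-N2, S2D-N3 -NoStorage -StaticAddress 192.168.1.50

# List the local disks that are eligible to be pooled
Get-PhysicalDisk | Where-Object CanPool -eq $true | Sort-Object MediaType | Format-Table FriendlyName, MediaType, Size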

Listing disks eligible for S2D

Finally, I ran Enable-ClusterStorageSpacesDirect, which automatically grabbed all SSDs and HDDs and added them to a single pool. The last step is creating a virtual disk on top of the pool. Here's where things differ a bit from Storage Spaces: in the old world, you had to choose your tiering and decide between mirroring and parity manually. S2D picks this for you automatically based on the number of nodes and the number of disks.
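
In PowerShell terms, these last two steps look roughly like this (the volume name and size are just examples; adjust them to your pool):

# Claim all eligible SSDs and HDDs and create a single storage pool
Enable-ClusterStorageSpacesDirect

# Create a volume on top of the pool; S2D chooses the resiliency and tiering for you
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -Size 1TB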

Creating a virtual disk

Overall, setting up S2D is a lot more straightforward in the RTM release than the fiddling around required in the Technical Previews.

If you have System Center Virtual Machine Manager 2016 (VMM), you can attach an existing S2D cluster to it. Alternatively, if you have new physical servers, you can use VMM's deployment technology to install Windows Server 2016, turn the nodes into Hyper-V hosts and deploy S2D to them, all with a single checkbox in the VMM create cluster wizard.

Monitoring and managing S2D

As you would expect, System Center Operations Manager provides a management pack for S2D, with a dashboard to visualize performance metrics and warn about issues. Unlike in earlier releases, however, the logic to gather the necessary data is not in the management pack. Instead, Microsoft has built a health service for storage (also covering Storage Replica and Storage QoS) directly into the OS. It provides information about health state as well as performance metrics. You can run Get-StorageSubSystem Cluster* | Get-StorageHealthReport to access cluster-wide health data.
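
For example, from one of the cluster nodes you can query the health service like this (the fault query with Debug-StorageSubSystem is an optional extra check):

# Cluster-wide performance and capacity metrics from the health service
Get-StorageSubSystem Cluster* | Get-StorageHealthReport

# Current faults (failed drives, lost connectivity and so on) reported by the health service
Get-StorageSubSystem Cluster* | Debug-StorageSubSystem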

Datacom, an early adopter of S2D here in Australia, built its own dashboard in Grafana, using a PowerShell script to gather the data from the health service. DataON Storage, a popular vendor of Storage Spaces/S2D hardware, has built a beautiful HTML5-based dashboard for its products. You can opt in to report disk failures directly to DataON so it can ship a replacement drive, possibly before you're even aware of the failure.

Cluster-Aware Updating (CAU) is the inbox, cluster-aware engine for patching each cluster node in an orchestrated fashion. As of Windows Server 2016, it's aware of S2D: it will only patch a node when all virtual disks are healthy.
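
A one-off CAU run against the cluster could look like this (the cluster name is a placeholder, and everything beyond the plug-in name is optional tuning):

# Patch the cluster nodes one at a time using the inbox Windows Update plug-in
Invoke-CauRun -ClusterName S2D-Cluster -CauPluginName Microsoft.WindowsUpdatePlugin -MaxFailedNodes 0 -MaxRetriesPerNode 1 -RequireAllNodesOnline -Force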

Conclusion

Storage Spaces Direct is a game changer, and I suspect it'll become the default deployment model for Hyper-V clusters going forward. That's partly down to the general enthusiasm for hyper-convergence in the IT industry and partly down to the benefits S2D brings: ease of setup, cost-effective components and fantastic performance. On the performance front, Microsoft has demonstrated six million read IOPS in an S2D cluster. SQL Server 2016 is also supported for storing databases directly on S2D, which will be an interesting option, since many database deployments are very storage intensive. S2D is also another reminder to look seriously at RDMA networking; the gains in throughput and the minimal CPU overhead are very attractive.

There are some words of caution, however. In most cases, S2D is not a small business or branch office solution, since it requires a Windows Server Datacenter license at approximately five times the cost of the Standard license. For these smaller deployments, virtual SAN solutions such as those from StarWind still have their place.


Since you can use S2D with almost any type of local storage, you can easily set up a lab with a few VMs and learn the technology in preparation for production deployments.

13 Comments
  1. Paul Jones 6 years ago

    Great article, do you know if Datacom shared their scripts for monitoring, or the dashboard itself?

  2. Author

    Hi Paul,

    No, I don’t think they did – they only spoke about it in a breakout session where they were brought on as a “real world company implementation”.

    /Paul

  3. Damon 6 years ago

    I know this post is old, but I do not know how else to contact you.
    Do you have the part number for the Dell switch and Chelsio adapter you used in this setup.

  4. Author

    Hi Damon,

I don't have them on hand, but I'll be in the lab next week; I'll grab them then and post them here.

    /Paul

  5. Damon 6 years ago

    Thanks,
    Damon

  6. Author

    Hi again Damon,

The switch is a Dell X4012, a 12-port 10 Gb/s switch.

Two of the servers have Chelsio T580-CR 40 Gb/s NICs. Each NIC has two 40 Gb/s ports. Only one port is used, with a spider cable taking the 40 Gb/s connection on one end and breaking out into 4 x 10 Gb/s connections to the switch.

    Two of the servers have Chelsio T520-LL-CR 10 Gb/s NICs. Each NIC has two 10 Gb/s ports. I have both ports used, each with a 10 Gb/s cable to the switch.

This means two of the servers have 40 Gb/s of total bandwidth and two have 20 Gb/s (I mentioned this in the article above as well). This is a bit weird; I would recommend that a production setup use the same NICs in all servers :-). (It's because Chelsio gave me the NICs and wanted me to test both types.)

    Hope that is useful,

    Paul

  7. Damon 6 years ago

    Yes, very helpful!
    Thanks for taking time to assist with this.

I'm looking to create S2D using the two HP ProLiant DL380 Gen9s we have.

After reading your article, I went ahead and purchased two Chelsio T-520 BT NICs and an HPE OfficeConnect 1950 12XGT 4SFP+ switch.

Your most recent reply mentioned a 40 Gb Chelsio and spider cables. Is the cable similar to a fan-out cable like this:

    https://datainterfaces.com/qsfp-4sfp10-01c.aspx

Now I'm wondering if I should have gone with two 40 Gb NICs.

  8. Author

    Hi again Damon,

Yep, those are the cables I have too.

    That would depend on the workloads. Basically if you have a LOT of churn in the disks in the VMs that are running on your cluster you could potentially saturate 20 Gb/s. But there are a lot of other factors involved such as the speed of the storage in each host, which is more likely to be the bottleneck unless you have incredibly fast storage (think multiple NVMe drives). I wouldn’t worry too much about the network bandwidth.

    So you have two hosts? What’s the storage setup?

    /Paul

  9. Damon 6 years ago

The two ProLiant DL380 Gen9s are the SFF variety, with 24 drive bays in the front and two in the rear. We currently have four 240 GB SSDs and twenty 900 GB HDDs for storage, using regular Storage Spaces in Windows Server 2016.

I've been reading about how some S2D configurations forgo a switch and connect the nodes directly to each other using the 10 Gb or 40 Gb connections.

How would that work? Would I just assign an IP address to each connecting NIC port and run cables between them? Would that work on the Chelsio T-520 BT, or would I have to go with one of the QSFP NICs and make connections between them?

    Yes, this is somewhat new to me.

  10. Author

    Hi,

    With that storage the 20 Gb/s bandwidth should not be a problem.

I haven't actually tried a two-node S2D cluster. I do have a colleague who ran a two-node StarWind cluster (back in the Windows 2012 days) in production at a client. He'd initially connected the two nodes directly (2 x 10 Gb/s), but he found that Windows clustering would bring down the cluster when one node failed: because the NICs on the remaining node weren't getting a signal, it assumed that all networking was down. He bought a switch and put it in; all good after that.

I do know that Chelsio are in the process of building switch capabilities into their NICs. In other words, you could build a four (or two or three) node cluster without a switch: simply daisy-chain the nodes to each other and they'll do the switching in the NICs. Read more here: https://www.chelsio.com/wp-content/uploads/resources/pr_ring_backbone.pdf.

As for trying without a switch, the best advice I can give is just to test it.

    Good luck,

    Paul

     

  11. Damon 6 years ago

    Thanks for all of your input Paul.

I have two Chelsio T62100-LP-CR cards on order (more than enough for my current needs), plus some QSFP28 twinaxial cables for interconnects. I cancelled the order for the HP switch for now. Your note about the node not getting a signal was noteworthy. I've read about switch capability in either the T62100 or upcoming cards. As you said, I'll run it all through its paces after I get everything set up.

    I’ll post back with results afterwards, for those who may be researching something similar.

    Thanks again.

  12. Damon 6 years ago

    Just read the link you provided that announces switchless ring backbone capabilities in T62100-LP-CR.

    That’s great news! Looking forward to receiving the ones I ordered.

  13. Author

    Hi again Damon,

    Looking forward to reading about your results when you’ve got it all up and running.

    Good luck!

    /Paul
