Redundancy everywhere
At least 2 physical NICs per traffic type (teamed or separate) where possible.
Use dual ToR switches (stack/VPC/MLAG) so every host has uplinks to both.
Avoid single points of failure: no single NIC, single switch, or single VLAN for critical paths.
Traffic separation
At minimum, logically separate:
Management
VM/tenant traffic
Cluster/heartbeat + CSV
Live Migration
Storage (iSCSI or SMB / S2D east-west)
Use dedicated VLANs and where possible dedicated NICs or vNICs (with QoS).
Consistent, deterministic configuration
Same number of NICs, same names, same vSwitch name, same VLANs, same QoS on all nodes.
Standardize IP schemas by function (e.g., 10.10.1.x mgmt, 10.10.2.x LM, 10.10.3.x cluster, etc.).
Throughput > latency for VM & LM, latency > throughput for heartbeat
Heartbeat/cluster doesn’t need huge bandwidth, but must be stable and low-latency.
Live Migration and storage need high bandwidth, low packet loss.
Minimum practical pattern (per host):
Mgmt / Host OS
2 x 1/10/25 GbE (teamed) – management VLAN; BMC/iLO/iDRAC ideally on a separate out-of-band (OOB) network.
VM / Tenant Traffic
2 x 10/25 GbE → vSwitch (SET or LBFO if older) for VMs.
Cluster / CSV / S2D / SMB
2 x 10/25 GbE (RDMA capable strongly recommended: RoCEv2 or iWARP).
Live Migration
Either:
Shared with CSV/S2D (high-bandwidth RDMA), on its own VLAN and with its own QoS, or
A dedicated pair of NICs or vNICs on the main vSwitch.
On smaller hosts, you might combine some roles but never put everything on one adapter team without QoS and VLAN separation.
Use SET (Switch Embedded Teaming) on 2016+
For Hyper-V clusters on 2016/2019/2022+, prefer SET over classic LBFO teaming for VM traffic.
Create one SET vSwitch per host using 2+ physical NICs:
New-VMSwitch -Name "vSwitch-Prod" -NetAdapterName "NIC1","NIC2" -EnableEmbeddedTeaming $true -AllowManagementOS $false
Then add vNICs for Mgmt, LM, CSV, etc. on top of SET if using converged networking.
Converged networking vNICs
On hosts, create vNICs attached to the same vSwitch:
vNIC-Mgmt
vNIC-LiveMigration
vNIC-Cluster/CSV
vNIC-Backup (if needed)
Bind each vNIC to its own VLAN and QoS weight (see QoS section).
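As a sketch, assuming the "vSwitch-Prod" SET switch from above and the VLAN IDs suggested later in this guide (vNIC names are illustrative):
# Create explicit host vNICs on the SET switch
Add-VMNetworkAdapter -ManagementOS -SwitchName "vSwitch-Prod" -Name "vNIC-Mgmt"
Add-VMNetworkAdapter -ManagementOS -SwitchName "vSwitch-Prod" -Name "vNIC-LiveMigration"
Add-VMNetworkAdapter -ManagementOS -SwitchName "vSwitch-Prod" -Name "vNIC-Cluster"

# Tag each vNIC with its own VLAN
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "vNIC-Mgmt" -Access -VlanId 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "vNIC-LiveMigration" -Access -VlanId 50
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "vNIC-Cluster" -Access -VlanId 30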
Avoid “-AllowManagementOS $true” in a converged design
Don't let New-VMSwitch auto-create an implicit management vNIC on the main vSwitch:
Create explicit host vNICs instead (via Add-VMNetworkAdapter -ManagementOS), so each role gets its own named vNIC, VLAN, and QoS weight, rather than binding the OS directly to the pNIC.
For older OS (2012 R2)
Use LBFO NIC Teaming with Dynamic load distribution and Switch Independent mode.
Still keep the same concept: single team feeding a vSwitch, then vNICs on top.
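A minimal sketch of that 2012 R2 pattern, assuming uplinks named "NIC1"/"NIC2" (team and switch names are illustrative):
# LBFO team: Switch Independent + Dynamic
New-NetLbfoTeam -Name "Team-Prod" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# vSwitch on top of the team; Weight mode enables per-vNIC bandwidth weights
New-VMSwitch -Name "vSwitch-Prod" -NetAdapterName "Team-Prod" -MinimumBandwidthMode Weight -AllowManagementOS $false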
Create dedicated VLANs per traffic type
Suggested layout:
VLAN 10 – Management
VLAN 20 – VM/Production
VLAN 30 – Cluster/Heartbeat
VLAN 40 – CSV/S2D/Storage
VLAN 50 – Live Migration
VLAN 60+ – Backup / Replication / DMZ segments as needed
IP schema example
Mgmt: 10.10.10.0/24
Cluster/Heartbeat: 10.10.20.0/24
CSV/S2D/SMB: 10.10.30.0/24
Live Migration: 10.10.40.0/24
iSCSI: 10.10.50.0/24 (non-routed)
Routing rules
Cluster/CSV/S2D and iSCSI subnets are usually non-routed (east-west only).
Enable routing only where necessary and secure with ACLs/firewall to reduce blast radius.
Use Hyper-V / SMB QoS with converged networking
Assign minimum bandwidth weights to each vNIC.
Example weight scheme (total = 100):
VM traffic: 50
CSV/S2D/SMB: 25
Live Migration: 15
Management: 10
PowerShell example (per-vNIC weights require the vSwitch to use minimum-bandwidth Weight mode; VM traffic is covered by the switch's default flow, not a host vNIC):
Set-VMSwitch -Name "vSwitch-Prod" -DefaultFlowMinimumBandwidthWeight 50
Set-VMNetworkAdapter -ManagementOS -Name "vNIC-S2D" -MinimumBandwidthWeight 25
Set-VMNetworkAdapter -ManagementOS -Name "vNIC-LiveMigration" -MinimumBandwidthWeight 15
Set-VMNetworkAdapter -ManagementOS -Name "vNIC-Mgmt" -MinimumBandwidthWeight 10
SMB Multichannel + SMB Direct (RDMA) for S2D / CSV / Live Migration
Ensure multiple NICs/RDMA adapters can be used concurrently.
Keep SMB traffic on dedicated subnets/VLANs.
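A quick verification sketch (standard SMB/RDMA cmdlets; output depends on hardware):
# Confirm RDMA is enabled on the storage NICs
Get-NetAdapterRdma | Where-Object Enabled

# Confirm SMB sees the interfaces and their RDMA capability
Get-SmbClientNetworkInterface
Get-SmbServerNetworkInterface

# After generating SMB traffic, confirm multichannel is actually in use
Get-SmbMultichannelConnection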
Don’t throttle cluster heartbeat too much
Heartbeat is low-bandwidth but time-sensitive.
Ensure it has enough bandwidth and low latency; never force it onto already-saturated paths.
iSCSI SAN
Use dedicated NICs for iSCSI only.
No default gateway on iSCSI NICs; static routes if needed.
Enable Jumbo Frames (MTU 9000) end-to-end if supported (hosts, switches, storage).
Use MPIO with at least two paths per host.
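A host-side sketch, assuming adapters named "iSCSI-A"/"iSCSI-B" and a portal at 10.10.50.10 (all illustrative; the MPIO feature must be installed, and the jumbo-frame keyword/value varies by NIC vendor):
# Jumbo frames on the dedicated iSCSI NICs (must match switches and storage)
Set-NetAdapterAdvancedProperty -Name "iSCSI-A","iSCSI-B" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

# Let MPIO claim iSCSI devices automatically
Enable-MSDSMAutomaticClaim -BusType iSCSI

# Connect the target with multipathing, persistent across reboots
New-IscsiTargetPortal -TargetPortalAddress 10.10.50.10
Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true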
SMB 3.x / Storage Spaces Direct
Use RDMA-capable NICs (RoCEv2 or iWARP) with:
PFC/ETS (DCB) properly configured if using RoCE; iWARP runs over TCP and does not require lossless DCB.
At least 2 x 10/25 GbE RDMA NICs per host dedicated to S2D/CSV and LM.
Separate Cluster/S2D VLANs from production.
Avoid mixing storage + noisy VM traffic on the same physical NICs without strong QoS and capacity.
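A minimal DCB sketch for RoCEv2 (priority 3 and the 50% reservation are common conventions, not requirements; switch-side PFC/ETS must match, and the NIC names follow the reference layout at the end):
# Tag SMB Direct traffic (port 445) with priority 3
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Lossless (PFC) only for the SMB priority
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# ETS bandwidth reservation for SMB
New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply DCB on the RDMA NICs
Enable-NetAdapterQos -Name "NIC3","NIC4"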
Dedicated or converged vNIC
Place Live Migration on its own subnet & VLAN.
Configure LM settings to use that network only (Failover Cluster Manager or PowerShell).
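For example, restricting migrations to the LM subnet from the IP schema above (a sketch; subnet string per Add-VMMigrationNetwork):
Enable-VMMigration

# Use only the dedicated Live Migration subnet
Set-VMHost -UseAnyNetworkForMigration $false
Add-VMMigrationNetwork "10.10.40.0/24"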
Compression vs. SMB vs. RDMA
With RDMA NICs: set the Live Migration transport to SMB (SMB Direct/RDMA).
Without RDMA: Compression is usually faster than plain TCP.
Throttle LM concurrency
Tune number of simultaneous migrations + bandwidth limit so LM doesn’t starve CSV or VM traffic.
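A sketch of transport and concurrency tuning (values illustrative; Set-SmbBandwidthLimit requires the SMB Bandwidth Limit feature, FS-SMBBW):
# Prefer SMB (RDMA) where available; use Compression otherwise
Set-VMHost -VirtualMachineMigrationPerformanceOption SMB

# Limit simultaneous live and storage migrations
Set-VMHost -MaximumVirtualMachineMigrations 2 -MaximumStorageMigrations 2

# Cap SMB-based Live Migration so it can't starve CSV traffic
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 1GB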
At least two distinct networks
Cluster will use any enabled network, but:
Mark at least one network as “Cluster use only.”
Avoid relying solely on the management network for heartbeats.
Cluster network order
Set network metric / “Role” so:
Storage/CSV network is preferred for CSV traffic.
Management network is used for client access but is not the primary heartbeat path.
Name each network clearly
E.g., “ClusterNet-Heartbeat,” “ClusterNet-CSV,” “MgmtNet-Prod” so troubleshooting is easier.
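A sketch with the FailoverClusters module (network names match the suggested naming; Role values: 0 = none, 1 = cluster only, 3 = cluster and client):
# Rename auto-detected networks to meaningful names
(Get-ClusterNetwork "Cluster Network 1").Name = "ClusterNet-CSV"

# Cluster-only for CSV/heartbeat; cluster + client for management
(Get-ClusterNetwork "ClusterNet-CSV").Role = 1
(Get-ClusterNetwork "MgmtNet-Prod").Role = 3

# Lower metric = preferred for cluster/CSV traffic
(Get-ClusterNetwork "ClusterNet-CSV").Metric = 900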
Isolate management and storage from user/VM networks
Use separate VLANs and secure routing.
Only admin jump hosts and monitoring tools should reach management IPs.
Use firewalls
Harden Windows Firewall with cluster + Hyper-V rules.
Close all non-required ports; restrict RDP, WinRM, SMB to admin/workload subnets.
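For example, scoping the built-in rule groups to the management subnet from the schema above (group names are the English defaults):
# Restrict RDP and WinRM to the management subnet
Set-NetFirewallRule -DisplayGroup "Remote Desktop" -RemoteAddress 10.10.10.0/24
Set-NetFirewallRule -DisplayGroup "Windows Remote Management" -RemoteAddress 10.10.10.0/24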
Use secure management protocols
WinRM over HTTPS, SSH (if needed), RDP gateways.
Disable legacy/weak protocols (SMB1, old cipher suites).
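For example, removing SMB1 (a sketch; the feature name applies to 2016+):
# Disable SMB1 on the server side
Set-SmbServerConfiguration -EnableSMB1Protocol $false -Force

# Remove the SMB1 feature entirely on 2016+
Uninstall-WindowsFeature -Name FS-SMB1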
Protect virtual switches
Enable DHCP guard, Router guard, Port ACLs as needed on VMs.
Allow MAC address spoofing only where required (e.g., NLB, some virtual appliances); see the per-VM sketch below.
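Per-VM sketch (VM name illustrative):
# Block rogue DHCP/router advertisements; disallow MAC spoofing
Set-VMNetworkAdapter -VMName "VM01" -DhcpGuard On -RouterGuard On -MacAddressSpoofing Off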
LACP / Static teaming
Note that SET supports Switch Independent mode only – do not configure LACP or a static LAG toward SET uplinks.
If using LBFO (older designs) in Switch-Dependent mode, configure LACP or a static LAG on the switches.
With Switch Independent teaming: ensure both ToR switches carry the same VLANs in the same L2 domain, but no LAG is required.
Spanning Tree / PortFast
Enable equivalent of PortFast / Edge on server ports to avoid STP delays.
Don’t oversubscribe too heavily; plan uplink capacity vs host aggregate bandwidth.
Consistent switch templates
Same VLANs, trunks, QoS, MTU, ACLs across every ToR switch so any host can plug into any port.
Before putting into production:
Test failover of each NIC (pull cables).
Test switch failure (shut down one ToR).
Validate Live Migration under load.
Validate CSV failover and storage performance.
Run Test-Cluster and fix all networking-related warnings/errors.
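For example (node names illustrative):
# Full validation, or network tests only for a quick re-check
Test-Cluster -Node "HV01","HV02"
Test-Cluster -Node "HV01","HV02" -Include "Network"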
Monitoring
Enable monitoring for:
NIC errors/drops
RDMA counters
Live Migration failures
Cluster heartbeat loss events
Use perfmon / SCOM / Azure Monitor / other tools as appropriate.
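A perfmon sketch (the RDMA Activity counter set exists only on RDMA-capable adapters; paths are the English defaults):
# NIC errors/drops
Get-Counter "\Network Interface(*)\Packets Received Errors"
Get-Counter "\Network Interface(*)\Packets Outbound Errors"

# RDMA throughput
Get-Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec"
Get-Counter "\RDMA Activity(*)\RDMA Outbound Bytes/sec"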
For each Hyper-V cluster node:
4 x 25GbE NICs:
NIC1+NIC2 → SET vSwitch-Prod
vNIC-VM (VLAN 20)
vNIC-Mgmt (VLAN 10)
vNIC-LM (VLAN 50, QoS weight 15)
vNIC-Cluster (VLAN 30, QoS weight 10)
NIC3+NIC4 → RDMA (no vSwitch)
S2D/CSV/SMB (VLAN 40, dedicated subnets, SMB Multichannel/Direct; bandwidth reserved via DCB/ETS, since vSwitch QoS weights don't apply without a vSwitch)
Optional extra 1GbE:
Out-of-band management / iLO / iDRAC on separate management network.