
HCX Data‑Plane Diagnostics — The Practical Guide to Faster Troubleshooting

When HCX migrations or network extensions misbehave, most engineers instinctively start by checking firewall rules, routing, or DNS. Those checks are necessary, but they are not always the fastest path to the root cause. HCX includes a powerful built-in feature called Data‑Plane Diagnostics that validates site‑to‑site data‑plane connectivity and helps you quickly pinpoint why a Service Mesh tunnel is down, degraded, or behaving inconsistently.


This post explains what Data‑Plane Diagnostics is, how to run it, and how to interpret its report.


What is “Data‑Plane Diagnostics” in HCX?

HCX Data‑Plane Diagnostics is a set of diagnostic tools for troubleshooting data‑plane connectivity issues between paired sites in a Service Mesh. It validates the connections between the HCX appliances deployed at each site (for example, Interconnect and Network Extension appliances) across the networks defined in the Service Mesh, such as the Management and External/Uplink networks.


How to run Data‑Plane Diagnostics

Running diagnostics is straightforward:

  1. Open the HCX Console and go to Interconnect → Service Mesh.

  2. Open the Data‑Plane Diagnostics tab.

  3. Click Run Diagnostics.

  4. Wait for the report to be generated (this can take a few minutes).


Note: While diagnostics are running, Service Mesh operations are blocked, so avoid running them during critical migration windows when you are actively moving workloads.



Understanding the Diagnostics Report

After the test completes, HCX generates a report that summarizes probe results from each site in the Service Mesh. The report groups results by:

  • Appliance groups at each site (for example, Interconnect appliances and Network Extension appliances)

  • Networks associated with the services (for example, management network vs uplink network)


The report shows structured probe details such as:

  • Probe status

  • Source and destination appliances

  • Protocol and port (for example, UDP 4500)
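If you transcribe or export report rows to track them across runs, a small data model helps keep the grouping straight. The sketch below is a hypothetical Python structure, not an HCX export format: the field names, the appliance names, and the PASS/WARN/FAIL status values are assumptions chosen to mirror the layout described above (appliance group and network per site, plus status, source, destination, protocol, and port).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One probe row from a diagnostics report (hypothetical schema, not an HCX API)."""
    site: str             # site whose appliance ran the probe
    appliance_group: str  # e.g. "Interconnect" or "Network Extension"
    network: str          # e.g. "Management" or "Uplink"
    source: str           # source appliance name (hypothetical)
    destination: str      # destination appliance name (hypothetical)
    protocol: str         # e.g. "UDP"
    port: int             # e.g. 4500
    status: str           # assumed values: "PASS", "WARN", "FAIL"


def group_results(results):
    """Group rows by (site, appliance group, network), like the report layout."""
    grouped = defaultdict(list)
    for row in results:
        grouped[(row.site, row.appliance_group, row.network)].append(row)
    return dict(grouped)


rows = [
    ProbeResult("site-a", "Interconnect", "Uplink", "ix-a", "ix-b", "UDP", 4500, "PASS"),
    ProbeResult("site-a", "Network Extension", "Uplink", "ne-a", "ne-b", "UDP", 4500, "FAIL"),
]

for key, probes in group_results(rows).items():
    print(key, [p.status for p in probes])
```

Grouping on the same keys the report uses makes it easy to spot, for example, that only the Network Extension uplink probes fail while Interconnect probes on the same network pass.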


How to interpret Data‑Plane Diagnostics results

Read the report with three questions in mind:

  1. Is the tunnel up and stable?

  2. Is it “up” but degraded (underlay quality issues—latency, loss, or throughput constraints)?

  3. Is it failing because of MTU or fragmentation problems (HCX IPsec tunneling adds encapsulation and encryption overhead to every packet)?
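The three questions can be turned into a simple triage order: deal with outright failures first, then MTU, then quality. The Python sketch below assumes hypothetical probe labels ("Tunnel", "PMTU") and status values ("PASS", "WARN", "FAIL"); the real report may use different names, so treat it as a way to reason about the results rather than a parser for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Minimal probe row; the labels are assumptions, not HCX report fields."""
    test: str    # hypothetical label, e.g. "Tunnel" or "PMTU"
    status: str  # assumed values: "PASS", "WARN", "FAIL"


def triage(probes):
    """Classify results: hard failure first, then MTU, then degradation."""
    failed = {p.test for p in probes if p.status == "FAIL"}
    if failed - {"PMTU"}:
        # Basic reachability is broken; fix this before looking at MTU (Pattern 1).
        return "tunnel/path failure"
    if "PMTU" in failed:
        # Other probes pass, but the PMTU check fails (Pattern 2).
        return "MTU/fragmentation failure"
    if any(p.status == "WARN" for p in probes):
        # Connectivity is up but underlay quality is questionable (Pattern 3).
        return "up but degraded"
    return "up and stable"


print(triage([Probe("Tunnel", "PASS"), Probe("PMTU", "FAIL")]))
```

Checking failures before warnings is a design choice: a WARN on a path that also has a hard failure is not worth chasing until the failure is fixed.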


Common patterns and troubleshooting tips

Pattern 1 — Tunnel/IPsec path/UDP 4500 failures

Validate:

  • Firewall/NAT rules for tunnel traffic

  • Reachability between appliance uplink interfaces

  • Packet captures if needed (especially when traffic leaves ESXi but never returns)
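To turn failing tunnel probes into a short firewall/NAT checklist, collect the unique source and destination pairs that failed on UDP 4500. This is a minimal Python sketch with hypothetical field names and documentation-range IP addresses; it only lists what to check and does not query any firewall or HCX system.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    """Hypothetical report row; field names are illustrative only."""
    source: str       # source appliance uplink IP
    destination: str  # destination appliance uplink IP
    protocol: str
    port: int
    status: str       # assumed values: "PASS", "WARN", "FAIL"


def udp4500_checklist(probes):
    """Return unique failing UDP 4500 source -> destination pairs to verify in firewall/NAT rules."""
    pairs = sorted({
        (p.source, p.destination)
        for p in probes
        if p.status == "FAIL" and p.protocol.upper() == "UDP" and p.port == 4500
    })
    return [f"check UDP/4500 {src} -> {dst}" for src, dst in pairs]


report = [
    Probe("192.0.2.10", "198.51.100.20", "UDP", 4500, "FAIL"),
    Probe("192.0.2.10", "198.51.100.20", "udp", 4500, "FAIL"),  # duplicate row
    Probe("192.0.2.11", "198.51.100.21", "UDP", 4500, "PASS"),
]
for line in udp4500_checklist(report):
    print(line)
```

Because the report summarizes probe results from each site, run the same filter over both sites' rows so that failures in either direction end up on the checklist.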


Pattern 2 — PMTU/MTU failures

If probes report PMTU or fragmentation failures:

  • Verify the underlay MTU end to end (switches, routers, WAN, cloud)

  • Account for HCX encapsulation overhead

  • Re-run diagnostics after correcting the MTU
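The MTU arithmetic is easy to get wrong by a few bytes. For an IPv4 ping with the don't-fragment bit set, the largest payload that fits is the path MTU minus the 20-byte IPv4 header and the 8-byte ICMP header, so 1472 bytes on a 1500-byte path. The sketch below also subtracts a tunnel overhead value, but that number is an assumed input: take the actual HCX overhead for your version and configuration from VMware's documentation, not from this example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_df_ping_payload(path_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one IPv4 packet with DF set."""
    return path_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def inner_mtu(underlay_mtu: int, tunnel_overhead: int) -> int:
    """MTU left for workload packets after tunnel encapsulation.

    tunnel_overhead is an ASSUMED input: look up the real HCX figure
    for your version and configuration; none is hard-coded here.
    """
    if not 0 <= tunnel_overhead < underlay_mtu:
        raise ValueError("tunnel_overhead must be >= 0 and smaller than underlay_mtu")
    return underlay_mtu - tunnel_overhead


print(max_df_ping_payload(1500))  # 1472
print(max_df_ping_payload(9000))  # 8972
print(inner_mtu(1500, 150))       # 1350, with a hypothetical 150-byte overhead
```

If a don't-fragment ping at the computed payload size fails somewhere along the path, some hop has a smaller MTU than you expect.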


Pattern 3 — Degraded performance (WARN)

If connectivity is up but performance is inconsistent:

  • Evaluate underlay quality (loss, latency, bandwidth)

  • Use additional performance tests (for example, HCX Transport Analytics) to build a baseline and compare later results against it
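A baseline is only useful if the comparison is consistent between runs. The Python sketch below compares a current measurement with a baseline and flags latency, loss, or throughput regressions. The metric names, units, and thresholds (a 20% relative tolerance and a 0.5 percentage-point loss increase) are illustrative assumptions, not HCX defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One underlay measurement (hypothetical metrics and units)."""
    latency_ms: float
    loss_pct: float
    throughput_mbps: float


def compare_to_baseline(current, baseline, rel_tolerance=0.2, loss_delta_pct=0.5):
    """List metrics that got worse than the baseline by more than the tolerance.

    Both thresholds are illustrative assumptions, not HCX defaults.
    """
    regressions = []
    if current.latency_ms > baseline.latency_ms * (1 + rel_tolerance):
        regressions.append("latency")
    # Baseline loss is often near zero, so use an absolute delta instead of a ratio.
    if current.loss_pct > baseline.loss_pct + loss_delta_pct:
        regressions.append("loss")
    if current.throughput_mbps < baseline.throughput_mbps * (1 - rel_tolerance):
        regressions.append("throughput")
    return regressions


baseline = Sample(latency_ms=20.0, loss_pct=0.0, throughput_mbps=900.0)
print(compare_to_baseline(Sample(30.0, 1.0, 600.0), baseline))
print(compare_to_baseline(Sample(22.0, 0.2, 850.0), baseline))
```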


Production-friendly best practices

✅ Run it before important migrations: validate the mesh before large migration waves to uncover hidden issues such as MTU mismatches or degraded uplinks.

✅ Don’t run Transport Analytics simultaneously: start diagnostics only when no Transport Analytics test is running.

✅ Be aware of the blocking behavior: since Service Mesh operations are blocked while diagnostics run, schedule the test outside peak migration windows.
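Because diagnostics block Service Mesh operations, and Transport Analytics should not run at the same time, it can help to check a proposed run against every busy window before starting it. This Python sketch treats windows as half-open time intervals; the windows and the expected run duration are hypothetical inputs you would fill in from your own migration plan.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if the half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def safe_to_run(run_start, expected_duration, busy_windows):
    """True if a diagnostics run would not overlap any busy window.

    busy_windows: (start, end) pairs for migration waves or Transport Analytics runs.
    expected_duration is an estimate you supply; actual run time varies.
    """
    run_end = run_start + expected_duration
    return not any(overlaps(run_start, run_end, start, end) for start, end in busy_windows)


# Hypothetical plan: one overnight migration wave.
windows = [(datetime(2025, 1, 10, 22, 0), datetime(2025, 1, 11, 4, 0))]
print(safe_to_run(datetime(2025, 1, 10, 18, 0), timedelta(minutes=30), windows))   # True
print(safe_to_run(datetime(2025, 1, 10, 21, 45), timedelta(minutes=30), windows))  # False
```

Pad the expected duration generously, since report generation time can vary.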


© 2035 by ITG. Powered and secured by Wix
