Zscaler Blog


My Cloud Connector Monitoring Stack - AWS (Part 1)

Sometimes the best ideas start with a simple “what if”. Over the years, I’ve had countless conversations with customers around how to monitor and alert on Zscaler Cloud Connectors – especially in cloud-native environments. So one night, I challenged myself: could I build a fully cloud-native monitoring stack using just the native tools from AWS, Azure, and GCP?

Spoiler: I could.

This blog is Part 1 of that journey, focused entirely on what the outcome looks like – a CloudWatch dashboard with the key metrics, visualizations, and alerts I’d want to see in production. Think of it as a preview of what’s possible if you’re running Zscaler Cloud Connectors in AWS with Auto Scaling Groups (ASG) and Gateway Load Balancer (GWLB).

While it’s not an official Zscaler solution (or supported by our team), I hope this sparks ideas for your own deployments. And if you’re wondering how to build it – don’t worry, Part 2 will dive into the full configuration, metrics, thresholds, and a CloudFormation template so you can try it out yourself.

Of course, Zscaler already offers native integrations with Nanolog Streaming Service (NSS) to stream Cloud Connector Events and Metrics logs to your on-premises or cloud SIEM – I’ll include some links to the official help portal for that below, but this project is all about cloud-native monitoring!


 

Nanolog Streaming Service (NSS)

Before we get into this project, please note that Zscaler officially supports streaming Cloud Connector Events and Metrics to your existing SIEM using NSS. The key metrics are included and are cloud-agnostic, so you can build a beautiful dashboard and alerts in your own SIEM using this method. Here are a few links in case you weren’t aware of this integration:

Now when it comes to Cloud Connectors specifically, there are two feeds:

  • Event Logs. These logs report status changes for health checks, Zscaler service edge reachability, etc.
  • Metrics Logs. These logs report Overall CPU, Data Plane CPU, Control Plane CPU, memory, throughput utilization, etc.

Now back to the fun stuff...


 

What To Monitor (and Why)?

Let’s focus on what matters most in Cloud Connector architecture: the Data Plane.

Cloud Connectors have both a Control Plane and a Data Plane. The Control Plane handles configuration sync, log forwarding, and updates, but traffic keeps flowing even if the Control Plane is down.

The real question we want to answer:

Can Cloud Connectors keep processing workload traffic without impacting performance?

That’s the job of the Data Plane. It securely tunnels traffic to the Zscaler Zero Trust Exchange (ZTE), whether that is ZIA, ZPA, or directly to the Internet, depending on your policy.

If the Data Plane is healthy, your traffic is safe. That’s why this blog centers around monitoring only the metrics tied to traffic flow and tunnel health.


Key Metrics I Focused On

Every deployment is different, but here’s what I chose to monitor, and why. These are grouped by Zscaler Location, meaning any number of Connectors behind the same GWLB:

Metric | Explanation
--- | ---
EC2 Health and Availability | Track total Cloud Connectors and their status to quickly identify failures
Auto Scaling Group Utilization | Gauge how close you are to max capacity and when scaling might be needed
Data Plane CPU Utilization | Critical for understanding throughput. High CPU = nearing traffic limits
GWLB Target Group Health Checks | Directly tells you if tunnels are up and traffic is flowing
Data Plane Bytes (In/Out) | Visualize how much data is processed through Connectors (optional, but insightful)
GWLB Bytes Processed | Useful for spotting patterns or changes in workload traffic over time
NAT Gateway Bytes (In vs Out) | Helpful for understanding data transfer patterns and potential cost impacts
Network Latency (RTT in ms) | Adds a layer of observability for network impact going out to Internet
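
To make the table a little more concrete, here is a rough sketch of where each row might come from in CloudWatch. The AWS namespaces and metric names are standard ones, but the post does not say which exact metrics were used, so treat the whole mapping as an assumption. In particular, the `Zscaler/CloudConnectors` namespace, the `smedge_cpu_utilization` metric name, and the use of instance-level EC2 `NetworkIn`/`NetworkOut` for data plane bytes are guesses that you should verify against what your own Connectors publish.

```python
# One plausible mapping from the metrics table to (namespace, metric name) pairs.
# The AWS entries are standard CloudWatch names. The Zscaler entry and the
# choice of EC2 metrics are assumptions for illustration only.
KEY_METRICS = {
    "EC2 Health and Availability": [("AWS/EC2", "StatusCheckFailed")],
    "Auto Scaling Group Utilization": [
        ("AWS/AutoScaling", "GroupInServiceInstances"),
        ("AWS/AutoScaling", "GroupMaxSize"),
    ],
    # Assumed custom namespace and metric name; check your own account.
    "Data Plane CPU Utilization": [("Zscaler/CloudConnectors", "smedge_cpu_utilization")],
    "GWLB Target Group Health Checks": [
        ("AWS/GatewayELB", "HealthyHostCount"),
        ("AWS/GatewayELB", "UnHealthyHostCount"),
    ],
    # Instance-level approximation, not per-interface (assumption).
    "Data Plane Bytes (In/Out)": [("AWS/EC2", "NetworkIn"), ("AWS/EC2", "NetworkOut")],
    "GWLB Bytes Processed": [("AWS/GatewayELB", "ProcessedBytes")],
    "NAT Gateway Bytes (In vs Out)": [
        ("AWS/NATGateway", "BytesInFromDestination"),
        ("AWS/NATGateway", "BytesOutToDestination"),
    ],
    "Network Latency (RTT in ms)": [("AWS/NetworkMonitor", "RTT")],
}


def namespaces(catalog):
    """Return the distinct CloudWatch namespaces the catalog draws from."""
    return sorted({ns for pairs in catalog.values() for ns, _ in pairs})


print(namespaces(KEY_METRICS))
```

Note that the `AWS/AutoScaling` group metrics are only published once group metrics collection is enabled on the ASG.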


 

CloudWatch Dashboard

This is what I built in the lab. Everything is visualized in a single CloudWatch dashboard, broken into logical sections:

  • Header Text: Highlights region, VPC ID, ASG usage, and environment label
  • Text Dividers: Split the dashboard by function (Health, CPU, Scaling, etc.)
  • Metric Widgets: Visualize each of the key metrics above using gauges, graphs, and single values

Tip: I used Average over 5-minute periods for most non-network metrics, and Sum for all throughput-related data.
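
As a minimal sketch of that tip, the helper below builds a single metric widget in the CloudWatch dashboard body format and picks `Sum` for throughput metrics and `Average` otherwise, always with a 300-second period. The function name, the `throughput` flag, the default region, and the example dimension values are all hypothetical; the Zscaler namespace and metric name are assumptions as well.

```python
import json

PERIOD_SECONDS = 300  # 5-minute periods, as in the tip above


def metric_widget(title, namespace, metric, dimensions, throughput=False, region="us-east-1"):
    """Build one CloudWatch dashboard metric widget as a plain dict."""
    # Dashboard metric rows look like [namespace, metric, dimName, dimValue, ...].
    row = [namespace, metric]
    for name, value in dimensions.items():
        row += [name, value]
    return {
        "type": "metric",
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "region": region,
            "metrics": [row],
            "period": PERIOD_SECONDS,
            # Sum for throughput-style data, Average for everything else.
            "stat": "Sum" if throughput else "Average",
        },
    }


# Hypothetical resource names, for illustration only.
cpu = metric_widget(
    "Data Plane CPU",
    "Zscaler/CloudConnectors",  # assumed custom namespace
    "smedge_cpu_utilization",   # assumed custom metric name
    {"AutoScalingGroupName": "cc-asg-example"},
)
gwlb = metric_widget(
    "GWLB Processed Bytes",
    "AWS/GatewayELB",
    "ProcessedBytes",
    {"LoadBalancer": "gwy/cc-gwlb-example/0123456789abcdef"},
    throughput=True,
)
print(json.dumps(gwlb, indent=2))
```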

Because this is my lab environment, I should be transparent: this dashboard covers a single location (that is, one set of Cloud Connectors) behind the same GWLB. Aggregating metrics across different sets of Connectors in different Regions wouldn’t be useful from an operational perspective, although it could make a useful executive view if you think that’s important! For large deployments, would it make sense to create a separate dashboard for each set of Cloud Connectors behind the same GWLB? Maybe. This is where tinkering comes in handy!


Dashboard Sections (What & Why)

First things first: create a new CloudWatch Dashboard with the text dividers, and then add the widgets.
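
If you would rather script that first step than click through the console, a dashboard skeleton could be generated along these lines. This is only a sketch: the function names, the layout numbers (a 2-row header, 6 rows reserved under each divider), and the example values are made up for illustration, while the `text` widget type and its `markdown` property are part of the CloudWatch dashboard body format.

```python
import json


def text_widget(markdown, y, height=1):
    """A full-width text widget (the dashboard grid is 24 columns wide)."""
    return {
        "type": "text",
        "x": 0,
        "y": y,
        "width": 24,
        "height": height,
        "properties": {"markdown": markdown},
    }


def dashboard_skeleton(region, vpc_id, environment, sections):
    """Return a dashboard body (JSON string) with a header and one divider per section."""
    header = f"# Cloud Connector Monitoring\n{environment} | {region} | {vpc_id}"
    widgets = [text_widget(header, y=0, height=2)]
    y = 2
    for section in sections:
        widgets.append(text_widget(f"## {section}", y))
        y += 1 + 6  # one row for the divider, six rows left for metric widgets
    return json.dumps({"widgets": widgets})


# Hypothetical values, for illustration only.
body = dashboard_skeleton("us-east-1", "vpc-0example", "lab", ["Health", "CPU", "Scaling"])
print(body)
```

The resulting string is the kind of value that boto3's `put_dashboard` accepts as `DashboardBody`; metric widgets can then be added into the gaps left under each divider.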


Data Plane Health Checks

Shows the percentage of healthy Cloud Connectors behind the GWLB. If this drops too low, traffic may not be processed reliably.
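
The percentage itself is simple arithmetic on the GWLB target group's healthy and unhealthy host counts. The sketch below shows the calculation in plain Python next to one possible metric math expression; it assumes `m1` is `HealthyHostCount` and `m2` is `UnHealthyHostCount`, and both the ids and the function name are illustrative rather than taken from the actual dashboard.

```python
def healthy_percent(healthy, unhealthy):
    """Percentage of registered targets that are passing health checks."""
    total = healthy + unhealthy
    if total == 0:
        return 0.0  # no registered targets: treat as 0% healthy
    return 100.0 * healthy / total


# A possible CloudWatch metric math equivalent, assuming
# m1 = HealthyHostCount and m2 = UnHealthyHostCount (ids are illustrative).
HEALTHY_PERCENT_EXPRESSION = "100 * m1 / (m1 + m2)"

print(healthy_percent(3, 1))
```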


Data Plane CPU Utilization

A custom metric from Zscaler showing CPU load for data processing. Higher values mean you’re nearing traffic limits. It is only available with ASG deployments.


Auto Scaling Utilization

Displays current vs. max instances in the ASG. Useful to see if scaling events are occurring or if you’re hitting configured limits.
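
One way to express "current vs. max" as a single percentage is a metric math query over the ASG group metrics. The sketch below builds the query list in the shape that CloudWatch's `GetMetricData` API expects; the function name and the ASG name are hypothetical, and it assumes group metrics collection is enabled on the ASG so that `GroupInServiceInstances` and `GroupMaxSize` are published.

```python
def asg_utilization_queries(asg_name, period=300):
    """Build GetMetricData-style queries for in-service instances as a % of max size."""

    def stat(metric_id, metric_name):
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/AutoScaling",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,  # only the expression result is returned
        }

    return [
        stat("m1", "GroupInServiceInstances"),
        stat("m2", "GroupMaxSize"),
        {"Id": "e1", "Expression": "100 * m1 / m2", "Label": "ASG Utilization %"},
    ]


queries = asg_utilization_queries("cc-asg-example")  # hypothetical ASG name
print(queries[-1])
```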


GWLB Service Traffic (Processed Bytes)

Shows how much data the GWLB has processed, in GB. Gives insight into workload throughput patterns over time. You can easily convert this to Gbps if needed.
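
For reference, the conversion is just bytes to bits, divided by the length of the period. A small sketch, assuming the widget uses the Sum statistic over a 300-second period and decimal units (1 GB = 10^9 bytes); the function names are illustrative:

```python
def bytes_to_gb(total_bytes):
    """Convert a byte count to decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_bytes / 1e9


def bytes_to_gbps(total_bytes, period_seconds=300):
    """Average rate in gigabits per second for bytes summed over one period."""
    return total_bytes * 8 / period_seconds / 1e9


# 37.5 GB processed in one 5-minute period averages out to 1 Gbps.
print(bytes_to_gb(37_500_000_000), bytes_to_gbps(37_500_000_000))
```

Inside CloudWatch, a metric math expression along the lines of `m1 * 8 / PERIOD(m1) / 1000000000` should give the same result, where `m1` is the summed bytes metric.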

Cloud Connector Data Plane Traffic

Visualizes bytes in/out on the data plane interface of the Cloud Connectors. Good for spotting large uploads/downloads, or verifying traffic is flowing.

NAT Gateway Utilization

If your Connectors use a NAT Gateway for tunnel egress, this shows bytes in and out. Helpful for understanding data transfer costs (data transfer out, or DTO, charges apply only to bytes out).
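
To turn the bytes-out figure into a rough cost number, you could multiply it by your per-GB DTO rate. The sketch below takes the rate as an input because pricing varies by Region and volume tier; the function name is hypothetical and the 0.09 value in the example is only a placeholder, not a quoted price. NAT Gateway data processing charges are billed separately and are not included here.

```python
def estimate_dto_cost(bytes_out, usd_per_gb):
    """Rough data transfer out (DTO) cost for a given number of bytes out.

    usd_per_gb is supplied by the caller; look up the rate that applies
    to your Region and volume tier.
    """
    return bytes_out / 1e9 * usd_per_gb


# 10 GB out at a placeholder rate of 0.09 USD per GB.
print(round(estimate_dto_cost(10_000_000_000, 0.09), 2))
```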

Bonus: Latency Metrics

CloudWatch Network Monitor probes simulate traffic and measure round-trip latency.

  • Run one probe through Cloud Connectors (via GWLB) going to the IP of a known public site. I chose to use TCP Port 80 for ipinfo.io
  • Run one probe through Cloud Connectors (via GWLB) going to the IP of a Zscaler cloud component. I chose to use TCP Port 80 for login.zscalerthree.net 
  • Run one probe directly to the Internet (via NAT Gateway) going to the same IP of a known public site to use as a baseline

Establish a baseline for all three probes so you can detect anomalies or performance issues on the Zscaler path or on the entire AWS path.
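
The reasoning behind using three probes can be sketched as a small decision rule: if even the direct probe is slow, the problem is probably on the AWS side, while if only the probes through the Connectors are slow, the Zscaler path is the likelier suspect. The probe keys, the 1.5x factor, and the function name below are all hypothetical and only illustrate the comparison; they are not taken from the actual alarm configuration.

```python
def locate_latency_issue(current, baseline, factor=1.5):
    """Compare current RTTs (ms) against baselines for the three probes.

    Both dicts use the keys 'public_via_cc', 'zscaler_via_cc' and 'direct'.
    A probe counts as elevated when it exceeds factor * its baseline.
    """
    elevated = {name: current[name] > factor * baseline[name] for name in baseline}
    if elevated["direct"]:
        return "aws_path"      # even the NAT-only probe is slow
    if elevated["public_via_cc"] or elevated["zscaler_via_cc"]:
        return "zscaler_path"  # only probes through the Connectors are slow
    return "normal"


# Made-up numbers, for illustration only.
baseline = {"public_via_cc": 20.0, "zscaler_via_cc": 10.0, "direct": 15.0}
current = {"public_via_cc": 50.0, "zscaler_via_cc": 12.0, "direct": 15.0}
print(locate_latency_issue(current, baseline))
```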


Alerting Strategy

To keep things manageable, I built four key CloudWatch alarms that notify via SNS (email):

Alarm | What it does
--- | ---
Healthy Targets < 25% | Indicates tunnel or instance issues affecting the Data Plane
ASG Utilization > 90% | Warns when scaling limits are near and capacity might be maxed
Data Plane CPU > 80% | Suggests traffic load may be too high; triggers before packet loss
Network Latency (Anomaly) | Detects spikes in RTT beyond a defined baseline; great for spotting Zscaler path issues

You can customize these thresholds for your environment, but these gave me solid coverage of health, scale, and latency.
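
As a sketch of how alarms like these could be defined in code, the functions below build parameter dicts in the shape of CloudWatch's `PutMetricAlarm` API for three of the four alarms (ASG utilization would follow the same expression pattern as the healthy targets alarm). The alarm names, the single evaluation period, the band width of 2, the dimension values, and the Zscaler namespace and metric name are all assumptions for illustration, not the configuration behind the alarms above.

```python
PERIOD = 300  # seconds


def _metric_stat(metric_id, namespace, name, dimensions, return_data=False):
    """One entry of the Metrics list used by expression-based alarms."""
    return {
        "Id": metric_id,
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            },
            "Period": PERIOD,
            "Stat": "Average",
        },
        "ReturnData": return_data,
    }


def healthy_targets_alarm(gwlb_dims, topic_arn, threshold=25):
    """Alarm when the healthy target percentage drops below the threshold."""
    return {
        "AlarmName": "cc-healthy-targets-low",  # hypothetical name
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": threshold,
        "EvaluationPeriods": 1,
        "Metrics": [
            _metric_stat("m1", "AWS/GatewayELB", "HealthyHostCount", gwlb_dims),
            _metric_stat("m2", "AWS/GatewayELB", "UnHealthyHostCount", gwlb_dims),
            {"Id": "e1", "Expression": "100 * m1 / (m1 + m2)",
             "Label": "Healthy %", "ReturnData": True},
        ],
        "AlarmActions": [topic_arn],
    }


def cpu_alarm(asg_name, topic_arn, threshold=80):
    """Alarm when average Data Plane CPU exceeds the threshold."""
    return {
        "AlarmName": "cc-data-plane-cpu-high",   # hypothetical name
        "Namespace": "Zscaler/CloudConnectors",  # assumed custom namespace
        "MetricName": "smedge_cpu_utilization",  # assumed custom metric name
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": PERIOD,
        "EvaluationPeriods": 1,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold,
        "AlarmActions": [topic_arn],
    }


def latency_anomaly_alarm(probe_dims, topic_arn, band_width=2):
    """Alarm when RTT rises above the upper edge of the anomaly detection band."""
    return {
        "AlarmName": "cc-latency-anomaly",  # hypothetical name
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "ThresholdMetricId": "ad1",
        "EvaluationPeriods": 1,
        "Metrics": [
            _metric_stat("m1", "AWS/NetworkMonitor", "RTT", probe_dims, return_data=True),
            {"Id": "ad1", "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
             "Label": "RTT (expected)", "ReturnData": True},
        ],
        "AlarmActions": [topic_arn],
    }
```

Each dict could then be passed to boto3 as `cloudwatch.put_metric_alarm(**alarm)`, with `topic_arn` pointing at the SNS topic that sends the email notifications.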


What’s Next

Everything you’ve seen here – from visualizing data plane health to tracking processed traffic and scaling activity – was built entirely from native AWS (and one Zscaler custom) metrics, and a pinch of CloudWatch magic.

In Part 2, I’ll break down exactly how I built this: the metrics I used, the math behind the expressions, the alarm logic, and of course, a reusable CloudFormation template to get you started.

And for those of you running Azure and/or GCP, Parts 3 and 4 are coming soon. Let’s nerd out together!
