Zscaler Blog

Unleashing the Power of Zscaler Cloud Operations for Unprecedented Cloud Resilience

DIANA VIKUTAN, ROHIT GOYAL
August 04, 2023 - 4 min read

Rapid adoption of cloud services has helped organizations innovate faster and cut costs. However, as more critical services come to depend on cloud resources, organizations need to plan for unexpected interruptions and outages. That's why we created the Zscaler Zero Trust Exchange: to offer one of the world’s most powerful and resilient security clouds.

Our platform’s cloud native infrastructure and operational excellence ensure high availability and serviceability at all times, giving organizations and their customers peace of mind.

 

A security cloud you can trust

Zscaler operates the largest security cloud in the world, serving 7,000+ customers and 50+ million users, processing over 300 billion transactions a day, and receiving 500 trillion health, performance, and security metrics. Building a cloud of this size and scale takes millions of hours and deep experience across four key areas: capacity, availability, performance, and security.

 

Image

 

  1. Enterprises need enough capacity to handle large-scale events, from a company meeting to a holiday rush.
  2. It's also important to ensure availability, even if cables are cut or ISPs are down, to minimize downtime and avoid support desk calls.
  3. When it comes to security, a zero trust strategy enables strict user authentication and least-privileged access controls to establish context and apply policies.
  4. Finally, a good user experience—the ultimate measure of performance in a security cloud—is achieved through a zero trust architecture that seamlessly routes traffic without compromising on security.

 

Ensure performance at every path

To ensure a seamless user experience, it's crucial to prioritize performance at every stage of the data path. At Zscaler, this is exactly why we developed our own security cloud. While most user-reported issues are attributed to the path between the user and the Zero Trust Exchange, the majority of issues actually occur along the path from the Zero Trust Exchange to the application.

 

Image

 

Unless we’re dealing with a complete outage, detecting degradation of an individual user experience across 300 billion transactions a day is next to impossible. Our goal is to find a solution across the entire data path to help our customers automatically route around the problem, restoring user performance regardless of the underlying root cause.

 

Zscaler Digital Experience for CloudOps

At Zenith Live ’23, we introduced our latest innovation: Zscaler Digital Experience (ZDX) for CloudOps, an AI- and ML-driven user performance platform, which the Zscaler CloudOps team uses internally to detect, analyze, and remediate degradations. Here’s a screenshot of the CloudOps dashboard, indicating that operations are healthy.

 

Image 

 

Issues are easy to locate on the global map. Let’s look at an incident from March 21, 2023, when Zscaler observed decreased performance for users in and around Singapore, and how we resolved the situation with ZDX for CloudOps.

 

Image

 

Below is a baseline view of performance in Singapore before the incident. You can see hundreds of last mile ISPs, ISPs Zscaler uses, and the Zscaler cloud. Here, there are no issues indicated and performance is green.

 

Image

 

As the incident developed, our CloudOps team noticed a problem between two major Asian ISPs: Singtel and Telstra.

 

Image

 

Issues like these often resolve on their own, but in this case the problem got worse, and the Zscaler Network & Infrastructure Team had to intervene.

 

Image

 

Drilling into the connection between the ISPs, we can see it’s not a simple connection. There are multiple connection points in Singapore alone, and at least three unique routes between Telstra and Singtel. The connectivity graph shows that one of the three paths is impacted.  
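
To make the idea of spotting the one impacted route concrete, here is a minimal sketch in Python. It is not how ZDX for CloudOps works internally; it simply assumes each route has a set of baseline latency samples and flags any route whose current median latency exceeds its baseline median by a chosen factor. The function name, the route names, the threshold, and the latency values are all hypothetical and made up for illustration.

```python
from statistics import median


def find_degraded_paths(baseline_ms, current_ms, threshold=1.5):
    """Return the names of paths whose current median latency exceeds
    threshold x their baseline median latency (hypothetical heuristic)."""
    degraded = []
    for path, samples in current_ms.items():
        base = median(baseline_ms[path])
        now = median(samples)
        if now > threshold * base:
            degraded.append(path)
    return sorted(degraded)


# Made-up latency samples in milliseconds for three routes between two ISPs.
baseline = {
    "route_a": [10, 11, 12],
    "route_b": [20, 21, 19],
    "route_c": [15, 14, 16],
}
current = {
    "route_a": [10, 12, 11],
    "route_b": [55, 60, 58],  # this route has degraded
    "route_c": [15, 15, 17],
}

impacted = find_degraded_paths(baseline, current)
print(impacted)
```

Comparing medians rather than single samples keeps one slow measurement from flagging an otherwise healthy route; a production system would likely use richer statistics.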

 

Image

 

With the available information and our advanced technology, we can quickly resolve issues through automated remediation or targeted actions. This is crucial in maintaining dependable and efficient performance. To address the performance issue, our Networking team optimized the path by redirecting some traffic via NTT while still allowing unaffected traffic to flow along its original path, minimizing the impact on users.
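
The remediation described here, moving only the affected traffic while leaving everything else on its original path, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Zscaler's actual routing logic: the `reroute` function, the flow names, and the path labels (including `via_ntt`) are hypothetical.

```python
def reroute(flows, degraded_paths, alternate):
    """Return a new flow -> path mapping in which only flows currently on a
    degraded path are moved to the alternate path (hypothetical helper)."""
    return {
        flow: (alternate if path in degraded_paths else path)
        for flow, path in flows.items()
    }


# Made-up assignment of traffic flows to routes before remediation.
flows_before = {
    "flow_1": "route_a",
    "flow_2": "route_b",  # on the impacted route
    "flow_3": "route_c",
}

flows_after = reroute(flows_before, degraded_paths={"route_b"}, alternate="via_ntt")
print(flows_after)
```

Returning a new mapping instead of modifying the original leaves the previous assignment available, which makes it straightforward to restore the original paths once the underlying problem clears.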
 

Image

 

In this incident, although every path involved was outside the user’s control, Zscaler was able to detect and remediate the problem to ensure a seamless user experience.

 

Join the digital transformation journey and learn more about Zscaler cloud operations

ZDX for CloudOps enables Zscaler Operations teams to visualize issues comprehensively in real time and to take priority-based action, using data-driven recommendations, to prevent user experience issues.

If you’re interested in discovering more about how Zscaler manages operations on a global scale, please request a demo.

We hope you were able to join us at Zenith Live ’23 in Las Vegas or Berlin to celebrate innovation, collaboration, and succeeding together. If you missed it, you can still catch up: watch select innovation and insight sessions on demand.
