
Five Floors Underground: The Wake-Up Call for Cloud Resilience

 

A few years ago, I was auditing one of the largest financial institutions in the Middle East for cyber resilience. During the walkthrough, I discovered that their primary data center was five floors underground. Five!

Naturally curious, I asked their CISO whether there was a specific reason for going that deep. Although the facility was a legacy he had inherited, his answer was simple:

“Regional constraints.”

I didn’t fully get it then. I do now.

Last week, Iranian drone strikes took out AWS data centers in the UAE and Bahrain. Banking apps went dark. Payment systems froze. Enterprise software across the region just… stopped.

The cloud, it turns out, has a very physical address. And that address can be hit.

That CISO and his predecessors weren’t being paranoid. They were being realistic. They understood something that a lot of us in infosec and cloud governance are only now waking up to: in certain parts of the world, your DR strategy isn’t just about ransomware and config drift. It’s also about missiles and drones.

This changes the conversation around cloud concentration risk entirely. When we evaluate third-party cloud providers, how many of us are factoring in geopolitical threat vectors against the physical infrastructure? How many risk registers account for kinetic attacks on a hyperscaler’s availability zone? And the cloud providers’ lack of transparency about the distances between their availability zones isn’t helping the cause.

The Gulf has cheap energy, massive funding, and ambitious AI plans. But the same geography that makes it attractive also makes it a target. The $2 trillion in tech investment commitments from last year look a lot different today.

Read alongside earlier news about Iran targeting financial institutions in Israel and the surrounding region, the DC being in a secure physical location makes even more sense: https://www.reuters.com/world/middle-east/iran-will-target-us-israeli-economic-banking-interests-region-state-media-2026-03-11/

For those of us in GRC and infosec, this is a wake-up call. Cyber resilience isn’t just a checkbox on a compliance framework. Sometimes it means putting your data center five floors underground and not explaining why to auditors who ask too many questions.

That CISO’s predecessors knew the assignment.

#cyberresilience #DC #DR #BCP #Geopolitics #physicalsecurity

https://maps.app.goo.gl/URnYm2iRYVgVEVsb6?g_st=ic

Audit Survival Guide for Startups and Engineers

A few years ago, while I was leading security initiatives at a fast-growing fintech, the regulator decided to conduct an inspection, or what most people would formally call an audit. All hell broke loose: most of the team came from an e-commerce background, where regulatory audits were rare, if not completely unheard of.

Questions started flying almost immediately. What exactly is an audit? What will they ask? Do we need to generate evidence? How strict are auditors? What’s the flow of the inspection?

During one of our internal audit discussions, someone from the engineering team casually referred to the auditor as “the dude.” The auditor, quite rightly, responded with: “Please use parliamentary language” 😀

That moment stuck with me and became the trigger for documenting a set of very practical do’s and don’ts. We eventually navigated the inspection successfully. I spent a good amount of time guiding engineers, articulating the process, and helping everyone stay calm, prepared, and professional.

I was reminded of that experience recently when a client found themselves in a very similar situation: first audit, regulated environment, and a lot of nervous energy. That’s what led me to put together this survival guide.

Any senior engineer working at a fintech will face an audit at some point in their career. Audits don’t just involve leadership; engineers, admins, and team leads are often pulled in to explain systems, controls, and evidence. If you’re facing your first audit, I hope this helps you approach it with a bit more confidence (and fewer surprises).


1. Professional Conduct (Non-Negotiable)

  • Be formal and respectful. Address auditors as Sir/Madam.
  • Avoid casual language like bro, dude, or inside jokes—even with colleagues.
  • Remember: everything said in an audit meeting is effectively “on record”.

2. How to Answer Questions

  • Answer only what is asked, no extra context unless requested.
  • Keep responses clear, factual, and concise.
  • If unsure, say: “I’ll check and get back with a confirmed answer.”
  • Never guess, assume, or improvise.

3. Use the Audit Anchor

  • There is usually a designated audit point-of-contact from the company side.
  • Route clarifications, follow-ups, or concerns through this person via a separate channel.
  • Do not attempt to “negotiate” or over-explain directly with auditors.

4. Stay Calm (Even When It’s Painful)

  • Some auditors deeply understand systems; some don’t.
  • Repetitive or irrelevant questions are part of the job; don’t take them personally.
  • Frustration or irritation only creates more scrutiny.

5. Evidence: Quality > Quantity

  • Share only the evidence requested and nothing more.
  • Verify accuracy before submitting (wrong evidence causes more questions).
  • Prefer PDF format for all evidence.
  • Name files clearly and consistently.

6. Documentation Is Your Shield

  • Document your work as you build—not just for audits.
  • Good documentation reduces explanations, meetings, and follow-ups.
  • If it’s not documented, assume you’ll have to explain it later.

7. Screen Sharing: Handle with Care

  • Never share your entire screen on Zoom, Google Meet, etc.
  • Always share only the specific application window needed.
  • Double-check for sensitive tabs, terminals, or notifications before sharing.

8. Sensitive Data: Stop and Escalate

For any of the following, do not share directly:

  • Database screenshots
  • PII or customer data
  • Passwords, secrets, tokens

Always get approval first from your manager and the InfoSec and/or Legal teams.


Final Rule of Thumb

Be precise. Be calm. Be professional.

Audits are about evidence and clarity, not speed or bravado.

You might also be interested in reading about Audit Fatigue

TPRM Audit Fatigue: When Trust, Time, and Teams Collide

 


Lately, I’ve been observing a growing trend where Financial Institutions (Banks and NBFCs) are increasingly mandating 3-day onsite audits as part of their Third-Party Risk Management (TPRM) programs. It often feels like an implicit signal that they don’t fully trust the fintechs or startups they work with, even those that proudly hold ISO 27001 or SOC-2 certifications. These certifications were meant to demonstrate a baseline of security maturity and due diligence, yet they’re being treated more like a footnote than a foundation.

Now, if you’re a fintech or startup working with even a moderate number of financial partners, say 8 to 10, your security and GRC teams could be spending upwards of 30 working days a year managing these TPRM audits: three onsite days per partner, before any prep or follow-up. That’s nearly a month of valuable bandwidth lost to redundant assessments and fragmented processes.

To make matters even more tangled, there’s no standard playbook across auditors. Some send sprawling spreadsheets. Others insist on a live walkthrough with no prep and rapid-fire questions. Still others expect you to navigate a third-party portal with its own quirks and terminology. Every new audit feels like starting from scratch.

Walkthrough-style audits, in particular, tend to be the most disruptive. They often require specific team members to join calls, explain configurations, demo access flows, or justify implementation choices. And since the questions tend to repeat across audits, these sessions end up being déjà vu for many teams, especially engineering. Their time is typically reserved for product building and problem-solving. Getting them to repeatedly field audit questions naturally creates friction between Engineering and Security, and sometimes even with the partner institutions.

On the flip side, the pressure within startups isn’t helping either. Many founders are pushing their teams (security, engineering, DevOps, legal, consultants) to rush through ISO or SOC-2 readiness on extremely tight deadlines. I’ve heard the same frustration echoed by several folks: long nights, tight audits, no breathing room. It’s become a checkbox race, not a maturity journey.

There’s also a growing school of thought within the industry that ISO/SOC-2 reports, especially the ones churned out by compliance automation platforms, are becoming more of a sales enablement tool than a reliable indicator of security posture. That perception is driving financial institutions to dig even deeper during TPRM audits, essentially second-guessing the very frameworks designed to reduce the need for redundant assessments.

It’s tempting to wish for regulatory clarity here, perhaps unified guidance from the regulator on how TPRM audits should be approached across the ecosystem. But that might be asking too much, given the operational nature of these audits and the regulator’s usual hands-off stance on implementation details.

To me, this is a multi-layered challenge. ISO and SOC-2 were designed to communicate security assurance to stakeholders, both internal teams and external partners. But if the output is no longer trusted, the entire premise starts to wobble.

As a small experiment, I once created a detailed Security Handbook for a client I consult for as a vCISO. It outlined their security practices end-to-end and drastically improved our turnaround time for security questionnaire responses. But unfortunately, the auditors weren’t too pleased—they preferred their own templates, their own questions, their own format. It didn’t matter that the answers were clear and well-structured. Standardization was nowhere in sight.

And let’s not ignore the irony: auditors are still asking for screenshots in an age when APIs could provide real-time evidence. It just feels out of sync with the pace and capabilities of modern tech. Honestly, this entire space is long overdue for disruption.
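
As a thought experiment, here’s a minimal sketch of what API-based evidence could look like, assuming an AWS environment with the boto3 SDK; the control (“MFA enabled for all IAM users”) and the control ID are purely illustrative, not any auditor’s actual requirement:

  # Hypothetical sketch: collecting point-in-time evidence for an
  # "MFA enabled for all IAM users" control via API instead of screenshots.
  # Assumes an AWS environment with boto3 configured.
  import json
  from datetime import datetime, timezone

  import boto3

  iam = boto3.client("iam")

  evidence = {
      "control": "IAM-MFA-01: MFA enabled for all IAM users",  # illustrative ID
      "collected_at": datetime.now(timezone.utc).isoformat(),
      "users": [],
  }

  # Walk every IAM user and record whether an MFA device is attached.
  for page in iam.get_paginator("list_users").paginate():
      for user in page["Users"]:
          devices = iam.list_mfa_devices(UserName=user["UserName"])["MFADevices"]
          evidence["users"].append(
              {"user": user["UserName"], "mfa_enabled": len(devices) > 0}
          )

  evidence["compliant"] = all(u["mfa_enabled"] for u in evidence["users"])
  print(json.dumps(evidence, indent=2))  # timestamped, machine-readable output

Evidence like this is timestamped, machine-readable, and reproducible on demand, which is exactly what a screenshot is not.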

So the question is: how do we solve this? What’s a practical, scalable way to balance assurance demands with the productivity of already stretched teams? How do we rebuild trust in certifications without burning people out in the process?

Would genuinely love to hear your thoughts.

Will Regulators Mandate Multi-Region over Multi-AZ in the Cloud?


There was a time when one could simply point to the “a,” “b,” and “c” zones of a Multi-AZ (Availability Zone) configuration as a satisfactory replacement for traditional Data Center–Disaster Recovery (DC-DR) requirements, even in regulated sectors. However, as cloud providers face more frequent outages, it’s becoming harder to consider this setup a reliable solution.

Just recently, Google Cloud (GCP) experienced a significant outage lasting about 12 hours that disrupted the europe-west3 region in Frankfurt, Germany. The root cause was traced to a power and cooling failure that forced a portion of one zone offline, affecting a wide range of services, from Compute to Storage.

Traditionally, disaster recovery setups have involved multiple data centers designated as Primary and Secondary (or DR) sites, with data replication between them. Often, the DC and DR are in separate cities; in the companies I worked at, they were at least 300 km apart. While effective, this approach is costly and challenging to maintain, with a lot of administrative overhead.

This raises questions about how fintechs and other regulated sectors meet Business Continuity and Disaster Recovery (BCPDR) requirements in the cloud.

In cloud environments, redundancy can be implemented in several ways, with Multi-AZ setups being one of the simplest. Multi-AZ configurations typically involve three data centers in or around a city, identified as zones “a,” “b,” and “c” and usually spread 30-100 kilometers apart. Data is replicated in real time across these zones, allowing the setup to be marketed as an alternative to traditional DC/DR arrangements. However, Multi-AZ is not a default feature; it requires opting in and incurs extra costs.
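
To make the opt-in concrete, here’s a minimal sketch of enabling Multi-AZ for a managed database, assuming AWS RDS and the boto3 SDK; the region, identifier, and sizing below are placeholders, not recommendations:

  # Minimal sketch: opting into Multi-AZ for a managed database on AWS RDS.
  # Region, identifier, and sizing are placeholders.
  import boto3

  rds = boto3.client("rds", region_name="ap-south-1")  # example region

  rds.create_db_instance(
      DBInstanceIdentifier="payments-db",   # hypothetical instance name
      Engine="postgres",
      DBInstanceClass="db.m6g.large",
      AllocatedStorage=100,                 # GiB
      MasterUsername="dbadmin",
      ManageMasterUserPassword=True,        # let AWS manage the credential
      MultiAZ=True,                         # the opt-in: synchronous standby
  )                                         # in another AZ, at extra cost

With MultiAZ=True, the provider maintains a synchronous standby in a second zone and promotes it automatically if the primary’s zone fails.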

Another approach is Multi-Region, where data is replicated across geographically distant regions, often in different seismic zones. This setup helps mitigate the risk of a single region being impacted by events such as severe flooding, prolonged power outages, political unrest, and similar disruptions.
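
At the storage layer, a multi-region posture can be as simple as asynchronous cross-region replication. A hedged sketch, assuming AWS S3 and boto3; the bucket names and the IAM role ARN are placeholders, and both buckets need versioning enabled first:

  # Sketch: replicating object storage to a geographically distant region
  # with S3 cross-region replication. Names and the role ARN are placeholders.
  import boto3

  s3 = boto3.client("s3")

  # Replication requires versioning on both the source and destination bucket.
  for bucket in ("records-mumbai", "records-frankfurt-dr"):
      s3.put_bucket_versioning(
          Bucket=bucket,
          VersioningConfiguration={"Status": "Enabled"},
      )

  s3.put_bucket_replication(
      Bucket="records-mumbai",
      ReplicationConfiguration={
          "Role": "arn:aws:iam::123456789012:role/s3-replication",  # placeholder
          "Rules": [
              {
                  "ID": "dr-copy",
                  "Priority": 1,
                  "Status": "Enabled",
                  "Filter": {},  # replicate everything in the bucket
                  "DeleteMarkerReplication": {"Status": "Disabled"},
                  "Destination": {"Bucket": "arn:aws:s3:::records-frankfurt-dr"},
              }
          ],
      },
  )

Note that cross-region replication is asynchronous, which has direct RPO implications, as discussed below.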

Interestingly, some financial institutions, especially banks, are not entirely sold on Multi-AZ setups and are pushing their technology partners to adopt multi-region architectures across separate seismic zones. While no regulatory body in India has mandated multi-region setups yet, it’s worth considering the distance between zones within each cloud provider:

  • Amazon AWS states that its Availability Zones are up to 60 miles (~100 kilometers) apart. Interestingly, AWS keeps the exact locations confidential, even from most employees. Source: https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/availability-zones.html#:~:text=Availability%20Zones%20in%20a%20Region,with%20single%2Ddigit%20millisecond%20latency
  • Microsoft Azure doesn’t publish exact AZ distances either, but it aims for at least 300 miles (~483 kilometers) between paired regions where possible, the greatest stated separation among the major cloud providers. Source: https://learn.microsoft.com/en-us/azure/reliability/cross-region-replication-azure#what-are-paired-regions
  • Google Cloud (GCP) does not publicly disclose the distances between its AZs, keeping this detail confidential. Surprising!

Given these developments, regulators may eventually require multi-region setups for regulated entities and fintechs using cloud services instead of relying solely on Multi-AZ. So far, Multi-AZ has offered a relatively straightforward solution to meet compliance and audit requirements for BCPDR. However, it might be time to reconsider RTO (Recovery Time Objective) and RPO (Recovery Point Objective) expectations in cloud environments.
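
To put rough numbers on that, here’s a small, purely illustrative back-of-the-envelope check; none of these figures are provider commitments:

  # Hypothetical back-of-the-envelope RTO/RPO check. All numbers are
  # illustrative assumptions, not provider commitments.

  def meets_objectives(replication_lag_s: int, failover_s: int,
                       rpo_s: int, rto_s: int) -> bool:
      """Worst-case data loss ~ replication lag; downtime ~ failover time."""
      return replication_lag_s <= rpo_s and failover_s <= rto_s

  # Multi-AZ: synchronous replication (near-zero lag), automatic failover.
  print(meets_objectives(replication_lag_s=0, failover_s=120,
                         rpo_s=0, rto_s=300))      # True

  # Multi-region: async replication lag of, say, 15 minutes, plus a slower,
  # often manual, regional failover. A zero-RPO expectation no longer holds.
  print(meets_objectives(replication_lag_s=900, failover_s=3600,
                         rpo_s=0, rto_s=300))      # False

The point of the exercise: moving from Multi-AZ to multi-region usually trades tighter geographic risk for looser RPO/RTO, and that trade-off belongs in the BCPDR conversation with regulators and auditors.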

Any thoughts on these developments?

  • Link to the RCA: GCP Outage Incident Report https://status.cloud.google.com/incidents/e3yQSE1ysCGjCVEn2q1h
  • Article on the 12-hour outage: https://www.theregister.com/2024/10/25/google_cloud_frankfurt_outage/