
The Pincer Approach to Cloud Governance


Why cloud controls fail, and how to make them stick.

Cloud governance is rarely a greenfield exercise. More often, it begins when something feels off – costs have drifted, accountability has blurred, and teams are working around the platform rather than with it. In those moments, the challenge isn’t a lack of tools or frameworks, but how to reintroduce control in application-driven environments without breaking trust or slowing delivery.

Most of the time, cloud governance fails not because the controls are wrong, but because they arrive before people are ready for them.

I’ve learned the hard way that you don’t fix cloud governance by tightening screws – you fix it by helping cloud and engineering teams see why change is needed, and applying controls only when the organization is ready to absorb them.

When Cloud Journeys Drift Off Course

Judging by our dazzling displays of technical expertise (see LinkedIn), one would assume that consultants are always building cloud-native, AI-empowered, cutting-edge solutions for the new age.

Yet time and again some of our most productive engagements start on simpler terms – the proverbial cloud journey gone slightly awry. Was it an audit? An unexpected bill? A nagging sense that things are getting harder to manage, rather than easier? A customer reaches out and says ‘We might need some help with that stuff’.

The latest State of FinOps surveys capture this pattern well.

Clients are looking for that one thing above all else: order. A sense of control, with visibility, and a feeling that cloud usage is intentional rather than accidental. 

So, they reach out and we set out to help – onwards to great success, right?

In reality, the landing is rarely smooth – and often for good reasons.

More often than not we are dropped into environments where technical teams are already doing their own thing in their own ways, for reasons that once made sense. Not coincidentally, those same ways are likely what triggered the need for external help in the first place – not exactly fertile ground for constructive collaboration.

We see the same blockers surface again and again in successive State of FinOps reports: friction between platform and product teams, and governance frameworks that exist on paper but struggle to gain real traction. All of it boils down to the same challenge:

How do you introduce control without alienating the people doing the work? 

Industry practice seems to favor a direct approach. Demonstrate expertise. Establish authority. Lean on the stakeholder-issued mandate when necessary. Disrupt dysfunctional workflows and deliver competent solutions while keeping an eye on the end goal.

There’s plenty to be said about this approach and it sure does work sometimes – I’ve seen impressively forceful consulting deliver solid results over the years.

In this article I want to explore how that model often limits long-term impact and feeds into some persistent negative stereotypes of consulting. I’ll propose a more balanced approach: one that builds momentum through collaboration first, and introduces governance controls only once the ground is ready for them.

Technology and tooling will feature, but only as supporting characters. My real focus is on how governance takes root when you’re working with real teams, real incentives, and real constraints. 

The Engagement: A Familiar Pattern

To keep things succinct, and in the interest of privacy, I will fold some of the experiences we’ve had with several companies into a gestalt premise – a consulting engagement with California’s Local Investment Enterprise Trust (CLIENT).

Being the financial institution that it is, CLIENT maintains multiple internal and client-facing applications, many of which can be safely classified as classic. Over the last few years, they’ve been gradually migrating some of that estate onto AWS.

Recently, however, cracks started to show. Cloud costs were rising with little explanation, environments were becoming harder to reason about, and familiar symptoms were surfacing: underutilized resources, ownership and accountability issues, and a growing lack of oversight.

CLIENT reached out with a straightforward mandate – keep costs in check, clean things up and set up some basic governance.

Landing in the Organization

We start, as usual, with introductions to the key stakeholders and an overview of the organizational flow. It’s a pretty standard setup – the larger apps have dedicated dev teams, there are service teams for some of the common tools (JIRA, Jenkins, etc.), and most teams share a similar toolchain.

Well-established processes exist for release and change management, but cloud usage has been somewhat shoehorned into a framework originally designed for owning and maintaining on-prem environments.

We are formally assigned to report to Barbara – head of the Important department and CLIENT’s engagement lead. She is also the one who negotiated our contract, so for all intents and purposes, we now work for Barbara.

Next, we’re introduced to Archie – lead architect for the Important applications and our main point of contact for solution design and high-level planning in general. 

Finally, there’s Dave, who helps us with onboarding. Dave grants us access to all relevant systems, walks us through the tools in use, and provides valuable on-the-ground perspective into how the various tech teams actually work.

Step 1 – Visibility before Authority

Our first order of business is to roll out a scalable reporting framework – something to serve as a foundation for all future improvements. We use scheduled AWS Lambda functions to collect resource data daily, which keeps things simple, modular and easy to extend.

For visualization, we make a point of using the existing Grafana service, which is CLIENT’s de-facto standard for dashboarding and reporting. This saves time and effort, makes the output familiar to stakeholders and easy to digest, and keeps the whole thing feeling like ‘business as usual’ – which is key at this early stage.

We set two distinct goals for the pilot outputs: cover the basics and avoid controversy.

In practice, this means simple high-level inventories – EC2 instances, EBS volumes, ELBs and a few others. These include tagging information and properties related to pricing and utilization, but carefully avoid direct finger-pointing, such as flagging resources as compliant or non-compliant.
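The collection side of this is mostly a flattening exercise: a scheduled Lambda pulls the raw API responses and turns them into flat inventory rows, tags included. A minimal sketch of that step, shown on a hand-made sample payload (the `Owner`/`Environment` tag keys are illustrative, not CLIENT’s actual schema):

```python
def flatten_instances(reservations):
    """Flatten a DescribeInstances-style payload into flat inventory rows."""
    rows = []
    for reservation in reservations:
        for inst in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            rows.append({
                "instance_id": inst["InstanceId"],
                "instance_type": inst["InstanceType"],
                "state": inst["State"]["Name"],
                "owner": tags.get("Owner", "unknown"),        # illustrative tag key
                "environment": tags.get("Environment", "untagged"),
                "tags": tags,
            })
    return rows

# Hand-made sample shaped like the EC2 API response (hypothetical data);
# in the Lambda this would come from ec2.describe_instances()["Reservations"].
sample = [{"Instances": [{
    "InstanceId": "i-0abc123",
    "InstanceType": "m5.4xlarge",
    "State": {"Name": "running"},
    "Tags": [{"Key": "Owner", "Value": "team-important"}],
}]}]

inventory = flatten_instances(sample)
```

Rows in this shape feed straight into Grafana tables, and the neutral defaults (‘unknown’, ‘untagged’) surface gaps without accusing anyone.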

After the initial deployment we continue to expand the reports and build new ones. Archie and Barbara are the primary audience at first, but before long others start to get involved, asking questions and suggesting additional views and metrics.

Step 2 – Meeting Teams Where Change Happens

Once the reporting is in place and growing, it’s only a matter of time before the questions start popping up. Why is this resource here? Why aren’t we using a cheaper instance type? What’s with the crazy tagging?

This doesn’t happen overnight – even Archie needs a moment to review and consider, and Barbara has even less time to take in the detail. The gradual rollout of the reports helps here – it gives people space to engage without overwhelming them, and keeps us from drowning in findings ourselves. More involved analysis, such as detailed Cost and Usage Report breakdowns, can come later.

At this stage Dave and the rest of the developers are still in ‘doing their own thing’ mode. The downstream effects of our involvement are several cycles away from causing them any headache – the ideal moment to get involved and build some rapport.

While developing the reporting solution, we also built up some understanding of how work actually gets done. In this case, most teams use Jenkins for orchestration, Terraform for IaC and Ansible for server config.

Jenkins quickly stands out as the most effective place to contribute something of value for wider use.

We approach Dave with a proposal: a shared Jenkins library that wraps common use cases into reusable Groovy steps, with improved logging and parametrization. He’s immediately on board – not least because it reduces pipeline complexity for his team by a third – and we start building it on behalf of his team.

This work needs to be approached differently from reporting. While reports have clear requirements and a defined audience, this shared library will be more like an open-source project. It needs to be clean and simple, broader in scope, and supported by straightforward documentation.

As a concrete example, we add a Terraform step that collapses all of the pre-apply hassle – fetching binaries, credentials, initialisation, planning – into a single well-defined action with cleaner and more predictable output.
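The actual step is a Groovy function in the Jenkins shared library; for consistency with the other examples, here is the shape of the wrapped sequence sketched in Python. The use of tfenv for version pinning and the specific flags are illustrative assumptions, not CLIENT specifics:

```python
def terraform_plan_steps(tf_version, var_file=None):
    """Return, in order, the commands a single wrapped 'plan' step runs.

    The real step lives in a Groovy shared library; this sketch only shows
    the sequence it collapses into one well-defined action.
    """
    plan = ["terraform", "plan", "-input=false", "-no-color", "-out=tfplan"]
    if var_file:
        plan.append(f"-var-file={var_file}")
    return [
        ["tfenv", "install", tf_version],        # fetch the pinned binary
        ["tfenv", "use", tf_version],
        ["terraform", "init", "-input=false"],   # backend + providers
        plan,                                    # predictable, parseable output
    ]
```

Pipelines call one step with two or three parameters instead of repeating this boilerplate, which is where the reduction in pipeline complexity comes from.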

Step 3 – Letting Standards Emerge

While working on both solutions, we slowly start to gain traction with Archie and Barbara on the actual improvement opportunities. The reporting now provides enough of a clear picture to talk meaningfully about savings – which resource types make sense, what can be cleaned up, and how tagging should work. 

There’s quite a lot of literal consulting going on at this point. It could easily be the subject of another article or two, but for the sake of focus we’ll fold this under a simple summary: some good decisions get made.

Taken together, those decisions amount to a rudimentary internal compliance framework – what’s acceptable to run, and the acceptable ways of running it. Nothing is formally enforced yet, but the direction of travel is becoming clear. 

Naturally, once those criteria are established, the stakeholders start expressing interest in enforcement. It’s crucial to keep this in check:

Unless we have an extremely accurate assessment of the environments in real time, it’s unwise to start shutting things down or deleting resources.

Instead, we rely on existing internal communication channels for the message to spread. Management signals that some new standards for cloud usage are coming, and teams are expected to comply. This is also the first moment where our involvement begins to interfere, even slightly, with people’s workflows – which makes careful handling of the communications all the more important. We deliberately keep a low profile and let Archie take centre stage. He is happy to lead the improvements and we can focus on helping Dave and his colleagues meet those new goals.

Step 4 – Making Non-Compliance Visible (and Safe to Fix)

With things now in motion, we start to move in step with the rest of the organization, dancing to the tune of ‘more order, less spending’.

By this point, we’ve built solid working relationships and have a solution base that allows us to support those improvements.

Most importantly, while those are in fact our overarching goals, we’ve made sure the effort feels internally driven, not like a third-party edict.

From a solution perspective, the first change is to add compliance indicators to our inventory reports, making non-compliant resources visible to everyone. Alongside this, we add notifications – emails to creators or owners when something needs correcting, such as an overprovisioned instance.
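In essence, this is a rule set applied to the inventory rows, with findings bundled per owner. A minimal sketch, assuming a hypothetical rule set – the approved instance families and required tags below stand in for whatever the internal criteria actually specify:

```python
# Illustrative rule set, not CLIENT's actual standards.
ALLOWED_FAMILIES = {"t3", "m5"}
REQUIRED_TAGS = {"Owner", "Environment"}

def check_resource(row):
    """Return the list of compliance findings for one inventory row."""
    findings = []
    family = row["instance_type"].split(".")[0]
    if family not in ALLOWED_FAMILIES:
        findings.append(f"instance family '{family}' is not on the approved list")
    for tag in sorted(REQUIRED_TAGS - set(row.get("tags", {}))):
        findings.append(f"missing required tag '{tag}'")
    return findings

def notifications_by_owner(rows):
    """Group findings per owner - one bundle per notification email."""
    bundles = {}
    for row in rows:
        findings = check_resource(row)
        if findings:
            owner = row.get("tags", {}).get("Owner", "unknown")
            bundles.setdefault(owner, []).append((row["instance_id"], findings))
    return bundles
```

The same `check_resource` output drives both the dashboard indicator and the email body, so the two never disagree about what is non-compliant.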

On the Jenkins side, we offer Dave something extra: OPA-based checks to prevent (re)provisioning of non-compliant components. In the current context, he’s happy to have that.
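The real checks are written in Rego and evaluated by OPA against the Terraform plan JSON; to stay in one language here, the warn/deny shape of such a rule is sketched in Python. The unencrypted-EBS rule is an illustrative example, not a CLIENT policy:

```python
def evaluate_plan(resources, enforce=False):
    """Evaluate planned resources against simple policy rules.

    Mirrors the warn/deny structure of OPA policies (the real checks are
    Rego run against the Terraform plan); the rule below is illustrative.
    """
    violations = []
    for r in resources:
        if r.get("type") == "aws_instance" and not r.get("ebs_encrypted", False):
            violations.append((r["name"], "unencrypted EBS volume"))
    severity = "deny" if enforce else "warn"
    return {
        "severity": severity,
        "violations": violations,
        "blocked": enforce and bool(violations),  # only 'deny' stops the pipeline
    }
```

Starting with `enforce=False` means teams see warnings in their pipeline logs long before anything is actually blocked – the same dial that gets turned later in the engagement.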

The notifications prompt conversations among the teams. What’s wrong? Why does this matter? How do we fix it? Those settle into regular weekly discussions, usually led by Archie, where the team leads report on progress and savings and we report on the state of our tools and discuss ways to improve them.

Over time this results in a steady – if a bit slow – flow of feature requests to expand both the reporting and the Jenkins library. It takes a moment for the process to settle in, as the different teams approach things at their own pace, but we eventually get everyone on board.

Step 5 – Introducing Guardrails, not Roadblocks

After making some initial gains we inevitably hit a familiar hurdle. Certain non-compliant resources keep appearing in new deployments, often driven by someone not fully in line with the new standards who is building something new on the fringes of their team. We all know the type.

This starts to create friction. A handful of bad deployments now make entire dashboards look bad, which can quickly turn into a source of irritation between teams. Fortunately, we’re ready with a mitigation plan.

The first move is to tighten the OPA configuration. Most ‘warn’ rules become ‘deny’, outright preventing non-compliant deployments for anyone using our shared Jenkins library – which at this point has become standard.

In parallel we introduce some light restrictions in AWS. IAM roles for deployment agents are revised and a few service control policies (SCPs) are put in place to prevent unsanctioned use.

At this stage some pushback is unavoidable. Even when restrictions don’t prevent anyone from doing their job – and they shouldn’t – certain people simply don’t react well to having constraints imposed. In practice, those exact people happen to be the dashboard-breakers, so resistance is limited. The wider engineering team is broadly supportive, particularly when cost becomes part of the conversation.

Step 6 – Automating the Last Mile

Several cycles in, we have the whole process running smoothly and delivering the results we were aiming for.

There are very few non-compliant resources remaining in any of the AWS accounts and we’ve largely met our initial savings goals.

However, the improvement metrics (yes, we have a report for that now) are slowly starting to plateau. This is expected – cost optimization has steep diminishing returns once the most obvious opportunities have been addressed.

At this point there’s one last move to make – and given the way things have gone so far, we’re well positioned to make it. 

We set up a daily run of corrective Lambdas to shut down or delete non-compliant resources, starting with EC2 instances.
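In the real setup this is a scheduled Lambda that goes on to call `ec2.stop_instances` (and, later in the rollout, terminate); only the decision logic is sketched here, with a hypothetical exception list and a dry-run flag for the gradual rollout:

```python
# Hypothetical allow-list: critical systems exempt from automatic correction.
EXCEPTIONS = {"i-criticalapp01"}

def corrective_actions(rows, dry_run=True):
    """Decide the corrective action for each non-compliant instance.

    Rows carry the compliance findings from the reporting side; the Lambda
    would act on the returned decisions via the EC2 API.
    """
    actions = []
    for row in rows:
        if row["instance_id"] in EXCEPTIONS or not row["findings"]:
            continue  # exempted, or already compliant
        action = "stop" if row["state"] == "running" else "flag-for-deletion"
        actions.append({"instance_id": row["instance_id"],
                        "action": action, "dry_run": dry_run})
    return actions
```

Running with `dry_run=True` first produces a reviewable action list instead of actual changes – the safest way to validate coverage before enforcement goes live.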

The rollout is deliberately gradual. We begin with less critical environments, with changes announced, as usual, internally by Archie or Barbara. We gradually expand the corrective cycle, covering more resource types and use cases, adding exceptions for critical systems, and introducing notifications where needed.

Crucially, the risk of accidentally breaking a live system is now very low. 

Teams are aware of the internal compliance drive, know what automation will act on, and can take pre-emptive action. We can rely on app owners to ask for exceptions, or to fix issues before enforcement kicks in.

Wait, Is This Governance?

Technically, yes – and it happened organically. All carrot, no stick.

We started with reports: detective controls that gave everyone visibility of what was actually running and provided the foundation for the whole thing.

From there came the questions – what are we seeing, what do we like (or not), what should we do about it? We could have dropped some dictates here, but we opted to stay in the background and help the developers prepare for the changes that were coming.

As standards began to shape up, we added guardrails through a combination of AWS restrictions and OPA-based checks – preventive controls that nudged behaviour without breaking delivery.

Only once we had gained enough traction and the bulk of the tech teams were involved in the process, did we roll out the final piece: auto-remediation in the form of corrective controls.

Looked at in isolation, this might not feel like much. A handful of reports. A shared Jenkins library. Some Lambda functions running on a schedule. Hardly the sort of thing that usually gets labelled cloud governance.

And yet, stepping back from the tooling and the labels, a clearer pattern emerges. 

One prong of the pincer (the reports) brings visibility – insights into cost, usage and behaviour. The other (for us, the Jenkins library) provides leverage – influence at the point where resources are actually provisioned and changed.

You can invest more or less into either, but without both, governance struggles to gain traction.

What matters most is not the controls themselves, but how they’re introduced. By building collaboration first, letting standards emerge, and enforcing only once teams are ready, governance stops feeling like an external imposition and starts behaving like a shared operating model.

One of the most critical pain points highlighted across FinOps reports – empowering engineers to take action – is effectively addressed.

And that, in practice, is what makes it stick.

Peter Boev, February 2026


Phi works with organisations to establish practical cloud governance that holds up over time – balancing visibility, guardrails, and automation without disrupting delivery teams. If you’re trying to bring order to an evolving cloud estate without slowing people down, we’d be happy to help.

Contact us for further information on relevant services: sales@phipartners.com