Your CI/CD Pipeline Is a Production System with Stefan Prodan from ControlPlane (video + interview)
May 31st, 2024 | 19 min read
This interview is part of simplyblock’s Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.
In this installment of the podcast, we’re joined by Stefan Prodan (Twitter/X, Personal Blog), a Principal Engineer at ControlPlane. He explains why a deployment pipeline is effectively a cluster admin and therefore needs to be secured like any other production system.
Chris Engelbert: Hello, everyone. Welcome back to the next episode of simplyblock’s Cloud Commute podcast. Today I have Stefan with me. Please pronounce your last name yourself in a second, because I’m not going to try. Stefan joins us from ControlPlane, and I guess he’ll also tell us a little bit about his background. So, Stefan, welcome. Maybe say a few words about yourself. Who are you? Why are you here?
Stefan Prodan: Thanks, Chris, for inviting me. I’m Stefan Prodan. I’ve been a software engineer for some time now, and for the last seven years I’ve been focusing exclusively on open-source engineering. I’ve been involved with the CNCF FluxCD project for all this time, and I’ve developed some of my own sub-projects inside FluxCD, like Flagger, for example, which covers the progressive delivery side. I also helped architect and shape the current version of Flux, which is version two. My passion is working with the cloud-native ecosystem, with Kubernetes, and building solutions on top of that.
Chris Engelbert: All right. You said FluxCD, and I think we’re going to come back to that in a second. Right now, you’re working with a company called ControlPlane, and from what I see on the website, it’s a cloud-native security consultancy, let’s put it that way. So maybe say a few words about the company itself.
Stefan Prodan: Yeah, so the company is from London, but we are distributed around the globe. It’s a security company that focuses on threat modelling and pen testing for Kubernetes environments. We also design architectures for continuous integration and delivery pipelines, with a focus on security and compliance, of course. So our services are mostly about helping organizations on their cloud-native journey, and helping them do it in a safe way. If you are migrating to the cloud, you should also come out of it with a better security stance, and that’s one of our main focuses.
Chris Engelbert: Right, right. So from a customer point of view, what does a typical customer look like? Is it the big company that, as you said, is just moving into the cloud game? What challenges do they face, and where do you help them?
Stefan Prodan: There is a wide range of customers. It’s not only banks and financial institutions, but those are usually the organizations looking for answers to questions like, ‘Are we really secure? Are we doing the right thing here?’ And most banks moved part of their infrastructure to the cloud a long time ago, right? So it’s not about getting them started; it’s more about what the hybrid cloud looks like for them and what the challenges are. When we go in, we usually do an architectural review and try to understand the system. Then, through pen testing, threat modelling, training, and other practices like that, we first try to make the customer’s employees more security-conscious in their day-to-day operations, and then we come up with recommendations for how they can improve. With pen testing, we also discover all sorts of misconfigurations and propose solutions for them, but it’s up to the customer to actually take that knowledge and improve their security stance.
So it’s a mix: consultancy that goes in and does red-team-style analysis, where you poke around and see what you find, but also looking at the architecture of the whole thing and how it can be improved. Usually, especially in the cloud-native world, improving things means simplifying them.
For example, as a Flux maintainer, I’ve talked to so many Flux users, thousands, maybe tens of thousands. When you get started with Kubernetes and the cloud-native landscape, it’s very easy to go, ‘How do I solve this? Oh, I add this component, and this other component, and this other component,’ and then you end up with ten-something controllers with hundreds of configurations, right?
So if you do this in a rushed way, or as a proof of concept, and that proof of concept ends up being the thing you run in production, you may want to go back over it and ask, ‘How can I simplify this? Can I take advantage of this component and maybe eliminate other things?’ Simplifying usually means you understand your system better, and that makes the system more secure. In the cloud-native world we tend to deal with massive complexity, so trying to reduce complexity and reduce the noise is a good way forward.
Chris Engelbert: Right. One of the interesting things you mentioned is pen testing, which is dear to my heart, because I did it in the past, though not professionally, mostly for online games and stuff. I think it’s a really important process: actively trying to break into systems, or to break them, to find those issues before, hopefully, the hostile actors do. Maybe you feel differently, but I think it’s still not actively used by a lot of companies. Maybe the big ones do it, but a lot of smaller companies still seem to miss it; they don’t really get the importance of pen testing. What do you think about that?
Stefan Prodan: Yeah, I first came to FOSDEM some years ago, and Andy, who is the CEO of ControlPlane, gave a talk on how to hack Kubernetes. (We had worked on something together before, but I only joined ControlPlane this year, so I’m quite new to the company.) He was on stage hacking Kubernetes from a root container on a node: ‘Okay, now I’m on the node. How can I get control of the whole Kubernetes control plane?’ And then, ‘Yay, I’m cluster admin, and from here I can do whatever I want.’
And I think we should educate Kubernetes users more through things like that, through great talks. At ControlPlane, we also do professional training where we actually teach people how to hack their own Kubernetes. We have a product called Kubesim, a Kubernetes simulator: everybody gets a cluster, you are dropped into a container, you shell-exec into it, and from there you can move laterally and do all sorts of things. I think that kind of mentality is important to promote more.
Looking for ways to get around your security constraints should be one of the practices you have in place. Poking around can be fun, and it also teaches you a lot about the system itself; you learn Kubernetes better if you try to exploit it from this perspective.
Chris Engelbert: That’s very true. In the past, I advocated a lot for building resilient and fault-tolerant systems, and from my perspective, security is kind of the same thing. There is no way to build a 100% secure system, unless you don’t build it at all. So embrace the idea that there are security issues, and in the worst case, pay somebody to find them for you.
It’s the same with resiliency, right? A resilient system is nice, and you could probably build a 100% resilient system, but nobody will pay the money for that. So it’s a trade-off: how much money do I have in the bank, and how much is this problem worth solving?
Stefan Prodan: Yeah, at this point vulnerabilities come at you from all directions, right? That’s what we’ve seen in the last few years with attacks on continuous integration and continuous delivery pipelines. An attacker doesn’t even have to go after the production system. Maybe that’s bulletproof, but they can get into some Jenkins server that sits on the internet with a hard-coded admin password anyone can guess. And once you’re in the CI system, you can poison the binaries or deploy your own container on the production cluster, however well that cluster is secured. You get in through the pipeline, right?
Chris Engelbert: And even worse, gaining the trust of maintainers over years of contributions just to sneak something into the CI/CD pipeline. That is totally mind-blowing to me. That someone would invest so much time up front just to... anyway.
But you made a good bridge to FluxCD, right? You mentioned that a lot of attack vectors now target the deployment pipeline, or CI/CD pipeline, trying to inject something at build time and get it signed, or whatever you want to call it. It looks totally fine, but it’s still a perfect attack vector. Is that where ControlPlane comes into play, with its enterprise offering for FluxCD?
Stefan Prodan: Yes. FluxCD is a CNCF project, so as a company, even if you employ maintainers, you are not allowed to say ‘Enterprise FluxCD’, because FluxCD is a CNCF brand. That’s why it’s ‘ControlPlane Enterprise for FluxCD’; tomorrow some other company can offer the same thing as their own enterprise offering for this particular project. That’s the meaning there.
Basically, Flux is a way for you to rip the CD part out of your CI/CD. I truly think CI/CD shouldn’t happen in one tool, or be a huge monolith that builds all the code, has access to all the sources, produces artifacts, and then also connects to all your production systems and deploys to them.
Having this kind of monolith may sound easier to get started with. But from a security perspective, and also from a scaling perspective, it becomes a single point of failure and a major vulnerability in your infrastructure. There is also this mentality, especially with CI systems, where people don’t think of them as part of production, right? Everybody has access to the Jenkins cluster or whatever, but production is secure, only SRE people have access. Well, if the CI system has a kubeconfig with cluster admin, because it needs to deploy everything on the cluster, then either you treat it as a production system, or you adopt a pattern like GitOps, which is what FluxCD implements. With GitOps, you move the continuous delivery side inside your production: the thing that deploys to the cluster runs in the cluster, it’s subject to Kubernetes RBAC, security constraints, and network policies, and you apply the same security mindset to your continuous delivery tool as to the whole production system itself.
So the shift with FluxCD and the other GitOps tools in the ecosystem is that they run in production. You don’t connect from outside, from Jenkins, GitHub Actions, or whatever; you don’t have to expose your clusters on the internet; and you don’t have to give some external system your cluster admin configuration and credentials. Instead, the cluster itself goes somewhere, looks, and says, ‘Oh, this is what I have to deploy, let me deploy it.’ That somewhere is a Git repository, which can, and in most cases should, be different from the one where you store your source code.
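To make the pull model a little more concrete, here is a minimal Python sketch of one reconciliation pass. It is not Flux’s actual code. It assumes a simplified, hypothetical model where desired and actual state are dictionaries keyed by `kind/namespace/name`, and `fetch_desired` stands in for the agent pulling from the repository. The point it illustrates is that the cluster initiates the connection, so no outside system needs cluster-admin credentials.

```python
"""Minimal sketch of a pull-based (GitOps-style) reconciliation pass.

Hypothetical model: state is a dict of "kind/namespace/name" -> spec.
Illustration only; this is not how Flux is implemented internally.
"""
from typing import Callable, Dict, List, Tuple

State = Dict[str, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, key) pairs needed to make `actual` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for key, spec in sorted(desired.items()):
        if key not in actual:
            actions.append(("create", key))
        elif actual[key] != spec:
            actions.append(("update", key))
    # Objects that disappeared from the desired state are pruned.
    for key in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", key))
    return actions


def reconcile(fetch_desired: Callable[[], State], actual: State) -> State:
    """One pass run *inside* the cluster: pull desired state, then converge.

    `fetch_desired` is a hypothetical callable standing in for fetching the
    configuration repository; the agent reaches out, nobody reaches in.
    """
    desired = fetch_desired()
    for action, key in plan(desired, actual):
        if action == "delete":
            del actual[key]
        else:
            actual[key] = dict(desired[key])
    return actual
```

For example, with a cluster running `web:1.0` plus a leftover ConfigMap, and a repository that declares `web:1.1` plus a Service, `plan` yields an update, a create, and a delete. Running `reconcile` repeatedly is harmless because once the states match, the plan is empty.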
So you can restrict who has access to the Git repository where your production system is defined. You can have different groups of people and different ways of driving changes there, and you can enforce all the good practices you can enforce on any Git repository: the main branch is protected, every change to a cluster requires a pull request, and someone from the SRE team has to approve it: ‘Oh yeah, it’s okay to change this network policy.’ You basically apply all the good practices you have for your code to your production systems. You can keep these things in a separate repository or repositories, and then the production system comes to the repository, sees, ‘Oh, there is a new version of this app,’ and deploys it for you.
So you don’t go to the system; the production system comes to you and decides how the new version should be deployed. If you think about it, FluxCD is basically a proxy between the desired state, which is a Git repository, and the production system where it runs. You no longer go to the system and control it yourself. You tell Flux, ‘Hey, I would like my cluster to look like this,’ and Flux can tell you, ‘Hey, this is not possible. I have Kyverno or OPA in here and they are blocking this change, so go and figure out the fix for it.’ So Flux integrates with admission controllers, which can enforce good practices and stricter security constraints on top of your continuous delivery pipeline.
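The admission idea can be sketched roughly in Python. In a real cluster, admission controllers such as Kyverno or OPA Gatekeeper enforce policies at the Kubernetes API server, and Flux reports the rejection. The two rules and the `registry.example.com` name below are hypothetical. They only show how a policy check blocks a change before it is applied, instead of letting it through silently.

```python
"""Sketch: policy checks gating what a GitOps agent may apply.

Real clusters use admission controllers (e.g. Kyverno, OPA Gatekeeper);
the rules and registry name here are hypothetical illustrations.
"""
from typing import Callable, Dict, List, Optional

# A policy returns a denial reason, or None if the object is allowed.
Policy = Callable[[str, dict], Optional[str]]


def no_privileged(key: str, spec: dict) -> Optional[str]:
    """Hypothetical rule: reject privileged workloads."""
    if spec.get("privileged", False):
        return f"{key}: privileged containers are not allowed"
    return None


def trusted_registry(key: str, spec: dict) -> Optional[str]:
    """Hypothetical rule: images must come from one trusted registry."""
    image = spec.get("image")
    if image is not None and not image.startswith("registry.example.com/"):
        return f"{key}: image {image!r} is not from the trusted registry"
    return None


def admit(desired: Dict[str, dict], policies: List[Policy]) -> List[str]:
    """Collect every denial; an empty list means the change is admitted."""
    denials: List[str] = []
    for key, spec in sorted(desired.items()):
        for policy in policies:
            reason = policy(key, spec)
            if reason:
                denials.append(reason)
    return denials
```

A desired state containing a privileged `busybox` debug pod would produce two denials. That is the "this is not possible, go and figure out the fix" feedback loop Stefan describes.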
So there is a continuous delivery pipeline here in the cluster and a CI system that is completely separate. Just having this separation improves your security stance, and you also get a more reliable way of deploying. Let’s say you start with one production cluster in one region, and then your business grows. Maybe you are in the US and you open a shop in Europe as well. You don’t want European customers to suffer huge latency by going to the US cluster, so you’ll probably create a new cluster in the European region. The more your business expands, the more clusters you have. If you run everything from a single CI/CD tool, every time you add a new cluster you have to onboard it into your CI system: set up certificates, configure how you connect to it, and all of that.
With something like Flux, when you add a new cluster, you bootstrap Flux, meaning that after the cluster is created, the first thing deployed there is Flux itself. Then you tell Flux, ‘Hey, configure this whole cluster, this whole region, according to the repository where my production system is defined,’ and it does that automatically. So it’s easier to expand your production system across regions when you adopt something like GitOps in your pipeline.
Chris Engelbert: That was amazing. I had so many questions, and you just answered all of them in one go. That was absolutely incredible.
Just one quick question, because I think a lot of the audience may use something like Argo CD. In that sense it’s kind of similar, right? It’s the same idea: you separate out your build pipeline, which would probably be something like Jenkins, and then you have Flux or Argo CD or something similar on the cluster side deploying all the artifacts.
Stefan Prodan: Yeah, yeah. There are two main projects that implement the GitOps pattern: Flux and Argo CD. There is also the Continuous Delivery Foundation (CDF) at the Linux Foundation, which hosts Jenkins X, a rewrite of Jenkins that has GitOps features. Tekton is also in the CDF. It can do continuous integration, but you can also configure Tekton to do continuous delivery, and it also runs in your cluster. Other projects out there have begun implementing GitOps features too. So GitOps is quite a mature way of doing continuous delivery right now.
It’s far, far away from when I started with it seven years ago, when it felt like, ‘Well, what is this GitOps thing?’ Now people actually get it. And the GitOps idea is not new; it’s not something we invented in the cloud-native space. It’s an old idea: Puppet did the exact same thing way before Kubernetes, with its agents and everything. The idea of having some kind of agent in your production system that pulls the desired state from outside and changes the system to match what you have described is over 12 years old. Puppet did it really well back then.
Chris Engelbert: Yeah, I agree. GitOps is one of those things that has been around for a while without a real name, but people have been doing it for quite some time.
For the sake of time, because we’re already past the 20 minutes, I really want to ask you: what is your personal view on the future? What do you think is the next big thing? Is there something you see coming as the next innovation in GitOps or CD pipeline security, or whatever you think?
Stefan Prodan: What I am trying to promote inside the FluxCD organization, through the Flux project and with all the Flux maintainers, is taking Flux in a direction where we offer a different way of doing GitOps: without Git in production, but with Git still as the tool you use for collaboration. What we are shifting Flux toward, and it’s already in there with production users relying on it, is using the container registry as the thing that holds your whole desired state. We rely on the Open Container Initiative (OCI) specification, which for two or three years now has had this concept of an OCI artifact.
You are already using OCI artifacts: a container image is an OCI artifact, essentially a tarball with some metadata, stored in a container registry. Those are your app images. With Flux, we offer tooling in the CLI and in the controllers so you can do a Flux push, which is similar to a Docker push. But instead of pushing your binaries, you push the configuration of your cluster: all the Kubernetes YAMLs, custom resources, Helm charts, the whole definition of your production cluster. It’s stored in the container registry, which by design is closer to the cluster. A registry is highly available and can live inside your private VPC next to the cluster, whereas Git usually sits outside that trust zone, because developers need access to it and so on.
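As a rough illustration of "a tarball with some metadata", the Python sketch below packs configuration files into a reproducible `.tar.gz` and describes it with a content digest, the way OCI registries address content. The layout is a simplified assumption. The layer media type is a hypothetical placeholder, a real OCI manifest also references a config blob, and in practice the Flux CLI does the packaging and pushing for you.

```python
"""Sketch: packaging cluster configuration as a content-addressed blob.

Assumptions: files is a dict of path -> YAML text; the layer media type is
a hypothetical placeholder; a real OCI manifest also has a config descriptor.
"""
import gzip
import hashlib
import io
import tarfile
from typing import Dict


def pack(files: Dict[str, str]) -> bytes:
    """Build a reproducible .tar.gz: sorted names and zeroed timestamps."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):
                data = files[name].encode("utf-8")
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = 0  # identical input -> identical bytes -> same digest
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """Content address in the OCI "algorithm:hex" form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def manifest(blob: bytes, revision: str) -> dict:
    """A simplified, manifest-like description of the packaged configuration."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "layers": [{
            "mediaType": "application/vnd.example.config.tar+gzip",  # hypothetical
            "digest": digest(blob),
            "size": len(blob),
        }],
        # Standard OCI annotation recording the source (e.g. Git commit).
        "annotations": {"org.opencontainers.image.revision": revision},
    }
```

Making the tarball reproducible matters here. If the same configuration always yields the same digest, the digest can be signed and later verified by the cluster, independently of where the bytes were fetched from.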
So you still push the configuration to your Git repository, but instead of Flux reaching from the cluster out to Git and crossing that trust-zone boundary, you push the configuration alongside your application. When you do the Docker build and Docker push, right after that you do a Flux push of that application’s configuration to the same container registry. You sign it the same way, with cosign or Notation, and when Flux deploys the new version of the app, it goes to the container registry instead of Git, pulls the definition, verifies it, and only then deploys it.
So it fits into the security model, and we are also promoting this through ControlPlane’s Enterprise Edition for Flux. We want ControlPlane customers who rely on Flux to adopt a more secure and better way of doing continuous delivery. That matters not only from a security perspective but also from a reliability perspective, because you no longer need Git in your production system. You can rely on the container registry, which you should already have there and already know how to operate. If you are using a cloud vendor, you already have all of this, but no cloud vendor out there will give you the same SLAs and guarantees for a Git offering as they do for their container registry. So that’s where we are moving with Flux.
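The verify-then-deploy step can be sketched in Python as follows. This swaps in HMAC-SHA256 with a shared key purely so the example runs on the standard library. cosign and Notation actually use public-key (or keyless) signatures, and the function names here are hypothetical. The shape of the check is the part that carries over: confirm that the pulled bytes match the referenced digest, confirm that the digest was signed by a trusted party, and only then apply.

```python
"""Sketch: verify-then-apply for configuration pulled from a registry.

Stand-in technique: HMAC-SHA256 with a shared key replaces the public-key
signatures that cosign/Notation really use. All names are hypothetical.
"""
import hashlib
import hmac
from typing import Callable


def sign(digest: str, key: bytes) -> str:
    """CI side (hypothetical): sign the artifact digest after pushing it."""
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_and_apply(blob: bytes, expected_digest: str, signature: str,
                     key: bytes, apply: Callable[[bytes], None]) -> bool:
    """Cluster side: deploy only if both content and signature check out."""
    actual = "sha256:" + hashlib.sha256(blob).hexdigest()
    if not hmac.compare_digest(actual, expected_digest):
        return False  # bytes differ from what the manifest referenced
    if not hmac.compare_digest(sign(expected_digest, key), signature):
        return False  # not signed with a trusted key: refuse to deploy
    apply(blob)
    return True
```

A tampered blob, or a signature made with a different key, makes `verify_and_apply` return `False` without ever calling `apply`. That is the property that stops a poisoned pipeline from reaching the cluster.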
Chris Engelbert: That is a really interesting approach, and I never thought of it like that. To me, the container registry was always basically an image registry, so it’s interesting to just reuse the same system: you push your desired state, and on the other side you just pull it and rebuild it. Unfortunately, we’re out of time. I have a few more questions, but the 20 minutes are over, so thank you very much. It was a pleasure, and I hope people learned something. I certainly learned something new; I’ve only used Argo CD in the past, so Flux is new to me. Thank you for being here.
Stefan Prodan: Thank you, Chris, and thank you for inviting me. Yeah, please try Flux, you’ll love it.
Chris Engelbert: Yes, please try Flux. Until next week, when you come back and listen to the next episode, you have one week to try Flux. Thank you very much for being here, and you’ll hear from me next week. See ya!
Tags
CI/CD, CI/CD pipelines, Cloud Commute, Cluster Admin, Continuous Deployment, continuous integration, Deployment Pipeline, FluxCD, Kubernetes