From UX to API: Mastering Platform Validations with Kubernetes Validating Admission Policies

Validations are extremely important when building Internal Developer Platforms, or any platform for that matter. They serve two primary purposes: they ensure that users’ requests are valid before they are processed by the platform, and they make the user experience better. The problem, however, is that people mix those two all the time. One is ensuring that something is valid; the other is improving the user experience without guaranteeing the validity of the requests. The first group are “real” policies, while the other is just UX.

Let me give you a few examples, and you tell me which is which.

This is a Web UI which has some fields marked as required, forcing users to fill in those that must have values. It also has other types of validations, like those on the Scaling Max and Scaling Min fields that force users to put in a value greater than or equal to 2. If any of the values in that form is incorrect or missing, users cannot proceed. Does that qualify as policies that ensure that only valid requests are sent to the platform?

Here’s a different example, this time with a custom-made CLI-like script.

Do NOT try to execute the commands from this section. This is only a preview. We’ll set up everything later.

If I try to build an image without specifying the tag,…

platform build image

The output is as follows.

Error: nu::parser::missing_positional

  × Missing required positional argument.
   ╭─[<commandline>:1:17]
 1 │ main build image
   ╰────
  help: Usage: platform build image {flags} <tag> . Use `--help` for
        more information.

…I get the message that it is a required argument. Arguably, I could have made that output message a bit better. Still, the point is that images, in this scenario, cannot be built without tags, so the CLI prevented users from doing that. Does that qualify as policies that ensure that only valid operations are performed?

How about kubectl? If we, for example, try to apply a manifest to a Namespace (b-team) that does not exist,…

kubectl --namespace b-team apply --filename crossplane/repo.yaml

…we get the message Error from server (NotFound)... namespaces "b-team" not found. Assuming that, in this scenario, Kubernetes is the platform, does that qualify as a policy that ensures that only valid requests are passed through?

There are cases where we cannot validate input, like, for example, in the case of Git.

Let’s say that I would like to push a manifest to Git so that Argo CD or Flux picks it up and synchronizes modified resource manifests into the cluster. So, if we add,…

git add .

…commit,…

git commit -m "This time with Argo CD"

…and push changes,…

git push

…we get no message except that the push was successful. We could, at least in theory, validate that what was pushed is correct, but I doubt that we could do that in practice given that anything could be inside that commit.

Here’s the last example I prepared.

This is the Argo CD UI trying to synchronize the previous commit to the cluster. If we take a look at the silly-demo appclaim, we can see the message that it is forbidden because some ValidatingAdmissionPolicy says that the resource is not allowed to be synchronized into the cluster. Is that a policy that ensures that only valid requests are synchronized into the cluster that acts as a platform?

Among all those examples, only two are actual policies that prevent incorrect inputs into the platform, and neither of them is enforced by the tool we used. When we executed kubectl apply, the message that the Namespace does not exist did not come from kubectl but from the cluster itself. kubectl has no idea which Namespaces exist and which don’t. It sent a request to the Kubernetes API, which responded with “thou shalt not pass”.

Similarly, the message we saw in Argo CD does not come from Argo CD. It has no idea which resources are valid and which are invalid. Instead, just like kubectl, it sent a request to the Kubernetes API which, again, responded with “You cannot do that. Go away.” The only thing Argo CD did was show us the message it received as the response from the Kubernetes API.

The examples from the Port Web UI and the custom CLI are client-side validations. They improve user experience since they help users see issues sooner rather than later, but they do not prevent users from doing what they should not do. Now, that sentence might sound confusing. After all, if we do not fill in the required fields or we do not put in the correct values, the Web UI will not let us submit data. While that is technically correct, that sentence is based on a wrong assumption.

It assumes that there is only one interface that can be used with the platform. More often than not, that is not true. The fact that we allow users to use that Web UI to interact with the platform API does not mean that there are no other paths available. Having a Web UI does not mean that people cannot synchronize the desired state into the cluster, the platform, using GitOps. That Web UI could be sending requests directly to the Kubernetes API, or it might be pushing manifests to Git so that Argo CD or Flux synchronizes them in the true GitOps fashion. If the Web UI can push changes to Git, something else or someone else could do that as well, just as I did earlier. Someone else might be interacting with the cluster using a CLI like kubectl. Heck, someone could be using curl to talk to it.

The point I’m trying to make is that there can be an infinite number of ways one can interact with an API, so we cannot rely on any single one of them to ensure that requests are valid.

We could, at least in theory, put validation into all the possible tools that might interact with an API, but that would be silly. That would never end since we would need to implement the same set of validations over and over again, only to discover that it is impossible to fight against infinity.

All in all, the only way to run reliable and safe validations is to have processes that do that on the right side of the API. Instead of trying to add them to every single tool that interacts with an API, we can instruct the API itself to validate incoming requests, no matter where they are coming from, and, depending on those validations, let them pass or reject them.

Does that mean that we should not validate anything inside Web UIs, CLIs, or whichever other mechanism we might have to interact with an API? The answer to that question depends on the user experience we want to create.

Here’s an example. Inside my cluster I might have a policy that validates that a specific field of a specific kind of resource is a number and that it is greater than or equal to 2. That would be functionally the same validation as the one we saw in the Web UI earlier. If the data from that form was submitted to the API, the API would reject it if the value of that field does not match the rules. Still, from the UX perspective, it is nicer to get that feedback while we’re typing the value than to click the button and wait for the response from the server. That’s user experience, not reliable validation of the input.

So, there is user experience and there are policies or, to be more precise, ultimate validations that inputs are correct. Implementation of UX validations depends on the tool we’re working with and might need to be done in multiple places. The “real” validations are those performed by APIs when they are deciding whether to process or reject requests. If you can do only one of those, do the latter, the validations by the API. That does not mean that UX is not important. It is, and you should certainly make the user experience good. Still, if we cannot do both, the API is where we should focus.

Luckily, in the case of Kubernetes, there is a mechanism to do just that baked into it. It’s called Admission Controller Webhooks, which can validate any individual request sent to the API. In the past, we had to use third-party solutions like Kyverno and OPA to implement those Webhooks. Now, however, we don’t necessarily need those any more since Kubernetes comes with its own implementation of baked-in policies based on Admission Controllers.

That’s what we’ll explore today. We’ll see how we can use Validating Admission Policy to validate requests coming into the API, with the goal of enabling users of our Kubernetes-based platform to do the right thing.

Setup

git clone https://github.com/vfarcic/idp-full-demo

cd idp-full-demo

git fetch

git checkout policies

Make sure that Docker is up-and-running. We’ll use it to create a Kubernetes KinD cluster.

Watch Nix for Everyone: Unleash Devbox for Simplified Development if you are not familiar with Devbox. Alternatively, you can skip Devbox and install all the tools listed in devbox.json yourself.

devbox shell

Watch The Future of Shells with Nushell! Shell + Data + Programming Language if you are not familiar with Nushell. Alternatively, you can inspect the setup/kubernetes.nu script and transform the instructions in it to Bash or ZShell if you prefer not to use that Nushell script.

chmod +x platform

platform setup policies

source .env

The Problem with Admission Controllers

Here’s an example of a typical set of resources we might have to run an application in Kubernetes.

cat kubernetes/app.yaml

The output is as follows.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: silly-demo
  name: silly-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: silly-demo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: silly-demo
    spec:
      shareProcessNamespace: true
      containers:
      - image: ghcr.io/vfarcic/silly-demo:1.4.327
        livenessProbe:
          httpGet:
            path: /
            port: 8080
        name: silly-demo
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 256Mi
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app.kubernetes.io/name: silly-demo
  name: silly-demo
spec:
  ingressClassName: nginx
  rules:
  - host: silly-demo.127.0.0.1.nip.io
    http:
      paths:
      - backend:
          service:
            name: silly-demo
            port:
              number: 8080
        path: /
        pathType: ImplementationSpecific
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: silly-demo
  name: silly-demo
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/name: silly-demo
  type: ClusterIP

That’s a Deployment, an Ingress, and a Service. Normally, our applications would be more complicated than that but, for the sake of the demo, that should be enough.

Now, if we execute kubectl ... apply,…

kubectl --namespace a-team apply --filename kubernetes/app.yaml

…we can see that the deployment, the ingress, and the service were created, even though we might not want to allow them to be created.

Let’s say that we would like to have a rule that says that each application needs to have at least two replicas for availability and performance reasons. How could we ensure such a rule is enforced? Should we create a rule that allows only Deployments that have the spec.replicas value set and require that value to be greater than 1? Well… We cannot do that since there are many ways to get multiple replicas of an application in Kubernetes. Aside from defining the number of replicas in the Deployment resource, we could use a HorizontalPodAutoscaler (HPA) instead. If we do, it would do the scaling depending on metrics. Alternatively, we might choose KEDA, which also does automated scaling but provides many more options to define the conditions that result in scaling.
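For illustration only, here is roughly what such a HorizontalPodAutoscaler could look like for that Deployment. This is a sketch, not something we’ll apply in this post, and the target utilization is an assumption.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: silly-demo
spec:
  scaleTargetRef:
    # Scale the silly-demo Deployment instead of hard-coding spec.replicas
    apiVersion: apps/v1
    kind: Deployment
    name: silly-demo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80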

So, we do not necessarily know how one might fulfill the requirement that an application must have multiple replicas and, therefore, cannot enforce a policy.

Now, we could say that everyone MUST use HorizontalPodAutoscaler. Even if we do that, we still could not create a policy based on admission controller webhooks since they are fired for each resource individually. When a request to create or update a Deployment is sent to the Kubernetes API, there is no guarantee that a matching HPA was already created or that it will be created afterwards. Kubernetes is all about eventual consistency, so we cannot be sure that the HPA is created before the Deployment and, therefore, cannot define a policy that prevents the creation of Deployments that do not have a matching HPA.

The solution to that problem, and quite a few others, is to create our own abstractions. We can, for example, create our own Application CRD that expands into a Deployment, Service, Ingress, HPA, or anything else we might need to run an application. If we do that, we will accomplish at least two objectives. First, we’ll create a user-friendly interface developers can use. That, however, is not the subject of this post. The second objective is that we can define policies that allow or disallow certain capabilities without worrying whether those capabilities are implemented by one, two, or any other number of resources. After all, a big part of the work on developer platforms is creating abstractions that enable users to accomplish certain goals without going crazy. It’s about creating the right level of abstraction, and in Kubernetes we do that through CRDs and controllers.

Fortunately, there are many tools that help us do that with relative ease. There is KubeVela, Crossplane, Kro, and many others. I will use Crossplane today, mostly because that’s the project I’m working on.

An important note: today’s post is not about Crossplane, and you should be able to create CRDs and controllers with many other tools. We just need it to demonstrate the point I’m trying to make.

Let’s remove the resources we created earlier and start over.

kubectl --namespace a-team delete --filename kubernetes/app.yaml

Here’s an example of an application definition we’ll use.

cat tmp/appclaim.yaml

The output is as follows.

apiVersion: devopstoolkit.live/v1alpha1
kind: AppClaim
metadata:
  name: silly-demo
  labels:
    app-owner: vfarcic
spec:
  id: silly-demo
  compositionSelector:
    matchLabels:
      type: backend
      location: local
  parameters:
    namespace: a-team
    image: ghcr.io/vfarcic/idp-full-demo
    tag: "0.0.5"
    port: 8080
    host: silly-demo.127.0.0.1.nip.io
    ingressClassName: nginx

Over there, we’re defining only the things that matter, like the id and the type of the application (backend), and parameters like the namespace, the image, the tag, and a few others.

That is functionally the same as what we saw before. If we apply it, it will create a Deployment, an Ingress, and a Service. The major difference is that creating a policy that, for example, prevents deployment of an application that does not have multiple replicas will be much easier. As I already mentioned, the fact that a single resource is much more user-friendly than having to define multiple low-level Kubernetes resources is not important here since that’s outside today’s scope.

Let’s confirm that we can indeed apply that resource,…

kubectl --namespace a-team apply --filename tmp/appclaim.yaml

…and that it composed the same resources as before.

kubectl --namespace a-team get all,ingresses

The output is as follows.

NAME                              READY   STATUS    RESTARTS   AGE
pod/silly-demo-67556fd8fc-9hff6   1/1     Running   0          6s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/silly-demo   ClusterIP   10.96.168.188   <none>        8080/TCP   6s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/silly-demo   1/1     1            1           6s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/silly-demo-67556fd8fc   1         1         1       6s

NAME                                   CLASS   HOSTS                         ADDRESS   PORTS   AGE
ingress.networking.k8s.io/silly-demo   nginx   silly-demo.127.0.0.1.nip.io             80      6s

That’s it. Now that we have the abstraction we can work with, we can, finally, create the policy we talked about. But, before we do, let’s remove that Claim first.

kubectl --namespace a-team delete --filename tmp/appclaim.yaml

Kubernetes Validating Admission Policy

Let’s take a look at a policy I prepared.

cat kubernetes/policies.yaml

The output is as follows.

---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: dot-app
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["devopstoolkit.live"]
      apiVersions: ["*"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["appclaims"]
  validations:
    - expression: |
        has(object.spec.parameters.scaling) &&
        has(object.spec.parameters.scaling.enabled) &&
        object.spec.parameters.scaling.enabled        
      message: "`spec.parameters.scaling.enabled` must be set to `true`."
    - expression: |
        has(object.spec.parameters.scaling) &&
        object.spec.parameters.scaling.min > 1        
      message: "`spec.parameters.scaling.min` must be greater than `1`."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: dot-app
spec:
  policyName: dot-app
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: a-team

The first resource is the ValidatingAdmissionPolicy with two validations.

The first one checks whether the scaling and scaling.enabled fields have values and, if they do, whether scaling.enabled is set to true. That’s the policy that, as the name of the field suggests, ensures that scaling is enabled.

The second validation checks whether the scaling.min field has a value greater than 1, meaning that the minimum number of replicas is 2.

Both of those validations are applied only to CREATE and UPDATE operations on the appclaims resource.

The second resource in that manifest is the ValidatingAdmissionPolicyBinding that will Deny creation of appclaims if they do not match the policy and if they are applied to the a-team Namespace.
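As a side note, if you would rather roll such a policy out gradually instead of rejecting requests right away, the binding can use the Warn and Audit actions instead of Deny. Here’s a rough sketch of what that could look like; we won’t use it in this post.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: dot-app-warn
spec:
  policyName: dot-app
  # Emit warnings to clients and record audit annotations instead of denying requests
  validationActions: [Warn, Audit]
  matchResources:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: a-team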

I won’t go deeper into policies since I already did that in the Kubernetes Validating Admission Policy Changes The Game video. What I will say is that it is now GA, meaning that it is available out-of-the-box in all Kubernetes clusters starting with version v1.30. So, there is no need to install any third-party applications like Kyverno, OPA Gatekeeper, or others. That does not mean that I think that the Validating Admission Policy baked into Kubernetes is better than, let’s say, Kyverno, but only that it is already in your cluster and that you might want to check whether it meets your needs before reaching for other solutions.
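If you are not sure whether your cluster already supports it, a quick, non-destructive check is to list the resources in the admissionregistration.k8s.io group. The exact output will vary depending on the cluster version.

kubectl api-resources --api-group admissionregistration.k8s.io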

Another note is that, since we’re using an abstraction, AppClaim, instead of working with all the individual resources, we don’t need to worry about how scaling will be done, only that the values in instances of that abstraction are correct.

Okay… Let’s apply that policy,…

kubectl apply --filename kubernetes/policies.yaml

…and see whether we can still apply the Claim we used before.

kubectl --namespace a-team apply --filename tmp/appclaim.yaml

The output is as follows.

The appclaims "silly-demo" is invalid: : ValidatingAdmissionPolicy 'dot-app' with binding 'dot-app' denied request: `spec.parameters.scaling.enabled` must be set to `true`.

This time we can see that the API did not let us pass. When we sent the request to apply that resource, it came back to us saying that spec.parameters.scaling.enabled must be set to true.
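As a side note, if you want to check a manifest against such policies without even attempting to persist it, a server-side dry run should trigger the same admission evaluation and return the same error.

kubectl --namespace a-team apply --dry-run=server --filename tmp/appclaim.yaml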

That’s awesome since it means that we no longer have to care whether someone applies resources through kubectl, or Argo CD, or Port, or Backstage, or any other tool. That’s not our concern any more since that validation happens independently of what or who sends requests to the API. The only thing missing is to create RBAC that disallows creation of anything but AppClaim resources, otherwise people would still be able to circumvent that policy by creating something else. I’ll leave RBAC for some other time and, for now, assume that you know how to set it up.
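Still, as a very rough sketch, such RBAC could be a Role that grants access only to appclaims in the a-team Namespace, bound to whichever users or groups should use the platform. The name and the verbs below are assumptions, and we won’t apply it here.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: appclaims-only
  namespace: a-team
rules:
# Grant access only to AppClaims; since RBAC is additive, anything not listed
# here (Deployments, Services, and so on) stays off-limits as long as users
# are not bound to broader roles.
- apiGroups: ["devopstoolkit.live"]
  resources: ["appclaims"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]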

Let’s fix the problem by adding scaling.enabled set to true to the manifest,…

yq --inplace ".spec.parameters.scaling.enabled = true" \
    tmp/appclaim.yaml

…and try to apply it again.

kubectl --namespace a-team apply --filename tmp/appclaim.yaml

The output is as follows.

The appclaims "silly-demo" is invalid: : ValidatingAdmissionPolicy 'dot-app' with binding 'dot-app' denied request: `spec.parameters.scaling.min` must be greater than `1`.

It is still failing. Kubernetes API still does not allow us to pass through but, this time, for a different reason. We need to set scaling.min to a value greater than 1, so let’s do just that.

We’ll change the scaling.min to 2,…

yq --inplace ".spec.parameters.scaling.min = 2" \
    tmp/appclaim.yaml

…and apply the manifest again.

kubectl --namespace a-team apply --filename tmp/appclaim.yaml

This time it worked. It passed all Admission Controller validations and is now applied to the cluster.

Behind the scenes, Crossplane expanded that claim into a Deployment, an Ingress, a Service, and, since we enabled scaling, a HorizontalPodAutoscaler.

We can confirm that by listing all resources and ingresses in the a-team Namespace.

kubectl --namespace a-team get all,ingresses

The output is as follows.

NAME                              READY   STATUS    RESTARTS   AGE
pod/silly-demo-67556fd8fc-jp5hn   1/1     Running   0          5s

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/silly-demo   ClusterIP   10.96.24.68   <none>        8080/TCP   5s

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/silly-demo   1/1     1            1           5s

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/silly-demo-67556fd8fc   1         1         1       5s

NAME                                             REFERENCE               TARGETS                                     MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/silly-demo   Deployment/silly-demo   cpu: <unknown>/80%, memory: <unknown>/80%   2         10        0          5s

NAME                                   CLASS   HOSTS                         ADDRESS   PORTS   AGE
ingress.networking.k8s.io/silly-demo   nginx   silly-demo.127.0.0.1.nip.io             80      5s

Now you know what to do with validations in your Developer Platform. Move them into the API. That way, validations will be enforced no matter who or what sends requests to the API. Once you’re done with policies, feel free to work on the UX of the Web UI, CLI, or whatever else you’re building on top of that API. Just remember that validations over there are not reliable but are only ways to improve user experience.

Destroy

platform destroy policies

git checkout main

exit