Javier Ramírez

Understanding Kubernetes Security on Docker Enterprise 3.0

This is a guest post by Javier Ramírez, Docker Captain and IT Architect at Hopla Software. You can follow him on Twitter @frjaraur or on GitHub.

Docker began including Kubernetes with Docker Enterprise 2.0 last year. The recent 3.0 release includes CNCF Certified Kubernetes 1.14, which has many additional security features. In this blog post, I will review Pod Security Policies and Admission Controllers.

What are Kubernetes Pod Security Policies?

Pod Security Policies are rules created in Kubernetes to control security in pods. A pod will only be accepted by a Kubernetes cluster if it passes these rules. These rules are defined in the “PodSecurityPolicy” resource and allow us to manage host namespace and filesystem usage, as well as privileged pod features. We can use the PodSecurityPolicy resource to make fine-grained security configurations, including:

  • Privileged containers.
  • Host namespaces (IPC, PID, network and ports).
  • Host paths, their permissions, and volume types.
  • The user and group used to run container processes, and setuid capabilities inside containers.
  • Changes to the default container capabilities.
  • The behaviour of Linux security modules.
  • Host kernel configuration through sysctl.

The Docker Universal Control Plane (UCP) 3.2 provides two Pod Security Policies by default, which is helpful if you’re just getting started with Kubernetes. These default policies allow or prevent execution of privileged containers inside pods. To manage Pod Security Policies, you need to have administrative privileges on the cluster.

Reviewing and Configuring Pod Security Policies

To review defined Pod Security Policies in a Docker Enterprise Kubernetes cluster, we connect using an administrator’s UCP Bundle:

$ kubectl get PodSecurityPolicies
NAME           PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES                                                
privileged     true    *      RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *
unprivileged   false          RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *

These default policies control the execution of privileged containers inside pods.

Let’s create a policy that disallows containers whose main process runs as root. If you are not familiar with Kubernetes, you can reuse one of the default Pod Security Policies as a template. Here we export the “privileged” one:

$ kubectl get psp privileged -o yaml --export > /tmp/mustrunasnonroot.yaml

After removing the non-required values and changing the runAsUser rule to “MustRunAsNonRoot”, the Pod Security Policy file /tmp/mustrunasnonroot.yaml looks like this:

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:  
  name: psp-mustrunasnonroot
spec:
  allowPrivilegeEscalation: false
  allowedHostPaths:
  - pathPrefix: /dev/null
    readOnly: true
  fsGroup:
    rule: RunAsAny
  hostPorts:
  - max: 65535
    min: 0
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'

We create this new policy as an administrator user. PodSecurityPolicy is a cluster-scoped resource, so the policy is not tied to any namespace:

$ kubectl create -f /tmp/mustrunasnonroot.yaml
podsecuritypolicy.extensions/psp-mustrunasnonroot created

Now we can review Pod Security Policies:

$ kubectl get PodSecurityPolicies --all-namespaces
NAME                   PRIV    CAPS   SELINUX    RUNASUSER          FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
psp-mustrunasnonroot   true    *      RunAsAny   MustRunAsNonRoot   RunAsAny   RunAsAny   false            *
privileged             true    *      RunAsAny   RunAsAny           RunAsAny   RunAsAny   false            *
unprivileged           false          RunAsAny   RunAsAny           RunAsAny   RunAsAny   false            *
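
To double-check which policies still allow root, the runAsUser rule can also be read from the API rather than from the table. The following is a minimal sketch and not part of the original walkthrough: the helper name policies_allowing_root is hypothetical, and it assumes the name and rule pairs are produced by kubectl with a JSONPath template, as shown in the usage comment.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME<TAB>RUNASUSER_RULE" lines on stdin and
# prints the names of the policies whose rule is not MustRunAsNonRoot.
policies_allowing_root() {
  awk -F '\t' '$2 != "MustRunAsNonRoot" { print $1 }'
}

# Usage against a live cluster (requires an administrator's UCP bundle):
#   kubectl get psp -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.runAsUser.rule}{"\n"}{end}' \
#     | policies_allowing_root
```

Reading the rule from the API avoids parsing the table, where an empty CAPS column shifts the remaining fields.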

Next, we create a Cluster Role that will allow our test user to use the Pod Security Policy we just created, using role-mustrunasnonroot.yaml.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: role-mustrunasnonroot
rules:
- apiGroups:
  - policy
  resourceNames:
  - psp-mustrunasnonroot
  resources:
  - podsecuritypolicies
  verbs:
  - use

Next, we add a Role Binding in the default namespace to associate this new non-admin role with our user (jramirez in this example). We created rb-mustrunasnonroot-jramirez.yaml with the following content:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rb-mustrunasnonroot-jramirez
  namespace: default
roleRef:
  kind: ClusterRole
  name: role-mustrunasnonroot
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: User
  name: jramirez
  namespace: default

We create both the Cluster Role and the Role Binding to allow jramirez to use the defined Pod Security Policy:

$ kubectl create -f role-mustrunasnonroot.yaml
clusterrole.rbac.authorization.k8s.io/role-mustrunasnonroot created

$ kubectl create -f rb-mustrunasnonroot-jramirez.yaml
rolebinding.rbac.authorization.k8s.io/rb-mustrunasnonroot-jramirez created

Now that we’ve applied this policy, we should delete the default binding that grants every user one of the default policies (privileged or unprivileged). In this case, the default “ucp:all:privileged-psp-role” Cluster Role Binding was applied.

$ kubectl delete clusterrolebinding ucp:all:privileged-psp-role
clusterrolebinding.rbac.authorization.k8s.io "ucp:all:privileged-psp-role" deleted

We can review jramirez’s permissions to create new pods on the default namespace.

$ kubectl auth can-i create pod --as jramirez
yes
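
Being allowed to create pods is only half of the picture, because a pod is accepted only if its creator may also use a Pod Security Policy. As a minimal sketch (the helper name can_use_psp is hypothetical and not part of UCP or kubectl), the same kubectl auth can-i command can check the use verb on the policy we bound earlier. The .policy suffix is an assumption that matches the API group named in our Cluster Role.

```shell
#!/bin/sh
# Hypothetical helper: asks the API server whether USER may "use" the
# Pod Security Policy PSP in NAMESPACE (default: "default").
# The ".policy" suffix selects the API group named in the Cluster Role.
can_use_psp() {
  user=$1 psp=$2 namespace=${3:-default}
  kubectl auth can-i use "podsecuritypolicies.policy/$psp" \
    --as "$user" --namespace "$namespace"
}

# Usage (as an administrator, who is allowed to impersonate users):
#   can_use_psp jramirez psp-mustrunasnonroot
```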

Now we can create a pod using the following manifest from nginx-as-root.yaml:

apiVersion: v1
kind: Pod
metadata:
 name: nginx-as-root
 labels:
   lab: nginx-as-root
spec:
 containers:
 - name: nginx-as-root
   image: nginx:alpine

We now need to log in as jramirez, our test non-admin user, using that user’s UCP bundle. We can then test the deployment to see if it works:

$ kubectl create -f nginx-as-root.yaml
pod/nginx-as-root created

The pod object is created, but its container never starts. We get a CreateContainerConfigError because the image doesn’t define a non-root user, so the container would run as root, which the policy blocks. The pod events show the failure:

Events:
 Type     Reason     Age                    From               Message
 ----     ------     ----                   ----               -------
 Normal   Scheduled  6m9s                   default-scheduler  Successfully assigned default/nginx-as-root to vmee2-5
 Warning  Failed     4m12s (x12 over 6m5s)  kubelet, vmee2-5   Error: container has runAsNonRoot and image will run as root
 Normal   Pulled     54s (x27 over 6m5s)    kubelet, vmee2-5   Container image "nginx:alpine" already present on machine

What can we do to avoid this? As a best practice, we should not allow containers to run with root permissions, so instead of relaxing the policy we can build an Nginx image that runs without them. Here’s a lab image that works for our purposes (it is not production ready):

FROM alpine

RUN addgroup -S nginx \
&& adduser -D -S -h /var/cache/nginx -s /sbin/nologin -G nginx -u 10001 nginx \
&& apk add --update --no-cache nginx \
&& ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log \
&& mkdir /html

COPY nginx.conf /etc/nginx/nginx.conf

COPY html /html

RUN chown -R nginx:nginx /html

EXPOSE 1080

USER 10001

CMD ["nginx", "-g", "pid /tmp/nginx.pid;daemon off;"]

We created a new user, nginx, and run the Nginx main process as that user (the Nginx package itself usually provides a dedicated user such as www-data or nginx, depending on the base operating system). We gave this user a fixed UID, 10001, because we will use the same UID in Kubernetes to set the user that runs all containers in our nginx-as-nonroot pod.

You can see that we are using a new nginx.conf. Since we are not using root to start Nginx, we can’t bind ports below 1024, so we expose port 1080 in the Dockerfile and listen on it in the configuration. This is the simplest Nginx configuration required:

worker_processes  1;

events {
   worker_connections  1024;
}


http {
   include       mime.types;
   default_type  application/octet-stream;
   sendfile        on;
   keepalive_timeout  65;
   server {
       listen       1080;
       server_name  localhost;

       location / {
           root   /html;
           index  index.html index.htm;
       }


       error_page   500 502 503 504  /50x.html;
       location = /50x.html {
           root   /html;
       }

   }

}
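
Before building the image, it can help to confirm that no listen directive still points at a port below 1024. The following is a minimal sketch, assuming only the plain "listen PORT;" and "listen ADDR:PORT;" forms are used; the helper name privileged_listen_ports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an nginx configuration on stdin and prints
# every "listen" port below 1024, which a non-root process cannot bind
# by default. Handles "listen PORT;" and "listen ADDR:PORT;" forms only.
privileged_listen_ports() {
  sed -n 's/^[[:space:]]*listen[[:space:]]\{1,\}\([^;[:space:]]*\).*/\1/p' |
    sed 's/.*://' |
    awk '$1 ~ /^[0-9]+$/ && $1 < 1024 { print $1 }'
}

# Usage:
#   privileged_listen_ports < nginx.conf
```

An empty result means every listen port found is 1024 or higher.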

We added a simple index.html with just one line:

$ cat html/index.html  
It worked!!

And our pod definition has new security context settings:

apiVersion: v1
kind: Pod
metadata:
 name: nginx-as-nonroot
 labels:
   lab: nginx-as-root
spec:
 containers:
 - name: nginx-as-nonroot
   image: frjaraur/non-root-nginx:1.2
   imagePullPolicy: Always
 securityContext:
   runAsUser: 10001

We specified a UID for all containers in that pod. Therefore, the Nginx main process will run under UID 10001, the same one specified in the image.

If we specify a different UID, we will get permission errors, because the main process will run as the pod-defined user and Nginx will not be able to manage files owned by the image-defined one:

nginx: [alert] could not open error log file: open() "/var/lib/nginx/logs/error.log" failed (13: Permission denied)
2019/10/17 07:36:10 [emerg] 1#1: mkdir() "/var/tmp/nginx/client_body" failed (13: Permission denied)

If we do not specify any security context, the container will use the UID defined in the image, 10001. It will work correctly since the process doesn’t require root access.
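
A quick way to confirm which UID the container actually runs with is to ask the running pod. The following is a minimal sketch, assuming the pod is running and the image ships an id command (the Alpine-based lab image does); the helper name pod_runs_as_uid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compares the UID that commands run as inside a
# running pod with the UID we expect (10001 in this walkthrough).
# Returns 0 on a match, 1 on a mismatch, 2 if kubectl exec itself fails.
pod_runs_as_uid() {
  pod=$1 expected=$2
  actual=$(kubectl exec "$pod" -- id -u) || return 2
  [ "$actual" = "$expected" ]
}

# Usage:
#   pod_runs_as_uid nginx-as-nonroot 10001 && echo "UID matches"
```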

We can go back to the previous situation by deleting the custom Role Binding we created earlier (rb-mustrunasnonroot-jramirez) and re-creating the UCP Cluster Role Binding ucp:all:privileged-psp-role. To do so, create rb-privileged-psp-role.yaml with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ucp:all:privileged-psp-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: privileged-psp-role
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io

Then, as administrator, create the ClusterRoleBinding object:

$ kubectl create -f rb-privileged-psp-role.yaml

Kubernetes Admission Controllers

Admission Controllers are a Kubernetes feature used to manage and enforce default resource values or properties and to prevent potential risks or misconfigurations. They act before workload execution, intercepting requests to validate or modify their content. Admission Controllers gate user interaction with the cluster API, applying policies to any action on Kubernetes.

We can review which Admission Controllers are enabled in Docker Enterprise by looking at the command line used to start the ucp-kube-apiserver container, which runs the Kubernetes API Server. On any of our managers, we can inspect the container configuration:

$ docker inspect ucp-kube-apiserver --format 'json {{ .Config.Cmd }}'
json [--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota,PodNodeSelector,PodSecurityPolicy,UCPAuthorization,CheckImageSigning,UCPNodeSelector
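
Since the plugin list is a single comma-separated argument, it is easier to read one plugin per line. The following is a minimal sketch, assuming the API server arguments are separated by spaces as in the output above; the helper name enabled_admission_plugins is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads API server arguments on stdin and prints the
# plugins listed in --enable-admission-plugins, one per line.
enabled_admission_plugins() {
  tr ' ' '\n' |
    sed -n 's/^\[\{0,1\}--enable-admission-plugins=//p' |
    tr -d ']' |
    tr ',' '\n'
}

# Usage on a UCP manager (prints the name only if the plugin is enabled):
#   docker inspect ucp-kube-apiserver --format '{{ .Config.Cmd }}' \
#     | enabled_admission_plugins | grep -x PodSecurityPolicy
```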

These are the Admission Controllers deployed with Docker Enterprise Kubernetes:

  • NamespaceLifecycle manages important namespace features. It prevents users from removing the default, kube-system and kube-public namespaces, and it protects the integrity of other namespace deletions (for example, by removing all objects in a namespace before the namespace itself is deleted). It also prevents new objects from being created in a namespace that is being removed (which can take time, because running objects must be removed first).
  • LimitRanger applies default resource requests to pods that don’t specify any. It also verifies that the resources associated with a Namespace don’t exceed its defined limits.
  • ServiceAccount associates pods with a default ServiceAccount if they don’t specify one, and ensures that the ServiceAccount exists if one is named in the Pod definition. It also manages API account accessibility.
  • PersistentVolumeLabel adds region or zone labels to ensure that the right volumes are mounted per region or zone.
  • DefaultStorageClass adds a default StorageClass when a PersistentVolumeClaim asks for storage without declaring one.
  • DefaultTolerationSeconds sets default pod tolerations, so that pods are evicted from nodes that have been not ready or unreachable for more than 300 seconds.
  • NodeRestriction allows each kubelet to modify only its own Node and the Pods bound to it.
  • ResourceQuota ensures that the resource quota limits defined for a namespace are not exceeded.
  • PodNodeSelector provides default node selections within namespaces.
  • PodSecurityPolicy reviews Pod Security Policies to determine whether a Pod can be executed.
  • UCPAuthorization integrates UCP roles with Kubernetes, preventing deletion of system-required cluster roles and bindings. It also prevents non-admin (or non-privileged) accounts from using host path volumes or privileged containers, even if Pod Security Policies allow them.
  • CheckImageSigning prevents execution of Pods based on images that are not signed by authorized users.
  • UCPNodeSelector ensures that non-system Kubernetes workloads run only on non-mixed UCP hosts.

The last three are designed and created by Docker to ensure UCP and Kubernetes integration and to improve access control and security. These Admission Controllers are set up during installation. They can’t be disabled, since doing so could compromise cluster security or even break functionality that is easy to overlook but important.

As we have seen, Docker Enterprise 3.0 provides Kubernetes security features by default that complement and improve users’ interaction with the cluster, maintaining a highly secure environment out of the box.

To learn more about how you can run Kubernetes with Docker Enterprise: