:original_name: cce_bestpractice_0319.html

.. _cce_bestpractice_0319:

Container Security
==================

Controlling the Pod Scheduling Scope
------------------------------------

Use nodeSelector or nodeAffinity to limit the nodes to which an application can be scheduled. This confines the impact of a faulty application to those nodes instead of putting the entire cluster at risk.
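
As a minimal sketch of the nodeSelector approach, the following Pod is scheduled only to nodes that carry a given label. The label key and value (**workload-type: untrusted**), the Pod name, and the elided image are hypothetical placeholders, not values defined by CCE.

.. code-block::

   apiVersion: v1
   kind: Pod
   metadata:
     name: node-selector-example   # hypothetical name
   spec:
     # Schedule only to nodes labeled workload-type=untrusted (assumed label).
     nodeSelector:
       workload-type: untrusted
     containers:
       - name: app
         image: ...

When more expressive matching is needed, the same restriction can be written with nodeAffinity using **requiredDuringSchedulingIgnoredDuringExecution**.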

Suggestions on Container Security Configuration
-----------------------------------------------

-  Set computing resource requests and limits (**request** and **limit**) for each container. This prevents a container from occupying excessive resources and affecting the stability of the host and of other containers on the same node.
-  Unless necessary, do not mount sensitive host directories to containers, such as **/**, **/boot**, **/dev**, **/etc**, **/lib**, **/proc**, **/sys**, and **/usr**.
-  Do not run the sshd process in containers unless necessary.
-  Do not share the host network namespace with containers unless necessary.
-  Do not share the host process namespace with containers unless necessary.
-  Do not share the host IPC namespace with containers unless necessary.
-  Do not share the host UTS namespace with containers unless necessary.
-  Do not mount the Docker socket file (for example, **/var/run/docker.sock**) to any container unless necessary.
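
The resource and namespace suggestions above might be expressed in a Pod spec as follows. This is a sketch only: the names, the elided image, and the resource values are illustrative assumptions and should be sized for the actual workload.

.. code-block::

   apiVersion: v1
   kind: Pod
   metadata:
     name: container-hardening-example   # hypothetical name
   spec:
     # These fields default to false; they are set explicitly here for clarity.
     hostNetwork: false
     hostPID: false
     hostIPC: false
     containers:
       - name: app
         image: ...
         resources:
           requests:          # assumed values
             cpu: 250m
             memory: 256Mi
           limits:            # assumed values
             cpu: 500m
             memory: 512Mi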

Container Permission Access Control
-----------------------------------

When using a containerized application, follow the principle of least privilege and properly configure securityContext for Deployments or StatefulSets.

-  Configure runAsUser to specify a non-root user to run a container.

-  Configure privileged so that containers do not run in privileged mode in scenarios where privilege is not required.

-  Configure capabilities to precisely control the privileged access permissions granted to containers.

-  Configure allowPrivilegeEscalation to disable privilege escalation in scenarios where container processes do not require it.

-  Configure seccomp to restrict the container syscalls. For details, see `Restrict a Container's Syscalls with seccomp <https://kubernetes.io/docs/tutorials/security/seccomp/>`__ in the official Kubernetes documentation.

-  Configure readOnlyRootFilesystem to protect the root file system of a container.

   Example YAML for a Deployment:

   .. code-block::

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: security-context-example
        namespace: security-example
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: security-context-example
            label: security-context-example
        strategy:
          rollingUpdate:
            maxSurge: 25%
            maxUnavailable: 25%
          type: RollingUpdate
        template:
          metadata:
            annotations:
              seccomp.security.alpha.kubernetes.io/pod: runtime/default
            labels:
              app: security-context-example
              label: security-context-example
          spec:
            containers:
              - image: ...
                imagePullPolicy: Always
                name: security-context-example
                securityContext:
                  allowPrivilegeEscalation: false
                  readOnlyRootFilesystem: true
                  runAsUser: 1000
                  capabilities:
                    add:
                    - NET_BIND_SERVICE
                    drop:
                    - ALL
                volumeMounts:
                  - mountPath: /etc/localtime
                    name: localtime
                    readOnly: true
                  - mountPath: /opt/write-file-dir
                    name: tmpfs-example-001
            securityContext:
              seccompProfile:
                type: RuntimeDefault
            volumes:
              - hostPath:
                  path: /etc/localtime
                  type: ""
                name: localtime
              - emptyDir: {}
                name: tmpfs-example-001

Restricting the Access of Containers to the Management Plane
------------------------------------------------------------

If application containers on a node do not need to access Kubernetes, you can perform the following operations to prevent containers from accessing kube-apiserver:

#. Query the container CIDR block and private API server address.

   On the **Clusters** page of the CCE console, click the name of the cluster to find the information on the details page.

#. Log in to each node in the CCE cluster as user **root** and run the command that matches the network model of the cluster:

   -  VPC network:

      .. code-block::

         iptables -I OUTPUT -s {container_cidr} -d {Private API server IP} -j REJECT

   -  Container tunnel network:

      .. code-block::

         iptables -I FORWARD -s {container_cidr} -d {Private API server IP} -j REJECT

   *{container_cidr}* indicates the container CIDR block of the cluster, for example, 10.0.0.0/16, and *{Private API server IP}* indicates the private IP address of the cluster's API server.

   To make the rule persist after a node restart, you are advised to add the command to the **/etc/rc.local** script.

#. Run the following command in a container to access kube-apiserver and check whether the request is rejected:

   .. code-block::

      curl -k https://{Private API server IP}:5443
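
If no existing container has curl available, the check could also be run from a short-lived Pod. The sketch below assumes that the public **curlimages/curl** image can be pulled by the cluster. The Pod name is hypothetical, and *{Private API server IP}* must be replaced with the private API server address, as in the commands above.

.. code-block::

   apiVersion: v1
   kind: Pod
   metadata:
     name: apiserver-access-check   # hypothetical name
   spec:
     restartPolicy: Never
     containers:
       - name: curl
         image: curlimages/curl     # assumed image
         # -k skips certificate verification; -m 5 gives up after 5 seconds.
         command: ["curl", "-k", "-m", "5", "https://{Private API server IP}:5443"]

With the iptables rule in place, the request is expected to fail rather than return a response from kube-apiserver. The curl output can be read from the Pod logs.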