A mechanism to isolate CPU topology information in the Linux kernel

The CPU namespace aims to extend the current pool of namespaces in the kernel to isolate the system topology view from applications. The CPU namespace virtualizes CPU information by maintaining an internal translation from the namespace CPU to the logical CPU in the kernel. It will also make existing interfaces such as sysfs/procfs, cgroupfs, and the sched_setaffinity()/sched_getaffinity() syscalls context aware, so that they divulge topology information based on the CPU namespace context of the task that requests it.

Today, applications that run in containers enforce their CPU and memory limits and requirements with the help of cgroups. However, many applications, legacy or otherwise, get their view of the system through sysfs/procfs and allocate resources, such as the number of threads or processes and the amount of memory, based on that information. This can lead to unexpected runtime behavior and can have a high impact on performance.
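As an illustration of the mismatch described above, the following minimal sketch compares the system-wide CPU list exported by sysfs with the set of CPUs a task is actually allowed to run on. The helper names parse_cpulist and pick_worker_count are hypothetical and introduced only for this example; the sysfs path and os.sched_getaffinity are existing Linux interfaces.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pick_worker_count(system_cpus, allowed_cpus):
    """Size a worker pool from the CPUs the task may use, not the system total."""
    return max(1, len(system_cpus & allowed_cpus))


if __name__ == "__main__":
    try:
        # System-wide view: the same inside and outside a container today.
        with open("/sys/devices/system/cpu/online") as f:
            system = parse_cpulist(f.read())
        # Per-task view: reflects cpuset and affinity restrictions.
        allowed = set(os.sched_getaffinity(0))
        print("sysfs view:", len(system), "CPUs")
        print("affinity view:", len(allowed), "CPUs")
        print("suggested workers:", pick_worker_count(system, allowed))
    except (OSError, AttributeError):
        print("Linux sysfs or sched_getaffinity is not available here")
```

An application that sizes its thread pool from the sysfs view alone would start one worker per host CPU even when it is confined to a handful of them.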

The problem is not limited to the coherency of information. Cloud runtime environments request CPU runtime in millicores[1], which translates to using the CFS period and quota to limit CPU runtime in cgroups. However, applications generally operate in terms of threads, with little to no cognizance of the millicore limit or its connotation.
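The arithmetic behind the millicore translation can be sketched as follows. This is a simplified illustration, not the code of any particular container runtime: the function names are hypothetical, and the 100 ms period is the commonly used default, assumed here.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms, in microseconds


def millicores_to_cfs_quota(millicores, period_us=CFS_PERIOD_US):
    """Return the CFS quota (microseconds of CPU time per period) for a millicore limit."""
    return millicores * period_us // 1000


def runtime_per_thread_us(millicores, busy_threads, period_us=CFS_PERIOD_US):
    """Average CPU time each busy thread gets per period before the group is throttled."""
    return millicores_to_cfs_quota(millicores, period_us) / busy_threads


if __name__ == "__main__":
    # A 500m limit allows 50 ms of CPU time in every 100 ms period.
    print(millicores_to_cfs_quota(500))
    # Spread across 32 busy threads (one per host CPU), each thread gets
    # only a small slice of that budget per period.
    print(runtime_per_thread_us(500, 32))
```

The quota is a time budget shared by the whole group, so an application that spawns one thread per visible CPU exhausts it early in each period and is throttled for the remainder.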

In addition to coherency issues, the current way of doing things also poses security and fair-use implications on a multi-tenant system, such as:

Currently, all of the problems mentioned above can be mitigated with lightweight VMs such as Kata Containers. However, with a CPU namespace, the isolation advantages provided by Kata Containers can be achieved without the heaviness of a virtual machine.

Design

The architecture of the CPU namespace is as follows:

The task struct links to the nsproxy, which, as the name suggests, is a pointer proxy for the namespaces that can be attached to it later. One of the proxy pointers is now introduced for the CPU namespace. ...
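The pointer chain described above can be modeled in user space as a rough sketch. This is not kernel code: the classes Task, NsProxy, and CpuNamespace, the parent link, and the helpers fork and unshare_cpu_ns are hypothetical names chosen for this illustration, and how a CPU namespace is actually created in the work-in-progress code is not assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class CpuNamespace:
    name: str
    parent: Optional["CpuNamespace"] = None  # assumed hierarchy link


@dataclass(eq=False)
class NsProxy:
    # The kernel's nsproxy holds one pointer per namespace type;
    # only the newly introduced CPU namespace slot is modeled here.
    cpu_ns: CpuNamespace


@dataclass(eq=False)
class Task:
    pid: int
    nsproxy: NsProxy


def fork(parent, pid):
    """A child shares its parent's nsproxy, and therefore its CPU namespace."""
    return Task(pid=pid, nsproxy=parent.nsproxy)


def unshare_cpu_ns(task, name):
    """Attach a fresh nsproxy that points at a new child CPU namespace."""
    child_ns = CpuNamespace(name=name, parent=task.nsproxy.cpu_ns)
    task.nsproxy = NsProxy(cpu_ns=child_ns)
    return child_ns


if __name__ == "__main__":
    init_ns = CpuNamespace("init")
    t1 = Task(pid=1, nsproxy=NsProxy(init_ns))
    t2 = fork(t1, pid=2)
    unshare_cpu_ns(t2, "container-a")
    print(t1.nsproxy.cpu_ns.name, t2.nsproxy.cpu_ns.name)
```

Routing the lookup through the proxy means a task's CPU view can change by swapping one pointer, without touching the task structure itself.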

CPU namespace

The CPU namespace structure contains the following fields.
NOTE: For the sake of this design discussion, consider a vCPU to be a CPU within the CPU namespace and a pCPU to be the corresponding translation that Linux, as the host, recognizes and can perform operations upon.

There are also other fields, such as ns_common for the callbacks to interact with the namespace and user_ns; however, they are irrelevant to the current discussion.
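The vCPU to pCPU translation from the note above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: the class CpuNamespaceMap and its methods are hypothetical, and the dense, sorted assignment of vCPU ids is an assumption made only to keep the example simple.

```python
class CpuNamespaceMap:
    """Toy model of a per-namespace vCPU <-> pCPU translation table."""

    def __init__(self, pcpus):
        # Assumption: vCPU ids are dense (0..n-1) and follow pCPU order.
        self.v2p = dict(enumerate(sorted(pcpus)))
        self.p2v = {p: v for v, p in self.v2p.items()}

    def to_pcpus(self, vcpus):
        """Translate a vCPU set requested inside the namespace into host pCPUs."""
        try:
            return {self.v2p[v] for v in vcpus}
        except KeyError as exc:
            raise ValueError(f"vCPU {exc.args[0]} is not in this namespace") from None

    def to_vcpus(self, pcpus):
        """Translate host pCPUs into the vCPU ids visible inside the namespace.

        pCPUs that do not belong to the namespace are hidden from the result.
        """
        return {self.p2v[p] for p in pcpus if p in self.p2v}


if __name__ == "__main__":
    ns = CpuNamespaceMap({8, 9, 10, 11})
    # A setaffinity-style request for vCPUs 0 and 1 lands on pCPUs 8 and 9.
    print(ns.to_pcpus({0, 1}))
    # A getaffinity-style query reports only CPUs that exist in the namespace.
    print(ns.to_vcpus({9, 11, 20}))
```

In this model, a task inside the namespace only ever sees ids 0 to 3, while the host continues to operate on pCPUs 8 to 11.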

To further explain the design, a sample hierarchy is shown: ...

Assume there are 4 CPU namespaces in the system. The system has 32 CPUs.

Example

System configuration:

Experiment description:

Below is a video example of the experiment above.

Work in progress code

https://github.com/pratiksampat/linux/tree/CPU_Namespace_WIP

Survey proposal RFD

Survey proposal for identification of the problems and state of the art solutions:
https://lore.kernel.org/lkml/fe947175-62f5-c3fa-158c-7be2dd886c0e@linux.ibm.com/T/