This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
realtime:documentation:howto:tools:cpuset:tutorial [2017/05/18 21:49] srodrig1 [Listing Tasks with Proc] |
— (current) | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Cpuset (cset) Tutorial ====== | ||
- | |||
- | Alex Tsariounov <alext@novell.com> + | ||
- | |||
- | Copyright (c) 2009-2011 Novell Inc., cset v1.5.8 + | ||
- | |||
- | Verbatim copying and distribution of this entire article are permitted | ||
- | worldwide, without royalty, in any medium, provided this notice is preserved. | ||
- | |||
- | This tutorial describes basic and advanced usage of the **cset** command to | ||
- | manipulate cpusets on a Linux system. See also the manpages that come with | ||
- | the **cset** command: cset(1), cset-shield(1), cset-set(1), and cset-proc(1) for | ||
- | more details. Additionally, the **cset** command has online help. | ||
- | |||
- | ====== Introduction ====== | ||
- | |||
- | In the Linux kernel, the cpuset facility provides a mechanism for creating | ||
- | logical entities called //cpusets// that encompass definitions of CPUs and NUMA | ||
- | Memory Nodes (if NUMA is available). Cpusets constrain the CPU and Memory | ||
- | placement of a task to only the resources defined within that cpuset. These | ||
- | cpusets can then be arranged into a nested hierarchy visible in the //cpuset// | ||
- | virtual filesystem. Sets of tasks can be assigned to these cpusets to | ||
- | constrain the resources that they use. The tasks can be moved from one cpuset | ||
- | to another to utilize other resources defined in those other cpusets. | ||
- | |||
- | The **cset** command is a Python application that provides a command line front | ||
- | end for the Linux cpusets functionality. Working with cpusets directly can be | ||
- | confusing and slightly complex. The **cset** tool hides that complexity behind | ||
- | an easy-to-use command line interface. | ||
- | |||
- | There are two distinct use cases for **cset**: the basic shielding use case and | ||
- | the //advanced// case of using raw **set** and **proc** subcommands. The basic | ||
- | shielding function is accessed with the **shield** subcommand and described in | ||
- | the next section. Using the raw **set** and **proc** subcommands allows one to | ||
- | set up arbitrarily complex cpusets and is described in the later sections. | ||
- | |||
- | Note that in general, one either uses the **shield** subcommand or a | ||
- | combination of the **set** and **proc** subcommands. One rarely, if ever, uses | ||
- | all of these subcommands together. Doing so will likely become too | ||
- | confusing. Additionally, the **shield** subcommand sets up its required cpusets | ||
- | with exclusively marked CPUs. This can interfere with your cpuset | ||
- | strategy. If you find that you need more functionality for your strategy than | ||
- | **shield** provides, go ahead and transition to using **set** and **proc** | ||
- | exclusively. It is straightforward to implement what **shield** does with a few | ||
- | extra **set** and **proc** subcommands. | ||
- | |||
- | ===== Obtaining Online Help ===== | ||
- | |||
- | For a full list of **cset** subcommands:: | ||
- | # cset help | ||
- | |||
- | For in-depth help on individual subcommands:: | ||
- | # cset help <subcommand> | ||
- | |||
- | For options of individual subcommands:: | ||
- | # cset <subcommand> (-h | --help) | ||
- | |||
- | ====== The Basic Shielding Model ====== | ||
- | |||
- | Although any set up of cpusets can really be described as //shielding,// there | ||
- | is one prevalent shielding model in use that is so common that **cset** has a | ||
- | subcommand that is dedicated to its use. This subcommand is called **shield**. | ||
- | |||
- | The concept behind this model is the use of three cpusets. The //root// cpuset | ||
- | which is always present in all configurations and contains all CPUs. The | ||
- | //system// cpuset which contains CPUs which are used for system tasks. These | ||
- | are the normal tasks that are not //important,// but which need to run on the | ||
- | system. And finally, the //user// cpuset which contains CPUs which are used for | ||
- | //important// tasks. The //user// cpuset is the shield. Only those tasks that | ||
- | are somehow important, usually tasks whose performance determines the overall | ||
- | rating for the machine, are run in the //user// cpuset. | ||
- | |||
- | The **shield** subcommand manages all of these cpusets and lets you define the | ||
- | CPUs and Memory Nodes that are in the //shielded// and //unshielded// sets. The | ||
- | subcommand automatically moves all movable tasks on the system into the | ||
- | //unshielded// cpuset on shield activation, and back into the //root// cpuset on | ||
- | shield tear down. The subcommand then lets you move tasks into and out of the | ||
- | shield. Additionally, you can move special tasks (kernel threads) which | ||
- | normally run in the //root// cpuset into the 'unshielded' set so that your | ||
- | shield will have even less disturbance. | ||
- | |||
- | The **shield** subcommand abstracts the management of these cpusets away from | ||
- | you and provides options that drive how the shield is set up, which tasks are | ||
- | to be shielded and which tasks are not, and status of the shield. In fact, | ||
- | you need not be bothered with the naming of the required cpusets or even where | ||
- | the cpuset filesystem is mounted. **Cset** and the **shield** subcommand takes | ||
- | care of all that. | ||
- | |||
- | If you find yourself needing to define more cpusets for your application, then | ||
- | it is likely that this simple shielding is not a rich enough model for you. | ||
- | In this case, you should transition to using the **set** and **proc** subcommands | ||
- | described in a later section. | ||
- | |||
- | ===== A Simple Shielding Example ===== | ||
- | |||
- | Assume that we have a 4-way machine that is not NUMA. This means there are 4 | ||
- | CPUs at our disposal and there is only one Memory Node available. On such | ||
- | machines, we do not need to specify any memory node parameters to **cset**, it | ||
- | sets up the only available memory node by default. | ||
- | |||
- | Usually, one wants to dedicate as many CPUs to the shield as possible and | ||
- | leave a minimal set of CPUs for normal system processing. The reasoning for | ||
- | this is because the performance of the important tasks will rule the | ||
- | performance of the installation as a whole and these important tasks need as | ||
- | many resources available to them as possible, exclusive of other, unimportant | ||
- | tasks that are running on the system. | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: I use the word //task// to represent either a process or a thread that is | ||
- | running on the system. | ||
- | </WRAP> | ||
- | |||
- | |||
- | ==== Setup and Teardown of the Shield ==== | ||
- | |||
- | To set up a shield of 3 CPUs with 1 CPU left for low priority system | ||
- | processing, issue the following command. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield -c 1-3 | ||
- | cset: --> activating shielding: | ||
- | cset: moving 176 tasks from root into system cpuset... | ||
- | [==================================================]% | ||
- | cset: "system" cpuset of CPUSPEC(0) with 176 tasks running | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | </code> | ||
- | |||
- | This command does a number of things. First, a //user// cpuset is created with | ||
- | what's called a CPUSPEC (CPU specification) from the **-c --cpu** option. This | ||
- | CPUSPEC specifies to use CPUs 1 through 3 inclusively. Next, the command | ||
- | creates a //system// cpuset with a CPUSPEC that is the inverse of the **-c** | ||
- | option for the current machine. On this machine that cpuset will only contain | ||
- | the first CPU, CPU0. Next, all userspace processes running in the //root// | ||
- | cpuset are transfered to the //system// cpuset. This makes all those processes | ||
- | run only on CPU0. The effect of this is that the shield consists of CPUs 1 | ||
- | through 3 and they are now idling. | ||
- | |||
- | Note that the command did not move the kernel threads that are running in the | ||
- | //root// cpuset to the //system// cpuset. This is because you may want these | ||
- | kernel threads to use all available CPUs. If you do not, the you can use the | ||
- | **-k --kthread** option as described below. | ||
- | |||
- | The shield setup command above outputs the information of which cpusets were | ||
- | created and how many tasks are running on each. If you want to see the | ||
- | current status of the shield again, issue this command: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield | ||
- | cset: --> shielding system active with | ||
- | cset: "system" cpuset of CPUSPEC(0) with 176 tasks running | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | </code> | ||
- | |||
- | Which shows us that the shield is set up and that 176 tasks are running in the | ||
- | //system// cpuset--the //unshielded// cpuset. | ||
- | |||
- | It is important to move all possible tasks from the //root// cpuset to the | ||
- | unshielded //system// cpuset because a task's cpuset property is inherited by | ||
- | its children. Since we've moved all running tasks (including init) to the | ||
- | unshielded //system// cpuset, that means that any new tasks that are spawned | ||
- | will **also** run in the unshielded //system// cpuset. | ||
- | |||
- | Some kernel threads can be moved into the unshielded //system// cpuset as well. | ||
- | These are the threads that are not bound to specific CPUs. If a kernel thread | ||
- | is bound to a specific CPU, then it is generally not a good idea to move that | ||
- | thread to the //system// set because at worst it may hang the system and at best | ||
- | it will slow the system down significantly. These threads are usually the IRQ | ||
- | threads on a real time Linux kernel, for example, and you may want to not move | ||
- | these kernel threads into //system//. If you leave them in the //root// cpuset, | ||
- | then they will have access to all CPUs. | ||
- | |||
- | However, if your application demands an even //quieter// shield, then you can | ||
- | move all movable kernel threads into the unshielded 'system' set with the | ||
- | following command. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield -k on | ||
- | cset: --> activating kthread shielding | ||
- | cset: kthread shield activated, moving 70 tasks into system cpuset... | ||
- | [==================================================]% | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | You can see that this moved an additional 70 tasks to the unshielded //system// | ||
- | cpuset. Note that the **-k --kthread on** parameter can be given at the shield | ||
- | creation time as well and you do not need to perform these two steps | ||
- | separately if you know that you will want kernel thread shielding as well. | ||
- | Executing **cset shield** again shows us the current state of the shield. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield | ||
- | cset: --> shielding system active with | ||
- | cset: "system" cpuset of CPUSPEC(0) with 246 tasks running | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | </code> | ||
- | |||
- | You can get a detailed listing of what is running in the shield by specifying | ||
- | either **-s --shield** or **-u --unshield** to the **shield** subcommand and using | ||
- | the verbose flag. You will get output similar to the following. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield --unshield -v | ||
- | cset: "system" cpuset of CPUSPEC(0) with 251 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 1 0 Soth init [5] | ||
- | root 2 0 Soth [kthreadd] | ||
- | root 84 2 Sf50 [IRQ-9] | ||
- | ... | ||
- | alext 31796 31789 Soth less | ||
- | root 32653 25222 Roth python ./cset shield --unshield -v | ||
- | </code> | ||
- | |||
- | Note that I abbreviated the listing; we do have 251 tasks running in the | ||
- | //system// set. The output is self-explanatory; however, the //SPPr// field may | ||
- | need a little explanation. //SPPr// stands for State, Policy and Priority. You | ||
- | can see that the initial two tasks are Stopped and running in timeshare | ||
- | priority, marked as //oth// (for //other//). The [IRQ-9] task is also stopped, | ||
- | but marked at real time FIFO policy with a priority of 50. The last task in | ||
- | the listing is the **cset** command itself and is marked as running. Also note | ||
- | that adding a second **-v --verbose** option will not restrict the output to | ||
- | fit into an 80 character screen. | ||
- | |||
- | Tear down of the shield, stopping the shield in other words, is done with the | ||
- | **-r --reset** option to the **shield** subcommand. When this command is issued, | ||
- | both the //system// and //user// cpusets are deleted and any tasks that are | ||
- | running in both of those cpusets are moved to the 'root' cpuset. Once so | ||
- | moved, all tasks will have access to all resources on the system. For | ||
- | example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield --reset | ||
- | cset: --> deactivating/reseting shielding | ||
- | cset: moving 0 tasks from "/user" user set to root set... | ||
- | cset: moving 250 tasks from "/system" system set to root set... | ||
- | [==================================================]% | ||
- | cset: deleting "/user" and "/system" sets | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | ===== Moving Interesting Tasks Into and Out of the Shield ===== | ||
- | |||
- | Now that we have a shield running, the objective is to run our //important// | ||
- | processes in that shield. These processes can be anything, but usually they | ||
- | are directly related to the purpose of the machine. There are two ways to run | ||
- | tasks in the shield: | ||
- | |||
- | . Exec a process into the shield | ||
- | . Move an already running task into the shield | ||
- | |||
- | ==== Execing a Process into the Shield ==== | ||
- | |||
- | Running a new process in the shield can be done with the **-e --exec** option | ||
- | to the **shield** subcommand. This is the simplest way to get a task to run in | ||
- | the shield. For this example, let's exec a new bash shell into the shield | ||
- | with the following commands. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield -s | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -e bash | ||
- | cset: --> last message, executed args into cpuset "/user", new pid is: 13300 | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -s -v | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 2 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 13300 8583 Soth bash | ||
- | root 13329 13300 Roth python ./cset shield -s -v | ||
- | |||
- | [zuul:cpuset-trunk]# exit | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -s | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | The first command above lists the status of the shield. We see that the | ||
- | shield is defined as CPUs 1 through 3 inclusive and currently there are no | ||
- | tasks running in it. | ||
- | |||
- | The second command execs the bash shell into the shield with the **-e** | ||
- | option. The last message of cset lists the PID of the new process. | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: **cset** follows the tradition of separating the tool options from the | ||
- | command to be execed options with a double dash (**--**). This is not shown in | ||
- | this simple example, but if the command you want to exec also takes options, | ||
- | separate them with the double dash like so: **# cset shield -e mycommand -- -v** | ||
- | The **-v** will be passed to **mycommand**, and not to **cset**. | ||
- | </WRAP> | ||
- | |||
- | The next command lists the status of the shield again. You will note that | ||
- | there are actually two tasks running shielded: our new shell and the **cset** | ||
- | status command itself. Remember that the cpuset property of a task is | ||
- | inherited by its children. Since we ran the new shell in the shield, its | ||
- | child, which is the status command, also ran in the shield. | ||
- | |||
- | <WRAP center round tip 100%> | ||
- | TIP: Execing a shell into the shield is a useful way to experiment with | ||
- | running tasks in the shield since all children of the shell will also run in | ||
- | the shield. | ||
- | </WRAP> | ||
- | |||
- | |||
- | The last command exits the shell after which we request a shield status again | ||
- | and see that once again, it does not contain any tasks. | ||
- | |||
- | You may have noticed in the output above that both the new shell and the | ||
- | status command are running as the root user. This is because **cset** needs to | ||
- | run as root and so all it's children will also run as root. If you need to | ||
- | run a process under a different user and or group, you may use the **--user** | ||
- | and **--group** options for **exec** as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset shield --user=alext --group=users -e bash | ||
- | cset: --> last message, executed args into cpuset "/user", new pid is: 14212 | ||
- | |||
- | alext@zuul> cset shield -s -v | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 2 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | alext 14212 8583 Soth bash | ||
- | alext 14241 14212 Roth python ./cset shield -s -v | ||
- | </code> | ||
- | |||
- | ==== Moving a Running Task into and out of the Shield ==== | ||
- | |||
- | While execing a process into the shield is undoubtably useful, most of the | ||
- | time, you'll want to move already running tasks into and out of the shield. | ||
- | The **cset** shield subcommand includes two options for doing this: | ||
- | **-s/--shield** and **-u/--unshield**. These options require what's called a | ||
- | PIDSPEC (process specification) to also be specified with the **-p/--pid** | ||
- | option. The PIDSPEC defines which tasks get operated on. The PIDSPEC can be | ||
- | a single process ID, a list of process IDs separated by commas, and a list of | ||
- | process ID ranges separated by dashes, groups of which are separated by | ||
- | commas. For example: | ||
- | |||
- | <code bash> | ||
- | --shield --pid 1234 | ||
- | This PIDSPEC argument specifies that PID 1234 be shielded. | ||
- | |||
- | --shield --pid 1234,42,1934,15000,15001,15002 | ||
- | This PIDSPEC argument specifies that this list of PIDs only be moved into the | ||
- | shield. | ||
- | |||
- | --unshield -p 5000,5100,6010-7000,9232 | ||
- | This PIDSPEC argument specifies that PIDs 5000,5100 and 9232 be unshielded | ||
- | (moved out of the shield) along with any existing PID that is in the range | ||
- | 6010 through 7000 inclusive. | ||
- | </code> | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: A range in a PIDSPEC does not have to have tasks running for every | ||
- | number in that range. In fact, it is not even an error if there are no tasks | ||
- | running in that range; none will be moved in that case. The range simply | ||
- | specifies to act on any tasks that have a PID or TID that is within that | ||
- | range. | ||
- | </WRAP> | ||
- | |||
- | |||
- | Use of the appropriate PIDSPEC can thus be handy to move tasks and groups of | ||
- | tasks into and out of the shield. Additionally, there is one more option that | ||
- | can help with multi-threaded processes, and that is the **--threads** flag. If | ||
- | this flag is present in a **shield** or **unshield** command with a PIDSPEC and if | ||
- | any of the task IDs in the PIDSPEC belong to a thread in a process container, | ||
- | then **all** the sibling threads in that process container will get shielded or | ||
- | unshielded as well. This flag provides an easy mechanism to shield/unshield | ||
- | all threads of a process by simply specifying one thread in that process. | ||
- | |||
- | In the following example, we move the current shell into the shield with a | ||
- | range PIDSPEC and back out with the bash variable for the current PID. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# echo $$ | ||
- | 22018 | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -s -p 22010-22020 | ||
- | cset: --> shielding following pidspec: 22010-22020 | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -s -v | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 2 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 3770 22018 Roth python ./cset shield -s -v | ||
- | root 22018 5034 Soth bash | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -u -p $$ | ||
- | cset: --> unshielding following pidspec: 22018 | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset shield -s | ||
- | cset: "user" cpuset of CPUSPEC(1-3) with 0 tasks running | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: Ordinarily, the **shield** option will shield a PIDSPEC only if it is | ||
- | currently running in the //system// set--the unshielded set. The **unshield** | ||
- | option will unshield a PIDSPEC only if it is currently running in the //user// | ||
- | set--the shielded set. If you want to **shield/unshield** a process that | ||
- | happens to be running in the //root// set (not common), then use the **--force** | ||
- | option for these commands. | ||
- | </WRAP> | ||
- | |||
- | |||
- | ====== Full Featured Cpuset Manipulation Commands ====== | ||
- | |||
- | While basic shielding as described above is useful and a common use model for | ||
- | **cset**, there comes a time when more functionality will be desired to | ||
- | implement your strategy. To implement this, **cset** provides two subcommands: | ||
- | **set**, which allows you to manipulate cpusets; and **proc**, which allows you to | ||
- | manipulate processes within those cpusets. | ||
- | |||
- | ===== The Set Subcommand ===== | ||
- | |||
- | In order to do anything with cpusets, you must be able to create, adjust, | ||
- | rename, move and destroy them. The +set+ subcommand allows the management of | ||
- | cpusets in such a manner. | ||
- | |||
- | ==== Creating and Destroying Cpusets with Set ==== | ||
- | |||
- | The basic syntax of **set** for cpuset creation is: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -c 1-3 -s my_cpuset1 | ||
- | cset: --> created cpuset "my_cpuset1" | ||
- | </code> | ||
- | |||
- | This creates a cpuset named //my_cpuset1// with a CPUSPEC of CPU1, CPU2 and | ||
- | CPU3. The CPUSPEC is the same concept as described in the **Setup and | ||
- | Teardown of the Shield** section above. The **set** subcommand also takes a | ||
- | **-m/--mem** option that lets you specify the memory nodes the set will use as | ||
- | well as flags to make the CPUs and MEMs exclusive to the cpuset. If you are | ||
- | on a non-NUMA machine, just leave the **-m** option out and the default memory | ||
- | node 0 will be used. | ||
- | |||
- | Just like with **shield**, you can adjust the CPUs and MEMs with subsequent | ||
- | calls to **set**. If, for example, you wish to adjust the //my_cpuset1// cpuset | ||
- | to only use CPUs 1 and 3 (and omit CPU2), then issue the following command. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -c 1,3 -s my_cpuset1 | ||
- | cset: --> modified cpuset "my_cpuset | ||
- | </code> | ||
- | |||
- | **cset** will then adjust the CPUs that are assigned to the "my_cpuset1" set to | ||
- | only use CPU1 and CPU3. | ||
- | |||
- | To rename a cpuset, use the **-n/--newname** option. For example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -s my_cpuset1 -n super_set | ||
- | cset: --> renaming "/cpusets/my_cpuset1" to "super_set" | ||
- | </code> | ||
- | |||
- | Renames the cpuset called //my_cpuset1// to //super_set//. | ||
- | |||
- | To destroy a cpuset, use the **-d/--destroy** option as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -d super_set | ||
- | cset: --> processing cpuset "super_set", moving 0 tasks to parent "/"... | ||
- | cset: --> deleting cpuset "/super_set" | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | This command destroys the newly created cpuset called //super_set//. When a | ||
- | cpuset is destroyed, all the tasks running in it are moved to the parent | ||
- | cpuset. The //root// cpuset, which always exists and always contains all CPUs, | ||
- | can not be destroyed. You may also give the **--destroy** option a list of | ||
- | cpusets to destroy. | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: The **cset** subcommand creates the cpusets based on a mounted cpuset | ||
- | filesystem. You do not need to know where that filesystem is mounted, | ||
- | although it is easy to figure out (by default it's on ///cpusets//). When you | ||
- | give the **set** subcommand a name for a new cpuset, it is created wherever the | ||
- | cpuset filesystem is mounted at. | ||
- | </WRAP> | ||
- | |||
- | |||
- | If you want to create a cpuset hierarchy, then you must give a path to the | ||
- | **cset** **set** subcommand. This path will always begin with the //root// cpuset, | ||
- | for which the path is /////. For example. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -c 1,3 -s top_set | ||
- | cset: --> created cpuset "top_set" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 3 -s /top_set/sub_set | ||
- | cset: --> created cpuset "/top_set/sub_set" | ||
- | </code> | ||
- | |||
- | These commands created two cpusets: //top_set// and //sub_set//. The //top_set// | ||
- | uses CPU1 and CPU3. It has a subset of 'sub_set' which only uses CPU3. Once | ||
- | you have created a subset with a path, then if the name is unique, you do not | ||
- | have to specify the path in order to affect it. If the name is not unique, | ||
- | then **cset** will complain and ask you to use the path. For example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -c 1,3 -s sub_set | ||
- | cset: --> modified cpuset "sub_set | ||
- | </code> | ||
- | |||
- | This command adds CPU1 to the //sub_set// cpuset for it's use. Note that using | ||
- | the path in this case is optional. | ||
- | |||
- | If you attempt to destroy a cpuset which has sub-cpusets, **cset** will complain | ||
- | and not do it unless you use the **-r/--recurse** and the **--force** options. | ||
- | If you do use **--force**, then all the tasks running in all subsets of the | ||
- | deletion target cpuset will be moved to the target's parent cpuset and all | ||
- | cpusets. | ||
- | |||
- | Moving a cpuset from under a certain cpuset to a different location is | ||
- | currently not implemented and is slated for a later release of **cset**. | ||
- | |||
- | ==== Listing Cpusets with Set ==== | ||
- | |||
- | To list cpusets, use the **set** subcommand with the **-l\--list** option. For | ||
- | example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -l | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 320 1 / | ||
- | one 3 n 0 n 0 1 /one | ||
- | </code> | ||
- | |||
- | This shows that there is currently one cpuset present called //one//. (Of course | ||
- | that there is also the //root// set, which is always present.) The output shows | ||
- | that the //one// cpuset has no tasks running in it. The //root// cpuset has 320 | ||
- | tasks running. The //-X// for //CPUs// and //MEMs// fields denotes whether the CPUs | ||
- | and MEMs in the cpusets are marked exclusive to those cpusets. Note that the | ||
- | //one// cpuset has subsets as indicated by a 1 in the //Subs// field. You can | ||
- | specify a cpuset to list with the **set** subcommand as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -l -s one | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | one 3 n 0 n 0 1 /one | ||
- | two 3 n 0 n 0 1 /one/two | ||
- | </code> | ||
- | |||
- | This output shows that there is a cpuset called //two// in cpuset //one// and it | ||
- | also has subset. You can also ask for a recursive listing as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -l -r | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 320 1 / | ||
- | one 3 n 0 n 0 1 /one | ||
- | two 3 n 0 n 0 1 /one/two | ||
- | three 3 n 0 n 0 0 /one/two/three | ||
- | </code> | ||
- | |||
- | This command lists all cpusets existing on the system since it asks for a | ||
- | recursive listing beginning at the //root// cpuset. Incidentally, should you | ||
- | need to specify the //root// cpuset you can use either **root** or **/** to specify it | ||
- | explicitely--just remember that the //root// cpuset cannot be deleted or modified. | ||
- | |||
- | ===== The Proc Subcommand ===== | ||
- | |||
- | Now that we know how to create, rename and destroy cpusets with the **set** | ||
- | subcommand, the next step is to manage threads and processes in those | ||
- | cpusets. The subcommand to do this is called **proc** and it allows you to exec | ||
- | processes into a cpuset, move existing tasks around existing cpusets, and list | ||
- | tasks running in specified cpusets. For the following examples, let us | ||
- | assume a cpuset setup of two sets as follows: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -l | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 309 2 / | ||
- | two 2 n 0 n 3 0 /two | ||
- | three 3 n 0 n 10 0 /three | ||
- | </code> | ||
- | |||
- | ==== Listing Tasks with Proc ==== | ||
- | |||
- | Operation of the **proc** subcommand follows the same model as the **set** | ||
- | subcommand. For example, to list tasks in a cpuset, you need to use the | ||
- | **-l\--list** option and specify the cpuset by name or, if the name exists | ||
- | multiple times in the cpuset hierarchy, by path. For example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 3 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 16141 4300 Soth bash | ||
- | root 16171 16141 Soth bash | ||
- | root 16703 16171 Roth python ./cset proc -l two | ||
- | </code> | ||
- | |||
- | This output shows us that the cpuset called //two// has CPU2 only attached to it | ||
- | and is running three tasks: two shells and the python command to list it. | ||
- | Note that cpusets are inherited so that if a process is contained in a cpuset, | ||
- | then any children it spawns also run within that set. In this case, the | ||
- | python command to list set //two// was run from a shell already running in set | ||
- | //two//. This can be seen by the PPID (parent process ID) of the python | ||
- | command matching the PID of the shell. | ||
- | |||
- | Additionally, the //SPPr// field needs explanation. //SPPr// stands for State, | ||
- | Policy and Priority. You can see that the initial two tasks are Stopped and | ||
- | running in timeshare priority, marked as //oth// (for //other//). The last task | ||
- | is marked as running, //R// and also at timeshare priority, //oth.// If any of | ||
- | these tasks would have been at real time priority, then the policy would be | ||
- | shown as //f// for FIFO or //r// for round robin, and the priority would be a | ||
- | number from 1 to 99. See below for an example. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s root | head -7 | ||
- | cset: "root" cpuset of CPUSPEC(0-3) with 309 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 1 0 Soth init [5] | ||
- | root 2 0 Soth [kthreadd] | ||
- | root 3 2 Sf99 [migration/0] | ||
- | root 4 2 Sf99 [posix_cpu_timer] | ||
- | </code> | ||
- | |||
- | This output shows the first few tasks in the //root// cpuset. Note that both | ||
- | //init// and //[kthread]// are running at timeshare; however, the //[migration/0]// | ||
- | and //[posix_cpu_timer]// kernel threads are running at real time policy of FIFO | ||
- | and priority of 99. Incidentally, this output is from a system running the | ||
- | real time Linux kernel which runs some kernel threads at real time | ||
- | priorities. And finally, note that you can of course use **cset** as any other | ||
- | Linux tool and include it in pipelines as in the example above. | ||
- | |||
- | Taking a peek into the third cpuset called 'three', we see: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s three | ||
- | cset: "three" cpuset of CPUSPEC(3) with 10 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | alext 16165 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16169 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16170 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16237 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16491 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16492 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16493 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17243 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17244 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17265 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | </code> | ||
- | |||
- | This output shows that a lot of //beagled// tasks are running in this cpuset and | ||
- | it also shows an ellipsis (...) at the end of their listings. If you see this | ||
- | ellipsis, that means that the command was too long to fit onto an 80 character | ||
- | screen. To see the entire commandline, use the **-v\--verbose** flag, as per | ||
- | following. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s three -v | head -4 | ||
- | cset: "three" cpuset of CPUSPEC(3) with 10 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | alext 16165 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg --autostarted --indexing-delay 300 | ||
- | </code> | ||
- | |||
- | ==== Execing Tasks with Proc ==== | ||
- | |||
- | To exec a task into a cpuset, the **proc** subcommand needs to be employed with | ||
- | the **-e\--exec** option. Let's exec a shell into the cpuset named //two// in | ||
- | our set. First we check to see what is running that set: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 0 tasks running | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -s two -e bash | ||
- | cset: --> last message, executed args into cpuset "/two", new pid is: 20955 | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 2 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 20955 19253 Soth bash | ||
- | root 20981 20955 Roth python ./cset proc -l two | ||
- | </code> | ||
- | |||
- | You can see that initially, //two// had nothing running in it. After the | ||
- | completion of the second command, we list //two// again and see that there are | ||
- | two tasks running: the shell which we execed and the python **cset** command | ||
- | that is listing the cpuset. The reason for the second task is that the cpuset | ||
- | property of a running task is inherited by all its children. Since we | ||
- | executed the listing command from the new shell which was bound to cpuset | ||
- | //two//, the resulting process for the listing is also bound to cpuset //two//. | ||
- | Let's test that by just running a new shell with no prefixed **cset** command. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# bash | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 3 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 20955 19253 Soth bash | ||
- | root 21118 20955 Soth bash | ||
- | root 21147 21118 Roth python ./cset proc -l two | ||
- | </code> | ||
- | |||
- | Here again we see that the second shell, PID 21118, has a parent PID of 20955 | ||
- | which is the first shell. Both shells as well as the listing command are | ||
- | running in the //two// cpuset. | ||
- | |||
- | <WRAP center round box 100%> | ||
- | NOTE: **cset** follows the tradition of separating the tool options from the | ||
- | command to be execed options with a double dash (**--**). This is not shown in | ||
- | this simple example, but if the command you want to exec also takes options, | ||
- | separate them with the double dash like so: **# cset proc -s myset -e mycommand | ||
- | -- -v** The **-v** will be passed to **mycommand**, and not to **cset**. | ||
- | </WRAP> | ||
- | |||
- | |||
- | <WRAP center round tip 100%> | ||
- | TIP: Execing a shell into a cpuset is a useful way to experiment with running | ||
- | tasks in that cpuset since all children of the shell will also run in the same | ||
- | cpuset. | ||
- | </WRAP> | ||
- | |||
- | |||
- | Finally, if you misspell the command to be execed, the result may be | ||
- | puzzling. For example: | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -s two -e blah-blah | ||
- | cset: --> last message, executed args into cpuset "/two", new pid is: 21655 | ||
- | cset: **> [Errno 2] No such file or directory | ||
- | </code> | ||
- | |||
- | The result is no new process even though a new PID is output. The reason for | ||
- | the message is of course that the **cset** process forked in preparation for | ||
- | exec, but the command **blah-blah** was not found in order to exec it. | ||
- | |||
- | ==== Moving Tasks with Proc ==== | ||
- | |||
- | Although the ability to exec a task into a cpuset is fundamental, you will | ||
- | most likely be moving tasks between cpusets more often. Moving tasks is | ||
- | accomplished with the +-m/\--move+ and +-p/\--pid+ options to the +proc+ | ||
- | subcommand of *cset*. The move option tells the +proc+ subcommand that a task | ||
- | move is requested. The +-p/\--pid+ option takes an argument called a PIDSPEC | ||
- | (PID Specification). The PIDSPEC defines which tasks get operated on. | ||
- | |||
- | The PIDSPEC can be a single process ID, a list of process IDs separated by | ||
- | commas, and a list of process ID ranges also separated by commas. For | ||
- | example: | ||
- | |||
- | <code bash> | ||
- | +\--move --pid 1234+:: | ||
- | This PIDSPEC argument specifies that task 1234 be moved. | ||
- | |||
- | +\--move --pid 1234,42,1934,15000,15001,15002+:: | ||
- | This PIDSPEC argument specifies that this list of tasks only be moved. | ||
- | |||
- | +\--move --pid 5000,5100,6010-7000,9232+:: | ||
- | This PIDSPEC argument specifies that tasks 5000,5100 and 9232 be moved along | ||
- | with any existing task that is in the range 6010 through 7000 inclusive. | ||
- | </code> | ||
- | |||
- | NOTE: A range in a PIDSPEC does not have to have running tasks for every | ||
- | number in that range. In fact, it is not even an error if there are no tasks | ||
- | running in that range; none will be moved in that case. The range simply | ||
- | specifies to act on any tasks that have a PID or TID that is within that | ||
- | range. | ||
- | |||
- | In the following example, we move the current shell into the cpuset named | ||
- | 'two' with a range PIDSPEC and back out to the 'root' cpuset with the bash | ||
- | variable for the current PID. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 0 tasks running | ||
- | |||
- | [zuul:cpuset-trunk]# echo $$ | ||
- | 19253 | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -m -p 19250-19260 -t two | ||
- | cset: moving following pidspec: 19253 | ||
- | cset: moving 1 userspace tasks to /two | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 2 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 19253 16447 Roth bash | ||
- | root 29456 19253 Roth python ./cset proc -l -s two | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -m -p $$ -t root | ||
- | cset: moving following pidspec: 19253 | ||
- | cset: moving 1 userspace tasks to / | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -l -s two | ||
- | cset: "two" cpuset of CPUSPEC(2) with 0 tasks running | ||
- | </code> | ||
- | |||
- | Use of the appropriate PIDSPEC can thus be handy to move tasks and groups of | ||
- | tasks. Additionally, there is one more option that can help with | ||
- | multi-threaded processes, and that is the +\--threads+ flag. If this flag is | ||
- | present in a +proc+ move command with a PIDSPEC and if any of the task IDs in | ||
- | the PIDSPEC belongs to a thread in a process container, then *all* the sibling | ||
- | threads in that process container will also get moved. This flag provides an | ||
- | easy mechanism to move all threads of a process by simply specifying one | ||
- | thread in that process. In the following example, we move all the threads | ||
- | running in cpuset 'three' to cpuset 'two' by using the +\--threads+ flag. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set two three | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | two 2 n 0 n 0 0 /two | ||
- | three 3 n 0 n 10 0 /three | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -l -s three | ||
- | cset: "three" cpuset of CPUSPEC(3) with 10 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | alext 16165 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16169 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16170 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16237 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16491 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16492 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16493 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17243 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17244 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 27133 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -m -p 16165 --threads -t two | ||
- | cset: moving following pidspec: 16491,16493,16492,16170,16165,16169,27133,17244,17243,16237 | ||
- | cset: moving 10 userspace tasks to /two | ||
- | [==================================================]% | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset set two three | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | two 2 n 0 n 10 0 /two | ||
- | three 3 n 0 n 0 0 /three | ||
- | </code> | ||
- | |||
- | ==== Moving All Tasks from one Cpuset to Another ==== | ||
- | |||
- | There is a special case for moving all tasks currently running in one cpuset | ||
- | to another. This can be a common use case, and when you need to do it, | ||
- | specifying a PIDSPEC with +-p+ is not necessary 'so long as' you use the | ||
- | +-f/\--fromset+ *and* the +-t/\--toset+ options. | ||
- | |||
- | In the following example, we move all 10 +beagled+ threads back to cpuset | ||
- | 'three' with this method. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l two three | ||
- | cset: "two" cpuset of CPUSPEC(2) with 10 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | alext 16165 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16169 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16170 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16237 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16491 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16492 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 16493 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17243 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 17244 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | alext 27133 1 Soth beagled /usr/lib64/beagle/BeagleDaemon.exe --bg -... | ||
- | cset: "three" cpuset of CPUSPEC(3) with 0 tasks running | ||
- | |||
- | [zuul:cpuset-trunk]# cset proc -m -f two -t three | ||
- | cset: moving all tasks from two to /three | ||
- | cset: moving 10 userspace tasks to /three | ||
- | [==================================================]% | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset set two three | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | two 2 n 0 n 0 0 /two | ||
- | three 3 n 0 n 10 0 /three | ||
- | </code> | ||
- | |||
- | ==== Moving Kernel Threads with Proc ==== | ||
- | |||
- | Kernel threads are special and *cset* detects tasks that are kernel threads | ||
- | and will refuse to move them unless you also add a +-k/\--kthread+ option to | ||
- | your +proc+ move command. Even if you include +-k+, *cset* will 'still' | ||
- | refuse to move the kernel thread if they are bound to specific CPUs. The | ||
- | reason for this is system protection. | ||
- | |||
- | A number of kernel threads, especially on the real time Linux kernel, are | ||
- | bound to specific CPUs and depend on per-CPU kernel variables. If you move | ||
- | these threads to a different CPU than what they are bound to, you risk at best | ||
- | that the system will become horribly slow, and at worst a system hang. If you | ||
- | still insist to move those threads (after all, *cset* needs to give the | ||
- | knowledgeable user access to the keys), then you need to use the +--force+ | ||
- | option additionally. | ||
- | |||
- | WARNING: Overriding a task move command with +--force+ can have dire | ||
- | consequences for the system. Please be sure of the command before you force | ||
- | it. | ||
- | |||
- | In the following example, we move all unbound kernel threads running in the | ||
- | 'root' cpuset to the cpuset named 'two' by using the +-k+ option. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -k -f root -t two | ||
- | cset: moving all kernel threads from / to /two | ||
- | cset: moving 70 kernel threads to: /two | ||
- | cset: --> not moving 76 threads (not unbound, use --force) | ||
- | [==================================================]% | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | You will note that we used the fromset->toset facility of the +proc+ | ||
- | subcommand and we only specified the +-k+ option (not the +-m+ option). This | ||
- | has the effect of moving all kernel threads only. | ||
- | |||
- | Note that only 70 actual kernel threads were moved and 76 were not. The | ||
- | reason that 76 kernel threads were not moved was because they are bound to | ||
- | specific CPUs. Now, let's move those kernel threads back to 'root'. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -k -f two -t root | ||
- | cset: moving all kernel threads from /two to / | ||
- | cset: ** no task matched move criteria | ||
- | cset: **> kernel tasks are bound, use --force if ok | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l -s two | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | two 2 n 0 n 70 0 /two | ||
- | </code> | ||
- | |||
- | Ah! What's this? *Cset* refused to move the kernel threads back to 'root' | ||
- | because it says that they are "bound." Let's check this with the Linux | ||
- | taskset command. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -l -s two | head -5 | ||
- | cset: "two" cpuset of CPUSPEC(2) with 70 tasks running | ||
- | USER PID PPID SPPr TASK NAME | ||
- | -------- ----- ----- ---- --------- | ||
- | root 2 0 Soth [kthreadd] | ||
- | root 55 2 Soth [khelper] | ||
- | |||
- | [zuul:cpuset-trunk]# taskset -p 2 | ||
- | pid 2's current affinity mask: 4 | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l -s two | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | two 2 n 0 n 70 0 /two | ||
- | </code> | ||
- | |||
- | Of course, since the cpuset named 'two' only has CPU2 assigned to it, once we | ||
- | moved the unbound kernel threads from 'root' to 'two', their affinity masks | ||
- | got automatically changed to only use CPU2. This is evident from the | ||
- | +taskset+ output which is a hex value. To really move these threads back to | ||
- | 'root', we need to force the move as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -k -f two -t root --force | ||
- | cset: moving all kernel threads from /two to / | ||
- | cset: moving 70 kernel threads to: / | ||
- | [==================================================]% | ||
- | cset: done | ||
- | </code> | ||
- | |||
- | ===== Destroying Tasks ===== | ||
- | |||
- | There actually is no *cset* subcommand or option to destroy tasks--it's not | ||
- | really needed. Tasks exist and are accessible on the system as normal, even | ||
- | if they happen to be running in one cpuset or another. To destroy tasks, use | ||
- | the usual ^C method or by using the *kill(1)* command. | ||
- | |||
- | ====== Implementing "Shielding" with Set and Proc ====== | ||
- | |||
- | With the preceding material on the +set+ and +proc+ subcommands, we now have | ||
- | the background to implement the basic shielding model, just like the +shield+ | ||
- | subcommand. | ||
- | |||
- | One may pose the question as to why we want to do this, especially since | ||
- | +shield+ already does it? The answer is that sometimes one needs more | ||
- | functionality than +shield+ has to implement one's shielding strategy. In | ||
- | those case you need to first stop using +shield+ since that subcommand will | ||
- | interfere with the further application of +set+ and +proc+; however, you will | ||
- | still need to implement the functionality of +shield+ in order to implement | ||
- | successful shielding. | ||
- | |||
- | Remember from the above sections describing +shield+, that shielding has at | ||
- | minimum three cpusets: 'root', which is always present and contains all CPUs; | ||
- | 'system' which is the "non-shielded" set of CPUs and runs unimportant system | ||
- | tasks; and 'user', which is the "shielded" set of CPUs and runs your important | ||
- | tasks. Remember also that +shield+ moves all movable tasks into 'system' | ||
- | and, optionally, moves unbound kernel threads into 'system' as well. | ||
- | |||
- | We start first by creating the 'system' and 'user' cpusets as follows. We | ||
- | assume that the machine is a four-CPU machine without NUMA memory features. | ||
- | The 'system' cpuset should hold only CPU0 while the 'user' cpuset should hold | ||
- | the rest of the CPUs. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -c 0 -s system | ||
- | cset: --> created cpuset "system" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 1-3 -s user | ||
- | cset: --> created cpuset "user" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 333 2 / | ||
- | user 1-3 n 0 n 0 0 /user | ||
- | system 0 n 0 n 0 0 /system | ||
- | </code> | ||
- | |||
- | Now, we need to move all running user processes into the 'system' cpuset. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -m -f root -t system | ||
- | cset: moving all tasks from root to /system | ||
- | cset: moving 188 userspace tasks to /system | ||
- | [==================================================]% | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 146 2 / | ||
- | user 1-3 n 0 n 0 0 /user | ||
- | system 0 n 0 n 187 0 /system | ||
- | </code> | ||
- | |||
- | We now have the basic shielding set up. Since all userspace tasks are running | ||
- | in 'system', anything that is spawned from them will also run in 'system'. | ||
- | The 'user' cpuset has nothing running in it unless you put tasks there with | ||
- | the +proc+ subcommand as described above. If you also want to move movable | ||
- | kernel threads from 'root' to 'system' (in order to achieve a form of | ||
- | "interrupt shielding" on a real time Linux kernel for example), you would | ||
- | execute the following command as well. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -k -f root -t system | ||
- | cset: moving all kernel threads from / to /system | ||
- | cset: moving 70 kernel threads to: /system | ||
- | cset: --> not moving 76 threads (not unbound, use --force) | ||
- | [==================================================]% | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 76 2 / | ||
- | user 1-3 n 0 n 0 0 /user | ||
- | system 0 n 0 n 257 0 /system | ||
- | </code> | ||
- | |||
- | At this point, you have achieved the simple shielding model that the +shield+ | ||
- | subcommand provides. You can now add other cpuset definitions to expand your | ||
- | shielding strategy beyond that simple model. | ||
- | |||
- | ====== Implementing Hierarchy with Set and Proc ====== | ||
- | |||
- | One popular extended "shielding" model is based on hierarchical cpusets, each | ||
- | with diminishing numbers of CPUs. This model is used to create "priority | ||
- | cpusets" that allow assignment of CPU resources to tasks based on some | ||
- | arbitrary priority definition. The idea being that a higher priority task | ||
- | will get access to more CPU resources than a lower priority task. | ||
- | |||
- | The example provided here once again assumes a machine with four CPUs and no | ||
- | NUMA memory features. This base serves to illustrate the point well; however, | ||
- | note that if your machine has (many) more CPUs, then strategies such as this | ||
- | and others get more interesting. | ||
- | |||
- | We define a shielding set up as in the previous section where we have a | ||
- | 'system' cpuset with just CPU0 that takes care of "unimportant" system tasks. | ||
- | One usually requires this type of cpuset since it forms the basis of | ||
- | shielding. We modify the strategy to not use a 'user' cpuset, instead we | ||
- | create a number of new cpusets each holding one more CPU than the other. | ||
- | These cpusets will be called 'prio_low' with one CPU, 'prio_med' with two | ||
- | CPUs, 'prio_high' with three CPUs, and 'prio_all' with all CPUs. | ||
- | |||
- | NOTE: One may ask, why create a 'prio_all' with all CPUs when that is | ||
- | substantially the definition of the 'root' cpuset? The answer is that it is | ||
- | best to keep a separation between the 'root' cpuset and everything else, even | ||
- | if a particular cpuset duplicates 'root' exactly. Usually, one builds | ||
- | automation on top of a cpuset strategy. In these cases, it is best to avoid | ||
- | using invariant names of cpusets, such as 'root' for example, in this | ||
- | automation. | ||
- | |||
- | All of these 'prio_*' cpusets can be created under root, in a flat way; | ||
- | however, it is advantageous to create them as a hierarchy. The reasoning for | ||
- | this is twofold: first, if a cpuset is destroyed, all its tasks are moved to | ||
- | its parent; second, one can use exclusive CPUs in a hierarchy. | ||
- | |||
- | There is a planned addition to the +proc+ subcommand that will allow moving a | ||
- | specified PIDSPEC of tasks running in a specified cpuset to its parent. This | ||
- | addition will ease the automation burden. | ||
- | |||
- | If a cpuset has CPUs that are exclusive to it, then other cpusets may not make | ||
- | use of those CPUs unless they are children of that cpuset. This has more | ||
- | relevance to machines with many CPUs and more complex strategies. | ||
- | |||
- | Now, we start with a clean slate and build the appropriate cpusets as | ||
- | follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -r | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 344 0 / | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 0-3 prio_all | ||
- | cset: --> created cpuset "prio_all" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 1-3 /prio_all/prio_high | ||
- | cset: --> created cpuset "/prio_all/prio_high" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 2-3 /prio_all/prio_high/prio_med | ||
- | cset: --> created cpuset "/prio_all/prio_high/prio_med" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 3 /prio_all/prio_high/prio_med/prio_low | ||
- | cset: --> created cpuset "/prio_all/prio_high/prio_med/prio_low" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -c 0 system | ||
- | cset: --> created cpuset "system" | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l -r | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 344 2 / | ||
- | system 0 n 0 n 0 0 /system | ||
- | prio_all 0-3 n 0 n 0 1 /prio_all | ||
- | prio_high 1-3 n 0 n 0 1 /prio_all/prio_high | ||
- | prio_med 2-3 n 0 n 0 1 /prio_all/prio_high/prio_med | ||
- | prio_low 3 n 0 n 0 0 /prio_all/pr...rio_med/prio_low | ||
- | </code> | ||
- | |||
- | NOTE: We used the +-r/\--recurse+ switch to list all the sets in the last | ||
- | command above. If we had not, then the 'prio_med' and 'prio_low' cpusets | ||
- | would not have been listed. | ||
- | |||
- | The strategy is now implemented and we now move all userspace tasks as well as | ||
- | all movable kernel threads into the 'system' cpuset to activate the shield. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset proc -m -k -f root -t system | ||
- | cset: moving all tasks from root to /system | ||
- | cset: moving 198 userspace tasks to /system | ||
- | cset: moving 70 kernel threads to: /system | ||
- | cset: --> not moving 76 threads (not unbound, use --force) | ||
- | [==================================================]% | ||
- | cset: done | ||
- | |||
- | [zuul:cpuset-trunk]# cset set -l -r | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 76 2 / | ||
- | system 0 n 0 n 268 0 /system | ||
- | prio_all 0-3 n 0 n 0 1 /prio_all | ||
- | prio_high 1-3 n 0 n 0 1 /prio_all/prio_high | ||
- | prio_med 2-3 n 0 n 0 1 /prio_all/prio_high/prio_med | ||
- | prio_low 3 n 0 n 0 0 /prio_all/pr...rio_med/prio_low | ||
- | </code> | ||
- | |||
- | The shield is now active. Since the 'prio_*' cpuset names are unique, one can | ||
- | assign tasks to them either via either their simple name, or their full path | ||
- | (as described in the +proc+ section above). | ||
- | |||
- | You may have noted that there is an ellipsis in the path of the 'prio_low' | ||
- | cpuset in the listing above. This is done in order to fit the output onto an | ||
- | 80 character screen. If you want to see the entire line, then you need to use | ||
- | the +-v/\--verbose+ flag as follows. | ||
- | |||
- | <code bash> | ||
- | [zuul:cpuset-trunk]# cset set -l -r -v | ||
- | cset: | ||
- | Name CPUs-X MEMs-X Tasks Subs Path | ||
- | ------------ ---------- - ------- - ----- ---- ---------- | ||
- | root 0-3 y 0 y 76 2 / | ||
- | system 0 n 0 n 268 0 /system | ||
- | prio_all 0-3 n 0 n 0 1 /prio_all | ||
- | prio_high 1-3 n 0 n 0 1 /prio_all/prio_high | ||
- | prio_med 2-3 n 0 n 0 1 /prio_all/prio_high/prio_med | ||
- | prio_low 3 n 0 n 0 0 /prio_all/prio_high/prio_med/prio_low | ||
- | </code> | ||
- | |||
- | ====== Using Shortcuts ====== | ||
- | |||
- | The commands listed in the previous sections always used all the required | ||
- | options. *Cset* however does have a shortcut facility that will execute | ||
- | certain commands without specifying all options. An effort has been made to | ||
- | do this with the "principle of least surprise." This means that if you do not | ||
- | specify options, but do specify parameters, then the outcome of the command | ||
- | should be intuitive. As much as possible. | ||
- | |||
- | Using shortcuts is of course not necessary. In fact, you can not only not use | ||
- | shortcuts, but you can use long options instead of short, in case you really | ||
- | enjoy typing... All kidding aside, using long options and not using shortcuts | ||
- | does have a use case: when you write a script intended to be self-documenting, | ||
- | or perhaps when you generate *cset* documentation. | ||
- | |||
- | To begin, the subcommands +shield+, +set+ and +proc+ can themselves be | ||
- | shortened to the fewest number of characters that are unambiguous. For | ||
- | example, the following commands are identical: | ||
- | |||
- | <code bash> | ||
- | # cset shield -s -p 1234 <--> # cset sh -s -p 1234 | ||
- | # cset set -c 1,3 -s newset <--> # cset se -c 1,3 -s newset | ||
- | # cset proc -s newset -e bash <--> # cset p -s newset -e bash | ||
- | </code> | ||
- | |||
- | Note that +proc+ can be shortened to just +p+, while +shield+ and +set+ need | ||
- | two letters to disambiguate. | ||
- | |||
- | ===== Shield Subcommand Shortcuts ===== | ||
- | |||
- | The +shield+ subcommand supports two areas with shortcuts: the case when there | ||
- | is no options given where to 'shield' is the common use case, and making the | ||
- | +-p/\--pid+ option 'optional' for the +-s/\--shield+ and +-u/\--unshield+ | ||
- | options. | ||
- | |||
- | For the common use case of actually 'shielding' either a PIDSPEC or execing a | ||
- | command into the shield, the following *cset* commands are equivalent. | ||
- | |||
- | <code bash> | ||
- | # cset shield -s -p 1234,500-649 <--> # cset sh 1234,500-649 | ||
- | # cset shield -s -e bash <--> # cset sh bash | ||
- | </code> | ||
- | |||
- | When using the +-s+ or +-u+ shield/unshield options, it is optional to use the | ||
- | +-p+ option to specify a PIDSPEC. For example: | ||
- | |||
- | <code bash> | ||
- | # cset shield -s -p 1234 <--> # cset sh -s 1234 | ||
- | # cset shield -u -p 1234 <--> # cset sh -u 1234 | ||
- | </code> | ||
- | |||
- | |||
- | ===== Set Subcommand Shortcuts ===== | ||
- | |||
- | The +set+ subcommand has a limited number of shortcuts. Basically, the | ||
- | +-s/\--set+ option is optional in most cases and the +-l/\--list+ option is | ||
- | also optional if you want to list sets. For example, these commands are | ||
- | equivalent. | ||
- | |||
- | <code bash> | ||
- | # cset set -l -s myset <--> # cset se -l myset | ||
- | # cset se -l myset <--> # cset se myset | ||
- | |||
- | # cset set -c 1,2,3 -s newset <--> # cset se -c 1,2,3 newset | ||
- | # cset set -d -s newset <--> # cset se -d newset | ||
- | |||
- | # cset set -n newname -s oldname <--> # cset se -n newname oldname | ||
- | </code> | ||
- | |||
- | In fact, if you want to apply either the list or the destroy options to | ||
- | multiple cpusets with one *cset* command, you'll need to not use the +-s+ | ||
- | option. For example: | ||
- | |||
- | <code bash> | ||
- | # cset se -d myset yourset ourset | ||
- | --> destroys cpusets: myset, yourset and ourset | ||
- | |||
- | # cset se -l prio_high prio_med prio_low | ||
- | --> lists only cpusets prio_high, prio_med and prio_low | ||
- | --> the -l is optional in this case since list is default | ||
- | </code> | ||
- | |||
- | ===== Proc Subcommand Shortcuts ===== | ||
- | |||
- | For the +proc+ subcommand, the +-s+, +-t+ and +-f+ options to specify the | ||
- | cpuset, the origination cpuset and the destination cpuset, can sometimes be | ||
- | optional. For example, the following commands are equivalent. | ||
- | |||
- | <code bash> | ||
- | To list tasks in cpusets: | ||
- | # cset proc -l -s myset \ | ||
- | # cset proc -l -f myset --> # cset p -l myset | ||
- | # cset proc -l -t myset / | ||
- | |||
- | # cset p -l myset <--> # cset p myset | ||
- | |||
- | # cset proc -l -s one two <--> # cset p -l one two | ||
- | # cset p -l one two <--> # cset p one two | ||
- | |||
- | To exec a process into a cpuset: | ||
- | # cset proc -s myset -e bash <--> # cset p myset -e bash | ||
- | </code> | ||
- | |||
- | Movement of tasks into and out of cpusets have the following shortcuts. | ||
- | |||
- | <code bash> | ||
- | To move a PIDSPEC into a cpuset: | ||
- | # cset proc -m -p 4242,4243 -s myset <--> # cset p -m 4242,4243 myset | ||
- | # cset proc -m -p 12 -t myset <--> # cset p -m 12 myset | ||
- | |||
- | To move all tasks from one cpuset to another: | ||
- | # cset proc -m -f set1 -t set2 \ | ||
- | # cset proc -m -s set1 -t set2 --> # cset p -m set1 set2 | ||
- | # cset proc -m -f set1 -s set2 / | ||
- | </code> | ||
- | |||
- | |||
- | ====== What To Do if There are Problems ====== | ||
- | |||
- | If you encounter problems with the *cset* application, the best option is to | ||
- | log a bug with the *cset* bugzilla instance found here: | ||
- | |||
- | https://github.com/lpechacek/cpuset/issues | ||
- | |||
- | If you are using *cset* on a supported operating system such as SLES or SLERT | ||
- | from Novell, then please use that bugzilla instead at: | ||
- | |||
- | https://bugzilla.suse.com | ||
- | |||
- | If the problem is repeatable, there is an excellent chance that it will get | ||
- | fixed quickly. Also, *cset* contains a logging facility that is invaluable | ||
- | for the developers to diagnose problems. To create a log of a run, use the | ||
- | +-l/\--log+ option with a filename as an argument to the main *cset* | ||
- | application. For example. | ||
- | |||
- | <code bash> | ||
- | # cset -l logfile.txt set -n newname oldname | ||
- | </code> | ||
- | |||
- | That command saves a lot of debugging information in the 'logfile.txt' file. | ||
- | Please attach this file to the bug. | ||