Cyclictest is a high-resolution latency test program, written by Thomas Gleixner (tglx) and maintained by Clark Williams and John Kacur.
Get the latest sources by cloning the git repository, or fetch a released tarball from the archive and untar it into a directory of your choice. Then run make in the source directory. To cross compile, run make CROSS_COMPILE=<your-compiler-prefix> (for example, make CROSS_COMPILE=arm-v4t-linux-gnueabi-).
You can run the resulting binary from there or install it:
#> git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
#> cd rt-tests
#> make all
#> cp ./cyclictest /usr/bin/
#> cyclictest --help
libnuma is required to build cyclictest. It is usually safe to install libnuma on non-NUMA systems as well, but if you don't want to install the NUMA libraries (e.g. in an embedded environment), compile with make NUMA=0.
Run cyclictest as root (or with sudo). Without parameters, cyclictest creates one thread with a 1 ms interval timer. cyclictest --help prints help text for the various options:
cyclictest V 2.00
Usage:
cyclictest <options>

-a [CPUSET] --affinity  Run thread #N on processor #N, if possible, or if CPUSET
        given, pin threads to that set of processors in round-robin
        order. E.g. -a 2 pins all threads to CPU 2, but -a 3-5,0 -t 5
        will run the first and fifth threads on CPU (0),thread #2 on
        CPU 3, thread #3 on CPU 4, and thread #5 on CPU 5.
-A USEC --aligned=USEC  align thread wakeups to a specific offset
-b USEC --breaktrace=USEC  send break trace command when latency > USEC
-B --preemptirqs  both preempt and irqsoff tracing (used with -b)
-c CLOCK --clock=CLOCK  select clock
        0 = CLOCK_MONOTONIC (default)
        1 = CLOCK_REALTIME
-C --context  context switch tracing (used with -b)
-d DIST --distance=DIST  distance of thread intervals in us, default=500
-D --duration=TIME  specify a length for the test run.
        Append 'm', 'h', or 'd' to specify minutes, hours or days.
--latency=PM_QOS  write PM_QOS to /dev/cpu_dma_latency
-E --event  event tracing (used with -b)
-f --ftrace  function trace (when -b is active)
-F --fifo=<path>  create a named pipe at path and write stats to it
-h --histogram=US  dump a latency histogram to stdout after the run
        US is the max latency time to be be tracked in microseconds
        This option runs all threads at the same priority.
-H --histofall=US  same as -h except with an additional summary column
--histfile=<path>  dump the latency histogram to <path> instead of stdout
-i INTV --interval=INTV  base interval of thread in us default=1000
-I --irqsoff  Irqsoff tracing (used with -b)
-l LOOPS --loops=LOOPS  number of loops: default=0(endless)
--laptop  Save battery when running cyclictest
        This will give you poorer realtime results
        but will not drain your battery so quickly
-m --mlockall  lock current and future memory allocations
-M --refresh_on_max  delay updating the screen until a new max
        latency is hit. Userful for low bandwidth.
-n --nanosleep  use clock_nanosleep
--notrace  suppress tracing
-N --nsecs  print results in ns instead of us (default us)
-o RED --oscope=RED  oscilloscope mode, reduce verbose output by RED
-O TOPT --traceopt=TOPT  trace option
-p PRIO --priority=PRIO  priority of highest prio thread
-P --preemptoff  Preempt off tracing (used with -b)
--policy=NAME  policy of measurement thread, where NAME may be one
        of: other, normal, batch, idle, fifo or rr.
--priospread  spread priority levels starting at specified value
-q --quiet  print a summary only on exit
-r --relative  use relative timer instead of absolute
-R --resolution  check clock resolution, calling clock_gettime() many
        times. List of clock_gettime() values will be
        reported with -X
--secaligned [USEC]  align thread wakeups to the next full second
        and apply the optional offset
-s --system  use sys_nanosleep and sys_setitimer
-S --smp  Standard SMP testing: options -a -t -n and
        same priority of all threads
--spike=<trigger>  record all spikes > trigger
--spike-nodes=[num of nodes]  These are the maximum number of spikes we
        can record. The default is 1024 if not specified
--smi  Enable SMI counting
-t --threads  one thread per available processor
-t [NUM] --threads=NUM  number of threads:
        without NUM, threads = max_cpus
        without -t default = 1
--tracemark  write a trace mark when -b latency is exceeded
-T TRACE --tracer=TRACER  set tracing function
        configured tracers: blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
-u --unbuffered  force unbuffered output for live processing
-U --numa  Standard NUMA testing (similar to SMP option)
        thread data structures allocated from local node
-v --verbose  output values on stdout for statistics
        format: n:c:v n=tasknum c=count v=value in us
-w --wakeup  task wakeup tracing (used with -b)
-W --wakeuprt  rt task wakeup tracing (used with -b)
--dbg_cyclictest  print info useful for debugging cyclictest
More information is available in the cyclictest man page shipped with the sources; from the source directory, run less ./src/cyclictest/cyclictest.8.
The OSADL Realtime LiveCD project provides a script to plot the latency distribution.
TODO: Rerun the tests from the Expected results section of https://rt.wiki.kernel.org/index.php/Cyclictest and update this page. They need to be rerun because they were run in 2006 on a Pentium III system with a 2.6.16 kernel; things have probably changed a bit since then. :)
A cyclictest process consists of a main thread plus one or more measurement threads. ps -ce shows only the main process, not its threads. ps -eLc | grep cyclic shows the main process and its threads together with their scheduling class: the measurement threads run as FF (SCHED_FIFO), while the main thread stays TS (SCHED_OTHER).
#> ./cyclictest -t5 -p 80 -n -i 10000
#> ps -cLe | grep cyclic
 4764  4764 TS   19 pts/1    00:00:01 cyclictest
 4764  4765 FF  120 pts/1    00:00:00 cyclictest
 4764  4766 FF  119 pts/1    00:00:00 cyclictest
 4764  4767 FF  118 pts/1    00:00:00 cyclictest
 4764  4768 FF  117 pts/1    00:00:00 cyclictest
 4764  4769 FF  116 pts/1    00:00:00 cyclictest
To check a thread's scheduling parameters with chrt, don't use the PID of the main process; use the ID of one of its threads, i.e. a value from the second column of the ps -cLe | grep cyclic output.
#> chrt -p 4766
pid 4766's current scheduling policy: SCHED_FIFO
pid 4766's current scheduling priority: 79
The taskset command was written by Robert M. Love. On SMP systems the scheduler has a choice when scheduling a process: a new or newly rescheduled process can run on any available CPU. However, while it shouldn't matter where a new process runs, an existing process should go back to the CPU it was running on, simply because that CPU may still be caching data that belongs to the process. This is particularly likely if the process is a thread: the other threads in the same program are very likely to have CPU cache of interest to their brethren (though obviously this also diminishes the performance gain that might be seen from multithreading). For these reasons, scheduling algorithms pay attention to CPU affinity and try to keep it constant. It is also possible to force a process to run only on a certain CPU: Linux provides the system calls sched_setaffinity and sched_getaffinity as well as the taskset command-line tool.
#> taskset -c 3 top      # start top restricted to CPU 3
#> taskset -p [pid]      # show the CPU affinity mask of an existing process
If the libnuma development headers are missing, the build fails like this:
#> make
cc -D VERSION_STRING=0.85 -c src/cyclictest/cyclictest.c -Wall -Wno-nonnull -O2 -DNUMA -D_GNU_SOURCE -Isrc/include
In file included from src/cyclictest/cyclictest.c:37:0:
src/cyclictest/rt_numa.h:23:18: fatal error: numa.h: No such file or directory
compilation terminated.
make: *** [cyclictest.o] Error 1
To fix this, install your distribution's NUMA development package. On Fedora this is numactl-devel:
su -c 'yum install numactl-devel'
The package is only needed for building; it does not affect how the test runs on non-NUMA machines.
Clone one of the following