This is an old revision of the document!


Cyclictest

Cyclictest is a high-resolution test program, written by Thomas Gleixner (tglx) and maintained by Clark Williams and John Kacur.

Installation

Get the latest sources by cloning the git repository, or fetch a released tarball from the archive and untar it into a directory of your choice. Then run make in the source directory. To cross compile, run make CROSS_COMPILE=<your-compiler-prefix> (for example, make CROSS_COMPILE=arm-v4t-linux-gnueabi-).

You can run the resulting binary from there or install it:

#> git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git 
#> cd rt-tests
#> make all
#> cp ./cyclictest /usr/bin/
#> cyclictest --help

libnuma is required to build cyclictest. It is usually safe to have libnuma installed on non-NUMA systems as well, but if you don't want to install the NUMA libraries (e.g. in an embedded environment), compile with make NUMA=0.

Run it

Make sure to be root or use sudo when running cyclictest. Without parameters, cyclictest creates one thread with a 1 ms interval timer. cyclictest --help prints help text for the various options:

cyclictest V 2.00
Usage:
cyclictest <options>
 
-a [CPUSET] --affinity     Run thread #N on processor #N, if possible, or if CPUSET
                           given, pin threads to that set of processors in round-
                           robin order.  E.g. -a 2 pins all threads to CPU 2,
                           but -a 3-5,0 -t 5 will run the first and fifth
                           threads on CPU (0),thread #2 on CPU 3, thread #3
                           on CPU 4, and thread #5 on CPU 5.
-A USEC  --aligned=USEC    align thread wakeups to a specific offset
-b USEC  --breaktrace=USEC send break trace command when latency > USEC
-B       --preemptirqs     both preempt and irqsoff tracing (used with -b)
-c CLOCK --clock=CLOCK     select clock
                           0 = CLOCK_MONOTONIC (default)
                           1 = CLOCK_REALTIME
-C       --context         context switch tracing (used with -b)
-d DIST  --distance=DIST   distance of thread intervals in us, default=500
-D       --duration=TIME   specify a length for the test run.
                           Append 'm', 'h', or 'd' to specify minutes, hours or days.
         --latency=PM_QOS  write PM_QOS to /dev/cpu_dma_latency
-E       --event           event tracing (used with -b)
-f       --ftrace          function trace (when -b is active)
-F       --fifo=<path>     create a named pipe at path and write stats to it
-h       --histogram=US    dump a latency histogram to stdout after the run
                           US is the max latency time to be be tracked in microseconds
                           This option runs all threads at the same priority.
-H       --histofall=US    same as -h except with an additional summary column
         --histfile=<path> dump the latency histogram to <path> instead of stdout
-i INTV  --interval=INTV   base interval of thread in us default=1000
-I       --irqsoff         Irqsoff tracing (used with -b)
-l LOOPS --loops=LOOPS     number of loops: default=0(endless)
         --laptop          Save battery when running cyclictest
                           This will give you poorer realtime results
                           but will not drain your battery so quickly
-m       --mlockall        lock current and future memory allocations
-M       --refresh_on_max  delay updating the screen until a new max
                           latency is hit. Userful for low bandwidth.
-n       --nanosleep       use clock_nanosleep
         --notrace         suppress tracing
-N       --nsecs           print results in ns instead of us (default us)
-o RED   --oscope=RED      oscilloscope mode, reduce verbose output by RED
-O TOPT  --traceopt=TOPT   trace option
-p PRIO  --priority=PRIO   priority of highest prio thread
-P       --preemptoff      Preempt off tracing (used with -b)
         --policy=NAME     policy of measurement thread, where NAME may be one
                           of: other, normal, batch, idle, fifo or rr.
         --priospread      spread priority levels starting at specified value
-q       --quiet           print a summary only on exit
-r       --relative        use relative timer instead of absolute
-R       --resolution      check clock resolution, calling clock_gettime() many
                           times.  List of clock_gettime() values will be
                           reported with -X
         --secaligned [USEC] align thread wakeups to the next full second
                           and apply the optional offset
-s       --system          use sys_nanosleep and sys_setitimer
-S       --smp             Standard SMP testing: options -a -t -n and
                           same priority of all threads
        --spike=<trigger>  record all spikes > trigger
        --spike-nodes=[num of nodes]
                           These are the maximum number of spikes we can record.
                           The default is 1024 if not specified
         --smi             Enable SMI counting
-t       --threads         one thread per available processor
-t [NUM] --threads=NUM     number of threads:
                           without NUM, threads = max_cpus
                           without -t default = 1
         --tracemark       write a trace mark when -b latency is exceeded
-T TRACE --tracer=TRACER   set tracing function
    configured tracers: blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
-u       --unbuffered      force unbuffered output for live processing
-U       --numa            Standard NUMA testing (similar to SMP option)
                           thread data structures allocated from local node
-v       --verbose         output values on stdout for statistics
                           format: n:c:v n=tasknum c=count v=value in us
-w       --wakeup          task wakeup tracing (used with -b)
-W       --wakeuprt        rt task wakeup tracing (used with -b)
         --dbg_cyclictest  print info useful for debugging cyclictest

More information is available by running less ./src/cyclictest/cyclictest.8. The OSADL Realtime LiveCD project provides a script to plot the latency distribution.
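
For a quick look at a histogram without plotting it, the output of -h can be summarized with a few lines of shell. The following is a minimal sketch, not part of rt-tests: histogram_max is a hypothetical helper name, and it assumes that data lines have the form "<latency in us> <count per thread> ..." while comment and summary lines start with #.

```shell
#!/bin/sh
# histogram_max: print the largest latency bucket (in us) that has a non-zero
# count in any thread column. Hypothetical helper, not part of rt-tests.
# Assumes data lines look like "<latency> <count> [<count> ...]" and that
# comment/summary lines start with '#'. Reads the named files, or stdin.
histogram_max() {
    awk '
        /^#/ || NF < 2 { next }              # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++)        # one count column per thread
                if ($i + 0 > 0) max = $1 + 0 # buckets are listed in ascending order
        }
        END { print max + 0 }
    ' "$@"
}
```

For example, running something like cyclictest -m -p 80 -i 1000 -l 10000 -q -h 400 > hist.txt and then histogram_max hist.txt would print the highest latency bucket that was hit. The option values here are only illustrative.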

Expected Results

TODO: Rerun all the tests from the Expected Results section of https://rt.wiki.kernel.org/index.php/Cyclictest and update this section. The tests need to be rerun because they were run in 2006 on a Pentium III system running a 2.6.16 kernel. Things have probably changed a bit since then. :)

FAQ

ps shows the wrong scheduling class SCHED_OTHER

Each cyclictest task consists of one or more threads. ps -ce shows only the main process, not its threads. ps -eLc | grep cyclic shows the main process and its threads, with the correct scheduling class SCHED_FIFO for the threads.

#> ./cyclictest -t5 -p 80 -n -i 10000
 
#> ps -cLe | grep cyclic
 4764  4764 TS   19 pts/1    00:00:01 cyclictest
 4764  4765 FF  120 pts/1    00:00:00 cyclictest
 4764  4766 FF  119 pts/1    00:00:00 cyclictest
 4764  4767 FF  118 pts/1    00:00:00 cyclictest
 4764  4768 FF  117 pts/1    00:00:00 cyclictest
 4764  4769 FF  116 pts/1    00:00:00 cyclictest
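
Note that the PRI column in this listing is not the raw real-time priority: thread 4766 is shown with PRI 119, while chrt reports priority 79 for the same thread in the next FAQ entry. A minimal sketch that converts the listing, assuming this offset of 40 holds for your version of ps (rt_threads is a hypothetical helper name):

```shell
#!/bin/sh
# rt_threads: read `ps -cLe`-style lines on stdin
# (columns: PID LWP CLS PRI TTY TIME CMD) and print "LWP CLS rtprio"
# for threads in the FF (SCHED_FIFO) or RR (SCHED_RR) class.
# Hypothetical helper; the offset of 40 is inferred from the listings
# in this FAQ, not taken from the ps documentation.
rt_threads() {
    awk '$3 == "FF" || $3 == "RR" { print $2, $3, $4 - 40 }'
}
```

It could be used as ps -cLe | grep cyclic | rt_threads.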

chrt shows the wrong scheduling class SCHED_OTHER

Don't use the PID of the main process; use the ID of one of its threads instead. The threads are shown by ps -cLe | grep cyclic.

#> chrt -p 4766
pid 4766's current scheduling policy: SCHED_FIFO
pid 4766's current scheduling priority: 79
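
To check every thread at once rather than one ID at a time, the thread IDs can be read from /proc/<pid>/task. A minimal sketch, assuming a Linux /proc filesystem; list_tids, show_thread_policies and the PROC_ROOT override are hypothetical names introduced here:

```shell
#!/bin/sh
# list_tids: print the thread IDs of process $1, one per line, by listing
# <proc root>/<pid>/task. PROC_ROOT is a hypothetical override that
# defaults to /proc; it only exists to make the helper easy to try out.
list_tids() {
    for d in "${PROC_ROOT:-/proc}/$1"/task/*; do
        if [ -d "$d" ]; then
            echo "${d##*/}"    # strip the directory part, keep the thread ID
        fi
    done
}

# show_thread_policies: run `chrt -p` on every thread of process $1.
show_thread_policies() {
    for tid in $(list_tids "$1"); do
        chrt -p "$tid"
    done
}
```

With the example above, show_thread_policies 4764 would report the policy and priority of the main process and of each measurement thread.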

taskset for CPU affinity

The taskset command was written by Robert M. Love. SMP operating systems have a choice when scheduling processes: a new or newly rescheduled process can run on any available CPU. While it shouldn't matter where a new process runs, an existing process should go back to the CPU it last ran on, because that CPU's cache may still hold data belonging to the process. This is particularly likely for threads: the other threads of the same program probably have data in the CPU cache that is useful to their siblings (though this also reduces the performance gain that multithreading might otherwise bring). For these reasons, scheduling algorithms pay attention to CPU affinity and try to keep it constant.

It is also possible to force a process to run only on a certain CPU. Linux provides system calls (sched_setaffinity and sched_getaffinity) and a command-line tool, taskset, for this.

#> taskset -c 3 top
#> taskset -p [pid]
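
Without -c, taskset works with a hexadecimal bitmask in which bit N stands for CPU N, which is also the form that taskset -p prints. A small sketch of the relation between a CPU number and its mask (cpu_mask and pin_to_cpu are hypothetical helper names):

```shell
#!/bin/sh
# cpu_mask: print the hexadecimal affinity mask that selects only CPU $1.
# Bit N of the mask corresponds to CPU N, so CPU 0 -> 1, CPU 3 -> 8.
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

# pin_to_cpu: restrict an existing process or thread ID ($1) to CPU $2.
pin_to_cpu() {
    taskset -p "$(cpu_mask "$2")" "$1"
}
```

For instance, pin_to_cpu 4766 3 would restrict thread 4766 to CPU 3 (mask 8), and should be equivalent to taskset -cp 3 4766.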

Compile failure because numa.h can't be found

make
cc -D VERSION_STRING=0.85 -c src/cyclictest/cyclictest.c -Wall -Wno-nonnull -O2 -DNUMA -D_GNU_SOURCE -Isrc/include
In file included from src/cyclictest/cyclictest.c:37:0:
src/cyclictest/rt_numa.h:23:18: fatal error: numa.h: No such file or directory
compilation terminated.
make: *** [cyclictest.o] Error 1

Install your distribution's NUMA development package. On Fedora this is numactl-devel:

su -c 'yum install numactl-devel'

This is only required for building. It does not affect the way the test runs on non-NUMA machines.
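
If installing the development package is not an option, the NUMA=0 build described under Installation is the alternative. A minimal sketch that picks between the two, assuming the header would live in /usr/include (it may be elsewhere, for example in a cross toolchain's sysroot); numa_make_flags is a hypothetical helper name:

```shell
#!/bin/sh
# numa_make_flags: print "NUMA=0" when numa.h is missing from the include
# directory ($1, default /usr/include), and print nothing when it is there.
# Hypothetical helper; it only checks one directory and does not ask the
# compiler where it actually searches for headers.
numa_make_flags() {
    incdir=${1:-/usr/include}
    if [ ! -f "$incdir/numa.h" ]; then
        echo "NUMA=0"
    fi
}
```

The result can be passed straight to make, as in make $(numa_make_flags).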

Current repo

Clone one of the following

rt-tests tarballs

Mailing List

realtime/documentation/howto/howto_rt_tools_cyclictest.1485568978.txt.gz · Last modified: 2017/01/28 02:02 by vedangpatel