mi

Author / Автор: Sergey Satskiy
NCBI
Publication date / Опубликовано: 07.02.2010
Version / Версия текста: 1.0

What is mi

mi is a Linux command line utility which can identify potential dead locks in multithreaded applications. mi can also collect statistics such as number of successful and failed mutex operations, time spent on acquiring mutex locks etc. mi can also show call stacks for all mutex operations.

mi is able to identify the following conditions:

  • The order of locking two mutexes in one thread is opposite to the order of locking the same mutexes in another thread
  • Unlocking a mutex which was not locked earlier
  • A mutex is left locked after an application has finished
  • The order of unlocking mutexes does not correspond to the reverse order of locking these mutexes
  • Unlocking a mutex which was locked earlier in another thread

mi is written in python and C++ and is distributed as public domain software.

mi can work with usual executable files, with daemons, and with applications which are run via scripts. Re-compilation of an application before testing is not required.

How mi Works

mi is comprised by a shared library written in C++ and two python scripts. The first script (mi.py) loads the shared library (libmi.so) via the LD_PRELOAD mechanism and the library intercepts the pthread mutex lock and unlock function calls. Then the tested program is run. Such an approach allows testing applications without recompilation and applications can be written in any programming language. While the tested application is running information is collected. As soon as the test run is finished the collected information analysis is done with a help of the second python script (statmi.py) and the results are printed on the console.

mi data flow

pthread calls interception

What mi Cannot Do

  • mi analyses run-time calls only. Therefore if some potentially dangerous calls are not done during a test run then mi will not be able to catch them. The solution to this problem is to have test runs with such input data which cover all the execution branches in the tested application.
  • mi cannot analize applications that don’t terminate gracefully, because the analysis script (statmi.py) uses information that is written to the log file during the normal application termination process.
  • If the tesed application uses pools of mutexes then mi most probably will not be able to help. mi cannot meaningfully analyze applications that use pools of mutexes. mi uses addresses to identify mutexes, but the same mutex can be used for many purposes at different times in such applications. mi will likely provide incorrect information about potentially dangerous chains of locked mutexes in these cases.
  • If the tested application uses dlopen() call to load the pthread library and a series of dlsym() calls to get mutex related functions pointers and later uses those pointers to lock and unlock mutexes then mi will not be able to intercept such calls and therefore they will not be analysed.

mi Usage Example

This is how to run mi with an application to test:

satsky.homelinux.com:~> ./mi.py ./test/test-bad-lock-order-elf
Test (improper lock order, elf) t0: m1.lock -> m2.lock -> m3.lock -> m3.unlock -> m2.unlock -> m1.unlock
                                t1: m1.lock -> m3.lock -> m2.lock -> m2.unlock -> m3.unlock -> m1.unlock

A log file is formed as the result of run:

satsky.homelinux.com:~> ll mi.log
-rw-rw-r-- 1 guest guest 4411 2010-01-31 20:54 mi.log

Now you can run the script which analyses the collected information:

swift@satsky.homelinux.com:~> ./statmi.py
Execution environment:
    application: /home/guest/test/test-bad-lock-order-elf
    log file: /home/guest/mi.log
    libpthread.so path: /lib64/libpthread.so.0
    do not print stack trace
Number of threads: 3
Number of mutexes: 4
Successfull operations: 56
Failed operations: 0
Collecting chains and statistics...
Threads legend:
    t0 -> 140209742624528
    t1 -> 140209742620944
    t2 -> 140209732131088
Mutexes legend:
    m0 -> 0x310baf7040
    m1 -> 0x6013c0
    m2 -> 0x601400
    m3 -> 0x601440
--- E000 -- t2: m3 -> m2 -- t1: m2 -> m3
ERROR: potential dead lock detected
Thread t2 lock stack:
    Operation: lock Object: 0x601400(m2) Thread: 140209732131088(t2) Return code: 0 Clocks: 0
    Operation: lock Object: 0x601440(m3) Thread: 140209732131088(t2) Return code: 0 Clocks: 0
    Operation: lock Object: 0x6013c0(m1) Thread: 140209732131088(t2) Return code: 0 Clocks: 0
Thread t1 lock stack:
    Operation: lock Object: 0x601440(m3) Thread: 140209742620944(t1) Return code: 0 Clocks: 0
    Operation: lock Object: 0x601400(m2) Thread: 140209742620944(t1) Return code: 0 Clocks: 0
    Operation: lock Object: 0x6013c0(m1) Thread: 140209742620944(t1) Return code: 0 Clocks: 0
--- E000

Each error message is surrounded by lines which start from three dashes:

--- E000 -- t2: m3 -> m2 -- t1: m2 -> m3
. . .
--- E000

The ‘E’ character means that this is an error. The ‘W’ character is used for warnings. The message type character is followed by a three digit number.

The message header has a short error descripton. The notation:

t2: m3 -> m2 -- t1: m2 -> m3

should be read as follows: thread t2 locked mutex m3 and then locked mutex m2 while thread t1, in turn, locked mutex m2 and then locked mutex m3.

The error details have information about the mutexes lock order for both threads. The lock order is given in the reverse chronological order i.e. the last locked mutex comes first.

Even more detailed information which includes call stacks for each mutex lock operation can be collected if mi is run with the --option stack option key, for example:

satsky.homelinux.com:~mi> ./mi.py --option stack ./test/test-bad-lock-order-elf
Test (improper lock order, elf) t0: m1.lock -> m2.lock -> m3.lock -> m3.unlock -> m2.unlock -> m1.unlock
                                t1: m1.lock -> m3.lock -> m2.lock -> m2.unlock -> m3.unlock -> m1.unlock

And then:

satsky.homelinux.com:~> ./statmi.py
Execution environment:
    application: /home/guest/test/test-bad-lock-order-elf
    log file: /home/guest/mi.log
    libpthread.so path: /lib64/libpthread.so.0
    print stack trace
Number of threads: 3
Number of mutexes: 4
Successfull operations: 56
Failed operations: 0
Collecting chains and statistics...
Threads legend:
    t0 -> 140490701883152
    t1 -> 140490701879568
    t2 -> 140490691389712
Mutexes legend:
    m0 -> 0x310baf7040
    m1 -> 0x6013c0
    m2 -> 0x601400
    m3 -> 0x601440
--- E000 -- t2: m3 -> m2 -- t1: m2 -> m3
ERROR: potential dead lock detected
Thread t2 lock stack:
    Operation: lock Object: 0x601400(m2) Thread: 140490691389712(t2) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x400a57]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
    Operation: lock Object: 0x601440(m3) Thread: 140490691389712(t2) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x400a4d]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
    Operation: lock Object: 0x6013c0(m1) Thread: 140490691389712(t2) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x400a43]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
Thread t1 lock stack:
    Operation: lock Object: 0x601440(m3) Thread: 140490701879568(t1) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x4009fe]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
    Operation: lock Object: 0x601400(m2) Thread: 140490701879568(t1) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x4009f4]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
    Operation: lock Object: 0x6013c0(m1) Thread: 140490701879568(t1) Return code: 0 Clocks: 0
    Backtrace:
        /home/guest/test/test-bad-lock-order-elf [0x4009ea]
        /lib64/libpthread.so.0 [0x310640685a]
        /lib64/libc.so.6 : clone()+0x6d
--- E000

The call stacks for each mutex lock operation are printed in the reverse chronological order similar to the mutex locks stacks.

Dependencies

mi uses the following packages:

  • g++ (to compile and link the shared library which intercepts pthred mutex related calls)
  • python 2.5.2 or 2.6 (http://www.python.org) for mi.py and statmi.py scripts
  • the ldd utility must be available via the PATH variable if you do not provide a full path to the pthread library in the MI_LIBPTHREAD environment variable or in the --pthread option key of the mi.py script.

Installation

To build the shared library which intercepts pthread_mutex_zzz(...) calls type:

make love

mi does not require any special installation. Just copy the mi files (libmi.so, mi.py, statmi.py) to the location where you want to keep it.

You might also want to create symbolic links and update the PATH variable respectively.

mi was tested on Fedora 11 distribution.

To get help type:

./mi.py –help

Download

Version 0.0.1

mi-0.0.1.tar.bz2 (13610 bytes)

Changelog (07-Feb-2010):

  • first public release

Questions and Answers

Q: How do I run mi if the application to test has to be run via a shell script?

A: It is necessary to specify a full path to the libpthread.so library using the --pthread option key at the moment of running the mi.py script, for example:

satsky.homelinux.com:~> ./mi.py --pthread  /lib/libpthread-2.10.2.so  my_prog.sh

Q: Are there any extra steps to test a daemon application?

A: No. Your daemon is run the same as you run a usual application to test. It is however required to wait till the daemon is finished before you run the statmi.py script.

Q: What are the statmi.py return codes?

A: 0 – if there are no errors and no warnings. 1 – if there is at least one warning and there are no errors for the tested application. 2 – if there is at least one error in the tested application. 3 – if there are internal statmi.py errors.

Q: statmi.py prints information about mutexes which are not in my code. Is this a mi bug?

A: No. The run-time environment can use mutexes for internal purposes and mi intercepts operations with those mutexes too. Usually such mutexes have a distinguished range of addresses (see mutex m0 in the mi usage example section) and operations on those mutexes are finished before the first operation on the tested application mutexes.

Afterwords

mi was developed for fun though a few side goals were also persued:

  • to provide a free alternative to a small part of functionality of the commercial product: Intel Thread Checker for Linux
  • to play a bit with interceptions of function calls from dynamic libraries and probably use this experience in other projects

If you like mi and wish to help improve it – you are welcome to!


Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

Разрешается копирование и распространение этой статьи любым способом без внесения изменений, при условии, что это разрешение сохраняется.
Last Updated: February 07, 2010