Thursday, January 20, 2011

User-space C++ application tracing on Fedora using Systemtap

Ever since DTrace came out I have realized that tracing is what I really want to do whenever my program does something other than what I assumed it was doing. "Tracing" -- I want that.

Debugging is fine, but you need to have debug symbols, you need to pick the right spot in the code to break at, you have to keep track manually if you're tracing a long-running execution, and real-time applications may not play nice when paused for arbitrary periods. Often I don't want to follow a thread of execution up and down its call stack; I'm more interested in the "cross-cutting concerns", such as how a certain variable changes over time.

DTrace will never be coming to my favorite platform (GNU/Linux), but there is a similar tool for Linux called SystemTap. It has been able to instrument the Linux kernel for a while, but only recently gained the ability to trace user-space programs. I think you will need Fedora 14 to run user-space traces; at least that is what I am using. With the appropriate setup you can start tracing your own C++ applications without modifying the source.

SystemTap uses "static" (preprocessor macro-based source code) probes that are compatible with DTrace, so if you decide to mark up your source with these macros (as Java, Python, MySQL, and others have done) you can use DTrace or SystemTap depending on your platform. These probes are turned into no-ops when not in use, so they are very fast. However, the usage that I am interested in is ad-hoc tracing for the purpose of debugging or simple profiling. Thankfully, for this case all you need is debug symbols.
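For comparison, here is roughly what a static probe looks like in C++ source. This is only a sketch: the provider name "demo", the probe name "fizz" and the function dofizz_traced are made up for illustration, and the fallback macro is my own addition so the file still builds on a machine without the <sys/sdt.h> header (on Fedora I believe it comes from the systemtap-sdt-devel package).

```cpp
// Sketch: marking up a function with a DTrace-compatible static probe.
// Assumption: <sys/sdt.h> may or may not be installed. When it is missing,
// the macro below is replaced by a no-op so the code still compiles.
#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define HAVE_SDT 1
#  endif
#endif

#ifndef HAVE_SDT
// Fallback for illustration only: expands to an empty statement.
#  define DTRACE_PROBE1(provider, name, arg1) do {} while (0)
#endif

// Hypothetical function carrying one probe point with a single argument.
int dofizz_traced(int a)
{
    DTRACE_PROBE1(demo, fizz, a);   // provider "demo", probe "fizz"
    return a;
}
```

With the real header in place, a script should be able to attach to this point with something like process("a.out").mark("fizz") instead of a function probe, but I have not needed that for the ad-hoc case described here.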

Once you have SystemTap installed, and your program compiled with symbols, you can write your custom probes. The syntax of the probes is beyond our scope, but you can read about it here. Instead I will post a simple "hello world" demo as follows:


#include <iostream>
#include <vector>

using namespace std;

namespace Baz
{
    struct Foo
    {
        int i; int n; float f;

        Foo() : i (0), n (0), f (3.14) {}
        Foo(int a, float b) : i (a), n (0), f (b) {}
        Foo(Foo const &c) : i (c.i), n (c.n), f (c.f) {}
        ~Foo() {}

        // Two trivial methods to probe; dofizz returns its argument.
        int dolart () { f += i; f /= 3.14; return 0; }
        int dofizz (int a) { n = a; i += n; return n; }
    };
}

int main()
{
    vector<int> test;
    vector<Baz::Foo> list;

    // Five push_back calls on each vector, ten in total.
    for (int i=0; i<5; ++i)
    {
        test.push_back (i);
        list.push_back (Baz::Foo (i, 1.1));
    }

    vector<Baz::Foo>::iterator i = list.begin();
    vector<Baz::Foo>::iterator e = list.end();

    // dofizz is called with 1..5, so it returns 2 exactly once.
    for (int n=0; i != e; ++i)
        i->dolart (), i->dofizz (++n);

    return 0;
}

If we use the following SystemTap script (saved as probe.stp) on the code:

global lart_count
global fizz_count
global push_back_count

probe process("a.out").function("Baz::Foo::do*").return
{ if ($return == 2) printf("Foo returned 2!\n"); }

probe process("a.out").function("Baz::Foo::dolart")
{ ++lart_count; }

probe process("a.out").function("Baz::Foo::dofizz")
{ ++fizz_count; }

probe process("a.out").function("vector<*>::push_back")
{ ++push_back_count; }

probe process("a.out").function("main").return
{
    printf("dolart was called %d times.\n", lart_count);
    printf("dofizz was called %d times.\n", fizz_count);
    printf("vector::push_back was called %d times.\n", push_back_count);
}

And run it as follows:

$ g++ -g test.cpp
$ sudo stap probe.stp -c ./a.out

Then we should get the following result:

Foo returned 2!
dolart was called 5 times.
dofizz was called 5 times.
vector::push_back was called 10 times.
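To see where those numbers come from, the same tallies can be kept by hand in the source, which is exactly the kind of modification SystemTap lets you avoid. The following is only a sketch: the Tally struct and run_demo function are hypothetical names of mine, and the counters are bumped at each call site in the demo, so they show the expected arithmetic rather than what stap actually observes.

```cpp
#include <vector>

// Hypothetical tallies mirroring the globals in the script.
struct Tally {
    int lart_count = 0;
    int fizz_count = 0;
    int push_back_count = 0;
    int returned_two = 0;   // how often a do* method returned 2
};

// Same shape as Baz::Foo from the demo, trimmed to what is needed here.
struct Foo {
    int i; int n; float f;
    Foo(int a, float b) : i(a), n(0), f(b) {}
    int dolart() { f += i; f /= 3.14; return 0; }
    int dofizz(int a) { n = a; i += n; return n; }
};

// Replays the demo's main() with a counter next to every probed call.
Tally run_demo()
{
    Tally t;
    std::vector<int> test;
    std::vector<Foo> list;

    for (int i = 0; i < 5; ++i) {
        test.push_back(i);             ++t.push_back_count;
        list.push_back(Foo(i, 1.1f));  ++t.push_back_count;
    }

    int n = 0;
    for (Foo& foo : list) {
        ++t.lart_count;
        if (foo.dolart() == 2) ++t.returned_two;
        ++t.fizz_count;
        if (foo.dofizz(++n) == 2) ++t.returned_two;
    }
    return t;
}
```

Calling run_demo() gives five dolart calls, five dofizz calls, ten push_back calls, and a single return value of 2 (from dofizz(2)), which matches the trace output.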

A couple of issues I noticed:
  • You must run this as root. This is because stap has to compile and load a kernel module that has enormous power to probe your system.
  • It seems to run a bit slowly on this simple example. This could be a one-time start-up cost from compiling the kernel module, or it could be run-time overhead from not using the cheaper static probes.
  • It seems to spew a lot of strange mangled C++ names for an unknown reason.
  • Constantly respecifying process("a.out") seems redundant. There's probably a way to get around this.
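One way to tell the two explanations for the slowness apart would be to time the interesting work from inside the program, once on its own and once under stap. As far as I can tell the module is built and loaded before the target command starts, so that cost should not show up in the measurement, while any per-probe overhead should. A minimal sketch, where elapsed_ms is a hypothetical helper of mine:

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsed_ms(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the two loops of the demo's main() in a lambda, passing it to elapsed_ms, and printing the result would give a number to compare between a plain run and a traced run.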

5 comments:

Ryan McDougall said...

There does appear to be a way around the process("a.out") problem: it can be any path. That is, if you specify process("./") then it will only consider any process running in the current working directory...

Frank Ch. Eigler said...

Hi. FYI, some extra information regarding systemtap.

* User-space probing akin to what you're doing in your scripts has been working for a couple of years, not just F14.

* Systemtap has an unprivileged (non-root) mode, but it's not quite complete yet. The gist of it is that you'd need to run a local 'stap-server' daemon, put yourself into the 'stapusr' group, run a few more setup commands, and then you will be permitted to probe your own processes.

* Constantly specifying a.out is a bit of a hassle, but it may be rationalized if one understands that a single systemtap script can probe many processes / shared-libraries, plus the kernel, at the same time. So there is a need to spell things out.

Dave Brolley said...

Regarding unprivileged mode, and further to fche's comments, unprivileged users (i.e. not root) can use a subset of systemtap's capabilities to perform non 'dangerous' probing such as, in your case, tracing your own application.

First you need to be part of the group stapusr:

sudo usermod -a -G stapusr YOUR-USERNAME

You'll need to log out and log in again in order for this to become effective.

Next, make sure that the systemtap-server package is installed on your host.

Next, you need to start a systemtap compile server:

sudo service stap-server start

You should see some information about the server if it started OK. Next, run your script, but use the server to check it and compile it:

stap YOUR-SCRIPT --use-server --unprivileged

--use-server tells systemtap to compile the script using your server.

--unprivileged tells the server to check your script to make sure it doesn't do anything 'dangerous', i.e. nothing that an ordinary user shouldn't be able to do.

If the server approves of your script and is able to compile it, systemtap will then load the resulting module for you even though you are not root.

systemtap has to trust the server used to compile and check your script in order for this to work (trust is established using ssl and cryptographic signing of the module by the server), however since you're on f14, your server should already be trusted on your local host.

If you want to see the status of your server:

service stap-server status

If you want to shut down your server:

sudo service stap-server stop

If you want to see a list of servers trusted to compile and check your script on your local network (not just on your local host):

stap --list-servers --unprivileged

Dave Brolley said...

I forgot to add that you'll need to specify the full path to your application in the 'process' probes in your script, otherwise the server won't be able to find it when it tries to compile your script.

Mitch said...

Hey,
Have you ever seen DZone.com's Refcardz series? There hasn't been one yet on basic C++ and I think you might be an excellent author for one. Send me an email at mitch[at]dzone>dot<com if this might interest you.