Chapter 19. Profile Mode

Table of Contents

Intro
Using the Profile Mode
Tuning the Profile Mode
Design
Wrapper Model
Instrumentation
Run Time Behavior
Analysis and Diagnostics
Cost Model
Reports
Testing
Extensions for Custom Containers
Empirical Cost Model
Implementation Issues
Stack Traces
Symbolization of Instruction Addresses
Concurrency
Using the Standard Library in the Instrumentation Implementation
Malloc Hooks
Construction and Destruction of Global Objects
Developer Information
Big Picture
How To Add A Diagnostic
Diagnostics
Diagnostic Template
Containers
Hashtable Too Small
Hashtable Too Large
Inefficient Hash
Vector Too Small
Vector Too Large
Vector to Hashtable
Hashtable to Vector
Vector to List
List to Vector
List to Forward List (Slist)
Ordered to Unordered Associative Container
Algorithms
Sort Algorithm Performance
Data Locality
Need Software Prefetch
Linked Structure Locality
Multithreaded Data Access
Data Dependence Violations at Container Level
False Sharing
Statistics
Bibliography

Intro

Goal: Give performance improvement advice based on recognition of suboptimal usage patterns of the standard library.

Method: Wrap the standard library code. Insert calls to an instrumentation library to record the internal state of various components at interesting entry/exit points to/from the standard library. Process the trace, recognize suboptimal patterns, and give advice. For details, see the Perflint paper presented at CGO 2009 (see Bibliography).
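A minimal sketch of the wrapping idea may help. `profiled_vector` and `VectorTrace` are hypothetical names chosen for illustration; this is not libstdc++'s profile-mode vector. A real wrapper must cover the full container interface. This sketch forwards only two operations and keeps its counters inside the object instead of reporting them to a separate instrumentation library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counters an instrumentation library might keep per object.
struct VectorTrace {
  std::size_t inserts = 0;   // number of insert/push_back calls
  std::size_t shifted = 0;   // elements moved by inserts before end()
  std::size_t resizes = 0;   // reallocations observed (capacity changed)
  std::size_t max_size = 0;  // largest size reached
};

// Hypothetical wrapper: forwards to std::vector and records state at the
// entry and exit of each call. Only insert and push_back are instrumented.
template <typename T>
class profiled_vector {
 public:
  using iterator = typename std::vector<T>::iterator;

  iterator begin() { return v_.begin(); }
  iterator end() { return v_.end(); }
  std::size_t size() const { return v_.size(); }

  iterator insert(iterator pos, const T& value) {
    // Entry: how many elements will shift, and the current capacity.
    const std::size_t moved = static_cast<std::size_t>(v_.end() - pos);
    const std::size_t cap_before = v_.capacity();
    iterator result = v_.insert(pos, value);
    // Exit: record the observed effects of the call.
    record(moved, cap_before);
    return result;
  }

  void push_back(const T& value) {
    const std::size_t cap_before = v_.capacity();
    v_.push_back(value);
    record(0, cap_before);  // appending shifts nothing
  }

  const VectorTrace& trace() const { return trace_; }

 private:
  void record(std::size_t moved, std::size_t cap_before) {
    ++trace_.inserts;
    trace_.shifted += moved;
    if (v_.capacity() != cap_before) ++trace_.resizes;
    if (v_.size() > trace_.max_size) trace_.max_size = v_.size();
  }

  std::vector<T> v_;
  VectorTrace trace_;
};
```

With a loop that inserts 1024 values at the front, `shifted` reaches 1024 * 1023 / 2 = 523776. A large value like this is the kind of signal a vector-to-list diagnostic could act on.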

Strengths:

  • Unintrusive solution. The application code does not require any modification.

  • The advice is call-context sensitive, so it can precisely identify interesting dynamic performance behavior.

  • The overhead model is pay-per-view. When you turn off a diagnostic class at compile time, its overhead disappears.
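The pay-per-view model can be illustrated with an instrumentation hook guarded by a compile-time switch. The macro `EXAMPLE_PROFILE_NO_VECTOR_TO_LIST`, the hook, and the counter are all hypothetical. The real switches follow the `_GLIBCXX_PROFILE_NO_<diagnostic>` pattern described under Tuning the Profile Mode.

```cpp
#include <cstddef>

// Hypothetical per-diagnostic counter (C++17 inline variable).
inline std::size_t vector_to_list_cost = 0;

// Hypothetical hook for one diagnostic class. Compiling with
// -DEXAMPLE_PROFILE_NO_VECTOR_TO_LIST turns the body into a no-op, so an
// optimizing compiler can remove the call and its overhead entirely.
inline void vector_to_list_hook(std::size_t shifted_elements) {
#ifndef EXAMPLE_PROFILE_NO_VECTOR_TO_LIST
  vector_to_list_cost += shifted_elements;
#else
  (void)shifted_elements;  // diagnostic disabled at compile time
#endif
}
```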

Drawbacks:

  • You must recompile the application code with custom options.

  • You must run the application on representative input. The advice is input dependent.

  • The execution time will increase, in some cases by a large factor.

Using the Profile Mode

This is the anticipated common workflow for program foo.cc:

$ cat foo.cc
#include <vector>
int main() {
  std::vector<int> v;
  for (int k = 0; k < 1024; ++k) v.insert(v.begin(), k);
}

$ g++ -D_GLIBCXX_PROFILE foo.cc
$ ./a.out
$ cat libstdcxx-profile.txt
vector-to-list: improvement = 5: call stack = 0x804842c ...
    : advice = change std::vector to std::list
vector-size: improvement = 3: call stack = 0x804842c ...
    : advice = change initial container size from 0 to 1024
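The two warnings are alternatives. If the container becomes a list, the initial-size advice no longer applies. The sketch below shows each change separately. The function names `build_reserved` and `build_list` are chosen for illustration, and whether either change is valid for a real program remains the user's judgment.

```cpp
#include <list>
#include <vector>

// Following the vector-size advice: reserve the final size up front so the
// vector never reallocates. Front insertion still shifts every element.
// Expects n >= 0.
std::vector<int> build_reserved(int n) {
  std::vector<int> v;
  v.reserve(static_cast<std::vector<int>::size_type>(n));
  for (int k = 0; k < n; ++k) v.insert(v.begin(), k);
  return v;
}

// Following the vector-to-list advice: std::list inserts at the front in
// constant time, with no element shifting and no reallocation.
std::list<int> build_list(int n) {
  std::list<int> l;
  for (int k = 0; k < n; ++k) l.push_front(k);
  return l;
}
```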

Anatomy of a warning:

  • Warning id. This is a short descriptive string for the class that this warning belongs to. E.g., "vector-to-list".

  • Estimated improvement. This is an approximation of the benefit expected from implementing the change suggested by the warning. It is given on a log10 scale. Negative values mean that the alternative would actually do worse than the current choice. In the example above, 5 comes from the fact that the overhead of inserting at the beginning of a vector vs. a list is around 1024 * 1024 / 2, which is on the order of 10^5. The improvement from setting the initial size to 1024 is on the order of 10^3, since the overhead of dynamic resizing is linear in this case.

  • Call stack. Currently, the addresses are printed without symbol name or code location attribution. Users are expected to postprocess the output using, for instance, addr2line.

  • The warning message. For some warnings, this is static text, e.g., "change vector to list". For other warnings, such as the one above, the message contains numeric advice, e.g., the suggested initial size of the vector.
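The improvement values in the sample output can be reproduced with a rough count. This sketch is an illustration under stated assumptions, not the library's cost model. It counts element moves only and ignores the list's per-node costs. It assumes the vector's capacity doubles starting from 1, although the growth policy is implementation-defined. It also truncates the logarithm, which happens to match the sample values 5 and 3.

```cpp
#include <cmath>

// Elements shifted when n values are inserted at the front of a vector:
// the k-th insertion (counting from 0) moves the k elements already stored,
// for a total of n * (n - 1) / 2.
double front_insert_shifts(int n) {
  double total = 0;
  for (int k = 0; k < n; ++k) total += k;
  return total;
}

// Elements copied by reallocation while a vector grows from empty to n,
// assuming capacity doubles starting from 1: 1 + 2 + 4 + ... (each < n).
double growth_copies(int n) {
  double total = 0;
  for (int cap = 1; cap < n; cap *= 2) total += cap;
  return total;
}

// Express an avoided overhead on a log10 scale, truncated to an integer.
int log10_improvement(double overhead) {
  if (overhead < 1.0) return 0;  // no measurable benefit
  return static_cast<int>(std::floor(std::log10(overhead)));
}
```

For n = 1024 these functions give 523776 shifted elements and 1023 copied elements. The corresponding improvements are 5 and 3.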

Three files are generated. libstdcxx-profile.txt contains human-readable advice. libstdcxx-profile.raw contains implementation-specific data about each diagnostic. Its format is not documented, but it is sufficient to generate all the advice given in libstdcxx-profile.txt. The advantage of keeping this raw format is that traces from multiple executions can be aggregated simply by concatenating them. We intend to offer an external utility program that can issue advice from a trace. libstdcxx-profile.conf.out lists the diagnostic parameters actually used. To alter them, edit this file and rename it to libstdcxx-profile.conf.
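Call stacks are printed as bare addresses, so a small filter can extract them from libstdcxx-profile.txt and pass them to addr2line, for example `addr2line -e a.out 0x804842c`. The sketch below assumes the line layout shown in the sample output above (`<id>: improvement = <n>: call stack = <addresses>`). `ProfileWarning` and `parse_warning_line` are hypothetical names. Because only the sample shows this layout, the parser should be treated as fragile.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Fields of one warning line (layout inferred from the sample output).
struct ProfileWarning {
  std::string id;                  // e.g. "vector-to-list"
  int improvement = 0;             // estimated benefit, log10 scale
  std::vector<std::string> stack;  // hex addresses, as printed
};

// Parses "<id>: improvement = <n>: call stack = 0x... 0x... ..."
// Returns std::nullopt if the line does not have that shape.
std::optional<ProfileWarning> parse_warning_line(const std::string& line) {
  const std::string imp_tag = ": improvement = ";
  const std::string stack_tag = ": call stack = ";
  const std::size_t imp = line.find(imp_tag);
  const std::size_t stk = line.find(stack_tag);
  if (imp == std::string::npos || stk == std::string::npos) return std::nullopt;
  const std::size_t num_begin = imp + imp_tag.size();
  if (stk < num_begin) return std::nullopt;

  ProfileWarning w;
  w.id = line.substr(0, imp);
  try {
    w.improvement = std::stoi(line.substr(num_begin, stk - num_begin));
  } catch (const std::logic_error&) {  // invalid_argument or out_of_range
    return std::nullopt;
  }

  // Collect address tokens and skip elisions such as "...". Stop at the next
  // field separator in case the advice was joined onto the same line.
  std::istringstream tokens(line.substr(stk + stack_tag.size()));
  std::string tok;
  while (tokens >> tok) {
    if (tok[0] == ':') break;
    if (tok.rfind("0x", 0) == 0) w.stack.push_back(tok);
  }
  return w;
}
```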

Advice is given regardless of whether the transformation is valid. For instance, we advise changing a map to an unordered_map even if the application semantics require that data be ordered. We believe such warnings can help users better understand the performance behavior of their application, which can lead to changes at a higher abstraction level.

Tuning the Profile Mode

Compile-time switches and environment variables (see also file profiler.h). Unless specified otherwise, they can be set at compile time using -D<name> or by setting the variable <name> in the environment where the program runs, before execution starts.

  • _GLIBCXX_PROFILE_NO_<diagnostic>: disable specific diagnostics. See section Diagnostics for possible values. (Environment variable not supported.)

  • _GLIBCXX_PROFILE_TRACE_PATH_ROOT: set an alternative root path for the output files.

  • _GLIBCXX_PROFILE_MAX_WARN_COUNT: set it to the maximum number of warnings desired. The default value is 10.

  • _GLIBCXX_PROFILE_MAX_STACK_DEPTH: if set to 0, the advice will be collected and reported for the program as a whole, and not for each call context. This setting is also useful in continuous regression tests, where you only need to know whether a regression occurred. The default value is 32.

  • _GLIBCXX_PROFILE_MEM_PER_DIAGNOSTIC: set a limit on how much memory to use for the accounting tables for each diagnostic type. When this limit is reached, new events are ignored until memory usage falls below the limit. Generally, this means that newly created containers will not be instrumented until some live containers are deleted. The default is 128 MB.

  • _GLIBCXX_PROFILE_NO_THREADS: Make the library not use threads. If thread local storage (TLS) is not available, you will get a preprocessor error asking you to set -D_GLIBCXX_PROFILE_NO_THREADS if your program is single-threaded. Multithreaded execution without TLS is not supported. (Environment variable not supported.)

  • _GLIBCXX_HAVE_EXECINFO_H: This name should be defined automatically at library configuration time. If your library was configured without execinfo.h, but you have it in your include path, you can define it explicitly. Without it, advice is collected for the program as a whole, and not for each call context. (Environment variable not supported.)
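The interaction between stack depth, the per-diagnostic memory limit, and execinfo.h can be pictured with a small accounting table. This is a hypothetical sketch, not the library's data structure. `AccountingTable`, `capture_context`, and the assumed 64-byte cost per entry are invented for illustration. The sketch uses glibc's `backtrace()` from execinfo.h to capture call contexts, and for simplicity it charges only live-object entries against the budget.

```cpp
#include <execinfo.h>  // backtrace(), glibc-specific

#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

using CallContext = std::vector<std::uintptr_t>;

// Capture up to max_depth return addresses. With max_depth == 0 every event
// gets the same empty context, so results are aggregated program-wide.
CallContext capture_context(int max_depth) {
  CallContext ctx;
  if (max_depth <= 0) return ctx;
  std::vector<void*> frames(static_cast<std::size_t>(max_depth));
  const int n = backtrace(frames.data(), max_depth);
  for (int i = 0; i < n; ++i)
    ctx.push_back(reinterpret_cast<std::uintptr_t>(frames[static_cast<std::size_t>(i)]));
  return ctx;
}

// Hypothetical table for one diagnostic: live objects are tracked under a
// byte budget, and creations are counted per call context.
class AccountingTable {
 public:
  AccountingTable(std::size_t byte_limit, int max_depth)
      : byte_limit_(byte_limit), max_depth_(max_depth) {}

  // Returns false when the event is ignored because the budget is exhausted.
  bool on_create(const void* object) {
    if (live_.count(object) != 0) return true;  // already tracked
    CallContext ctx = capture_context(max_depth_);
    const std::size_t cost = entry_cost(ctx);
    if (bytes_used_ + cost > byte_limit_) return false;
    bytes_used_ += cost;
    ++creations_per_context_[ctx];
    live_.emplace(object, std::move(ctx));
    return true;
  }

  // Frees the budget held by a tracked object; untracked objects are ignored.
  void on_destroy(const void* object) {
    auto it = live_.find(object);
    if (it == live_.end()) return;
    bytes_used_ -= entry_cost(it->second);
    live_.erase(it);
  }

  std::size_t bytes_used() const { return bytes_used_; }
  std::size_t context_count() const { return creations_per_context_.size(); }

 private:
  // Assumed cost: fixed overhead plus one word per captured frame.
  static std::size_t entry_cost(const CallContext& ctx) {
    return 64 + ctx.size() * sizeof(std::uintptr_t);
  }

  std::size_t byte_limit_;
  int max_depth_;
  std::size_t bytes_used_ = 0;
  std::unordered_map<const void*, CallContext> live_;
  std::map<CallContext, std::size_t> creations_per_context_;
};
```

With a depth of 0, all objects share one empty context, which mirrors whole-program reporting. With a 128-byte budget, the third creation is ignored until a tracked object is destroyed.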

Bibliography

Lixia Liu and Silvius Rus. Perflint: A Context Sensitive Performance Advisor for C++ Programs. Proceedings of the 2009 International Symposium on Code Generation and Optimization (CGO), 2009.