Yan Liu

Date of Award

Winter 12-15-2017

Level of Access

Campus-Only Thesis

Degree Name

Master of Science (MS)


Computer Engineering


Vincent Weaver

Second Committee Member

Bruce Segee

Third Committee Member

Yifeng Zhu


Performance analysis is an essential step in software optimization, which is critical for embedded systems, desktop applications, and scientific computing. Most modern microprocessors contain hardware performance counters that can help with performance analysis. The PAPI library is a widely used performance measurement interface that supports the performance counter hardware found in most major microprocessors. PAPI supports self-monitoring, which lets programs instrument chunks of their own code and gather detailed performance values.

A key aspect of self-monitoring is reading hardware performance counters with the minimum possible overhead, since any overhead in the measurements can affect the accuracy of the results. In perf_event, the Linux interface to performance counters, the values are read via the read() system call, which incurs a large overhead from entering and exiting the operating system kernel.
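As a rough illustration of the read()-based path described above, the following sketch counts retired instructions around a region of code using the raw perf_event interface. It is not PAPI's implementation. The helper names `open_instruction_counter`, `count_instructions_read`, and `sample_region` are hypothetical, and the choice of event and exclusion flags is an assumption made for the example. The sketch returns -1 when counters are unavailable, for instance in a restricted container.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical workload used only for illustration. */
volatile uint64_t sample_sink;
void sample_region(void)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < 1000; i++)
        sum += i;
    sample_sink = sum;
}

/* glibc has no wrapper for perf_event_open, so it is invoked via syscall().
 * Returns a file descriptor, or -1 if the event cannot be opened. */
int open_instruction_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* assumption: count user-space work only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: monitor the calling thread on any CPU. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Runs region() and stores the instruction count in *count.
 * Returns 0 on success, -1 if the counter is unavailable. */
int count_instructions_read(void (*region)(void), uint64_t *count)
{
    int fd = open_instruction_counter();
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    region();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* The value is fetched with read(): one kernel entry and exit per sample. */
    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A caller would use it as `uint64_t n; if (count_instructions_read(sample_region, &n) == 0) { /* use n */ }`. Every sample costs a full system call, which is the overhead the abstract refers to.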

In this work, we modify PAPI to use the rdpmc instruction, which allows userspace measurement of counters on x86 systems. This replaces the use of the high-overhead read() system call. We tested the result across 14 modern systems and 4 benchmarks. We find that the performance measurement latency is improved by at least a factor of three (and often a factor of six or more) in our test cases.
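A minimal sketch of the userspace read path, assuming Linux on x86, is shown here. It follows the pattern documented for the perf_event mmap page: map the event's metadata page, then combine the kernel-maintained `offset` with the live value returned by rdpmc, retrying if the kernel updates the page mid-read. This is not the thesis's PAPI code. The names `read_counter_rdpmc`, `measure_with_rdpmc`, and `rdpmc_region` are hypothetical, and the sketch simply gives up (returns -1) when userspace rdpmc is not permitted or the event is not currently scheduled on a counter.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define barrier() __asm__ volatile("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
/* Read hardware counter number `counter` directly from user space. */
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Hypothetical workload used only for illustration. */
volatile uint64_t rdpmc_sink;
void rdpmc_region(void)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < 1000; i++)
        sum += i;
    rdpmc_sink = sum;
}

/* Returns 0 and stores the counter value, or -1 if rdpmc cannot be used. */
int read_counter_rdpmc(const volatile struct perf_event_mmap_page *pc,
                       uint64_t *value)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t seq, idx, width;
    int64_t count, pmc;

    do {
        seq = pc->lock;            /* sequence lock: retry if it changes */
        barrier();
        idx = pc->index;           /* 0 means "not on a hardware counter" */
        count = pc->offset;
        if (!pc->cap_user_rdpmc || idx == 0)
            return -1;
        width = pc->pmc_width;
        if (width == 0 || width > 64)
            return -1;
        pmc = (int64_t)rdpmc(idx - 1);
        barrier();
    } while (pc->lock != seq);

    /* Sign-extend the raw value from the counter's width to 64 bits. */
    pmc = (int64_t)((uint64_t)pmc << (64 - width)) >> (64 - width);
    *value = (uint64_t)(count + pmc);
    return 0;
#else
    (void)pc; (void)value;
    return -1;                     /* rdpmc is x86-only */
#endif
}

/* Measures instructions retired by region() without calling read(). */
int measure_with_rdpmc(void (*region)(void), uint64_t *delta)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.exclude_kernel = 1;       /* assumption: user-space work only */

    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    uint64_t before = 0, after = 0;
    int rc = read_counter_rdpmc(p, &before);
    if (rc == 0) {
        region();
        rc = read_counter_rdpmc(p, &after);
    }
    if (rc == 0)
        *delta = after - before;

    munmap(p, page);
    close(fd);
    return rc;
}
```

Here the two counter reads around `region()` involve no system call; the only kernel entries are the one-time setup and teardown. A production implementation would also need a fallback to read() for the cases where this sketch returns -1.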