Ben is largely interested in systems programming, performance analysis, and
high-performance computing. He currently works at Intel Corporation with
Brendan Gregg on
innovative performance profiling tools.
An advanced, open-source AI/GPU profiler for Intel GPUs. It uses eBPF, kernel driver interfaces, and new hardware sampling features to present flamegraphs and flamescopes of kernels running on the GPU. 2024-2025.
Contributed to Brandon Kammerdiener's profile visualizer, Proviz. It renders visualizations in the CLI and supports parsing iaprof and perf output. 2025.
Made various little contributions to TensorFlow while doing customer support for large cloud customers. 2023.
Got fed up with the work necessary to spin up VMs, and wrote scripts to do it a little faster (using cloud-init). Nothing fancy. 2023.
An open-source, per-process instruction mix profiler. It uses perf events and minimal eBPF to sample instructions being executed on the system, disassembles them, and presents them to the user in an easy-to-understand format. 2022.
Rescued an Okidata Microline 391 Turbo dot matrix printer, and now use it to print and read papers and articles.
My dissertation was on using cross-stack (hardware, kernel, userspace runtimes, applications) strategies to optimize placement of data in heterogeneous memory systems. 2021.
Wrote the high-level interface for the Simplified Interface to Complex Memory (SICM) in cahoots with Dr. Michael Jantz (my advisor), Terry Jones (at ORNL), and Kshitij Doshi (at Intel). This interface allowed you to run unmodified (but recompiled with a special LLVM compiler pass) applications on heterogeneous memory systems, and it selected which "memory tier" would back each allocation site. 2019-2021.
My advisor began my graduate program by having me port a patch which implemented memory coloring in the HotSpot Java virtual machine. You be the judge: can this be considered "hazing"? The corresponding kernel mm patch is omitted for your sanity, but it could "compress" or "expand" application memory across DRAM ranks to either save power or improve bandwidth. 2016.
Built a ridiculous (-ly fun?) build system in Perl for a cluster at the University of Tennessee, Knoxville. 2015.