To study the cache performance of various cache algorithms on typical workloads, we needed workloads whose performance is reproducible and that faithfully reflect the behavior of typical programs. At the same time, we wanted them to be large programs that stress the cache system, so that the effects of clever replacement policies become visible. We therefore used a suite of programs drawn from the SPEC95 integer benchmarks, the MediaBench [7] benchmarks of multimedia applications, and several programs implementing various algorithms in C++ and Java [8]. The benchmark programs are described in a later section.
We used the ATOM binary instrumentation tool available on Alpha machines [6]. ATOM allows any executable to be instrumented with user-supplied analysis routines. We use it to intercept every memory access (loads and stores) and pass each one to our routines, which simulate a data cache.
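To make the mechanism concrete, the following is a minimal sketch of the kind of analysis routine such instrumentation would call on every load and store. It assumes a direct-mapped 8 KB data cache with 32-byte lines; the routine name `CacheAccess` and the cache geometry are hypothetical, and the ATOM instrumentation side (which inserts the calls and supplies the effective address) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry for illustration: 8 KB direct-mapped cache,
   32-byte lines, hence 256 lines. */
#define LINE_BITS 5      /* log2 of the 32-byte line size */
#define NUM_LINES 256    /* 8192 / 32 */

/* Hypothetical analysis routine: an instrumented binary would call this
   before each load or store with the effective address.
   Returns 1 on a hit, 0 on a miss. All state is function-local static. */
int CacheAccess(uint64_t addr)
{
    static uint64_t tags[NUM_LINES];   /* tag held by each line */
    static int      valid[NUM_LINES];  /* 1 once a line has been filled */

    uint64_t block = addr >> LINE_BITS;            /* drop byte-in-line offset */
    unsigned index = (unsigned)(block % NUM_LINES); /* line the block maps to */
    uint64_t tag   = block / NUM_LINES;            /* remaining high bits */

    assert(index < NUM_LINES);
    if (valid[index] && tags[index] == tag)
        return 1;                                  /* hit */

    /* Miss: a direct-mapped cache has one candidate line, so replace it. */
    valid[index] = 1;
    tags[index]  = tag;
    return 0;
}
```

For example, accesses to 0x1000 and 0x1004 fall in the same line (a miss, then a hit), while 0x1000 + 8192 maps to the same line with a different tag and evicts it.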
We implemented the cache algorithms to be as independent of one another as possible. None of them maintains global state; whatever state an algorithm needs is kept in static variables local to its functions. This let us compile each algorithm separately and use it as a self-contained module. We used shell scripts extensively to automate testing and result generation.
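The sketch below illustrates this style under stated assumptions: two hypothetical 2-way set-associative policies, LRU and FIFO, each keeping its tag, valid, and replacement state in function-local `static` arrays, so either can be handed to the same driver through a function pointer. The names `lru_access`, `fifo_access`, and `count_misses`, and the 8 KB / 32-byte-line geometry, are illustrative only and not taken from the original implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS 5     /* hypothetical 32-byte lines */
#define NUM_SETS  128   /* hypothetical: 128 sets x 2 ways x 32 B = 8 KB */

/* 2-way LRU: returns 1 on hit, 0 on miss. State is function-local static. */
int lru_access(uint64_t addr)
{
    static uint64_t tag[NUM_SETS][2];
    static int      valid[NUM_SETS][2];
    static int      victim[NUM_SETS];   /* least recently used way */

    uint64_t block = addr >> LINE_BITS;
    unsigned set   = (unsigned)(block % NUM_SETS);
    uint64_t t     = block / NUM_SETS;

    for (int w = 0; w < 2; w++)
        if (valid[set][w] && tag[set][w] == t) {
            victim[set] = 1 - w;        /* the other way becomes LRU */
            return 1;
        }
    int w = victim[set];                /* evict the LRU way */
    tag[set][w] = t;
    valid[set][w] = 1;
    victim[set] = 1 - w;                /* the newly filled way is MRU */
    return 0;
}

/* 2-way FIFO: hits do not change the eviction order. */
int fifo_access(uint64_t addr)
{
    static uint64_t tag[NUM_SETS][2];
    static int      valid[NUM_SETS][2];
    static int      next[NUM_SETS];     /* way filled longest ago */

    uint64_t block = addr >> LINE_BITS;
    unsigned set   = (unsigned)(block % NUM_SETS);
    uint64_t t     = block / NUM_SETS;

    for (int w = 0; w < 2; w++)
        if (valid[set][w] && tag[set][w] == t)
            return 1;
    int w = next[set];                  /* evict the oldest fill */
    tag[set][w] = t;
    valid[set][w] = 1;
    next[set] = 1 - w;
    return 0;
}

/* Any policy with this signature can be plugged into the same driver. */
typedef int (*policy_fn)(uint64_t addr);

long count_misses(policy_fn access, const uint64_t *trace, size_t n)
{
    assert(access != NULL && (trace != NULL || n == 0));
    long misses = 0;
    for (size_t i = 0; i < n; i++)
        if (!access(trace[i]))
            misses++;
    return misses;
}
```

On the address trace 0, 4096, 0, 8192, 0 (all mapping to set 0 here), LRU keeps address 0 resident and misses three times, while FIFO evicts it and misses four times. One consequence of keeping state in function-local statics is that each policy function can model only a single cache instance per run.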