As expected, the Random policy performs quite badly, while the LRU and LFU policies work well, with LRU usually beating LFU. Interestingly, the SIDE algorithm also does quite well, with performance close to LFU, even though it maintains far less state. The hybrid schemes also do well, usually falling between LRU and LFU in performance, which suggests that an adaptive mix of the two schemes would do well. Adding a victim cache improves performance tremendously, even though the replacement policy used there is FIFO, which does not perform well on its own; this validates the claim that a victim cache can improve performance. Among the other schemes, the hotornot policy stands out. Our simple prefetch scheme is not able to detect many access patterns, so in most cases its performance closely follows that of LRU. We do not present results for diffblock, since we could not get all the benchmarks to run on Alpha in the time available, but we observed that it does not perform as well as LRU. This suggests that a static division of the cache among the heap, stack and library address spaces will not work, and that we need either a dynamic method of dividing the available cache space or a profile-based division tailored to each large program.
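For concreteness, the following sketch shows how a victim would be chosen within a single set under LRU and LFU. It is an illustration only: the field names, the timestamp/counter representation and the tie-breaking rule are assumptions for exposition and are not taken verbatim from our simulator.

\begin{verbatim}
#include <stdio.h>

typedef struct {
    unsigned long tag;        /* line tag                                  */
    unsigned long last_used;  /* timestamp of most recent access (for LRU) */
    unsigned long use_count;  /* accesses since the line was filled (LFU)  */
} cache_line;

/* LRU: evict the line whose most recent access is oldest. */
static size_t lru_victim(const cache_line *set, size_t ways)
{
    size_t v = 0;
    for (size_t i = 1; i < ways; i++)
        if (set[i].last_used < set[v].last_used)
            v = i;
    return v;
}

/* LFU: evict the least frequently used line, breaking ties by recency. */
static size_t lfu_victim(const cache_line *set, size_t ways)
{
    size_t v = 0;
    for (size_t i = 1; i < ways; i++)
        if (set[i].use_count < set[v].use_count ||
            (set[i].use_count == set[v].use_count &&
             set[i].last_used < set[v].last_used))
            v = i;
    return v;
}

int main(void)
{
    /* One 4-way set: way 2 is the oldest, way 3 the least frequently used. */
    cache_line set[4] = {
        { 0xa, 40, 9 }, { 0xb, 35, 7 }, { 0xc, 10, 8 }, { 0xd, 30, 2 }
    };
    printf("LRU evicts way %zu, LFU evicts way %zu\n",
           lru_victim(set, 4), lfu_victim(set, 4));
    return 0;
}
\end{verbatim}

The example set makes the difference visible: LRU evicts the stalest line (way 2) while LFU evicts the least-referenced one (way 3), even though the latter was touched more recently.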
Next, we wanted to find out how our policies perform as the cache parameters change. For this purpose we varied the associativity and the cache size independently. All of these experiments were performed on a single benchmark: the Java compiler javac compiling a spreadsheet application. The results of varying the cache size from 8 KByte to 128 KByte are shown in the figure. This range covers the approximate sizes of caches available today; in fact, today's caches go up to 512 KByte and beyond, but we limited ourselves to 128 KByte because we ran into memory problems at larger cache sizes. All other parameters were the same as in the previous experiment. As expected, the cache miss rate drops with increasing cache size. In this experiment hotornot emerges as a clear winner, and the performance difference is clearly perceptible at small cache sizes. Once the cache size grows beyond the active data set of the program, only cold misses remain and all policies converge to a common miss rate; this also causes the miss rates of the ``bad'' policies to fall more sharply than the others as the cache size is increased. Finally, there is no data point for the victim cache scheme at the 8 KByte size, because the corresponding victim cache is only 16 bytes, which is not enough to hold even a single line.
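To make the convergence point concrete, the sketch below counts cold (compulsory) misses by remembering which blocks have already been touched: the first reference to a block misses no matter which replacement policy is in use, and this count is the floor that all the curves meet once the cache holds the whole active data set. The toy trace, the 32-byte line size and the fixed table size are illustrative assumptions, not part of our simulator.

\begin{verbatim}
#include <stdbool.h>
#include <stdio.h>

#define LINE_SIZE  32      /* bytes per line (assumed for illustration)   */
#define MAX_BLOCKS 4096    /* large enough for this toy trace             */

static unsigned long seen[MAX_BLOCKS];
static size_t n_seen;

/* Return true exactly on the first reference to a block. */
static bool first_touch(unsigned long block)
{
    for (size_t i = 0; i < n_seen; i++)
        if (seen[i] == block)
            return false;
    seen[n_seen++] = block;    /* assumes the trace touches < MAX_BLOCKS blocks */
    return true;
}

int main(void)
{
    unsigned long trace[] = { 0x1000, 0x1004, 0x2000, 0x1008, 0x2000, 0x3000 };
    unsigned long cold = 0;

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
        if (first_touch(trace[i] / LINE_SIZE))
            cold++;

    /* Prints: cold misses: 3 of 6 references */
    printf("cold misses: %lu of %zu references\n",
           cold, sizeof trace / sizeof trace[0]);
    return 0;
}
\end{verbatim}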
For the next figure, we took a 16 KByte cache with a 32-byte line size and studied the effect of varying the associativity from one (a direct-mapped cache) to 16-way set-associative, which we believe is the limit of feasibility: larger associativities would increase the hit time too much, and in fact few modern processors have even 8-way set-associative caches. As expected (apart from some pathological cases, one of which is detailed above), the cache miss rate drops with increasing associativity. One exception was the side scheme, for which lower associativity works better. This suggests that for some data access patterns, the simple partitioning scheme of side does not work well when there are a large number of lines in each set. The cache size is only 16 KByte in this experiment; we chose a smaller cache because when we initially tried the experiment with a larger cache, the fall in miss rate with increasing associativity was barely noticeable. Thus a large cache can get away with low associativity, which is captured by the so-called 2:1 cache rule of thumb: a direct-mapped cache of size $N$ has roughly the same miss rate as a 2-way set-associative cache of size $N/2$. In other words, low associativity is not a bad thing if the cache is large, or, put another way, if our programs are not enormous. We also see that the LRU_over_LFU policy performs relatively better as the associativity increases. This is because LRU generally performs better than LFU, and as the associativity increases, the scheme can take more advantage of the LRU ordering between its sub-queues.
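For reference, the snippet below shows how an arbitrary example address decomposes into set index and tag for this 16 KByte, 32-byte-line cache at each associativity in the sweep. The address is made up purely for illustration; the point is that higher associativity means fewer, larger sets, so lines that would conflict in a direct-mapped cache can coexist in the same set.

\begin{verbatim}
#include <stdio.h>

int main(void)
{
    const unsigned long cache_size = 16 * 1024;   /* bytes               */
    const unsigned long line_size  = 32;          /* bytes per line      */
    const unsigned long addr       = 0x4ac3f7e0;  /* arbitrary example   */

    for (unsigned long assoc = 1; assoc <= 16; assoc *= 2) {
        unsigned long sets  = cache_size / (line_size * assoc);
        unsigned long block = addr / line_size;   /* drop the offset bits */
        unsigned long set   = block % sets;
        unsigned long tag   = block / sets;
        printf("%2lu-way: %4lu sets, set index = %3lu, tag = 0x%lx\n",
               assoc, sets, set, tag);
    }
    return 0;
}
\end{verbatim}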
Lastly, as we have stressed before, the increase in hit time that a policy causes should not be neglected. We had no way of estimating this cost in our current setup. However, we felt the issue of policy overheads needed to be addressed, and as a first cut we studied how many dirty blocks each scheme replaces: if a modified, or ``dirty'', block is evicted, the effective miss penalty is likely to be higher because the block has to be written back to memory.
To explore this trade-off, we implemented a simple second chance algorithm to control dirty replacements. We took a 32 KByte, 4-way set-associative cache and measured how many dirty lines were evicted. The results are shown in the table; the dirty miss rate is the percentage of evicted cache lines that were dirty.
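As an illustration of the idea, and not a transcript of our simulator code, the sketch below shows one way a second chance for dirty lines can be realized: a dirty LRU victim is spared exactly once and the least recently used clean line is evicted in its place, falling back to the dirty line when no clean alternative exists. The field names and the exactly-one-spare rule are illustrative choices.

\begin{verbatim}
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned long tag;
    unsigned long last_used;  /* timestamp of most recent access           */
    bool          dirty;      /* written since it was filled                */
    bool          spared;     /* dirty line already used its second chance  */
} cache_line;

/* Pick the line to evict from a full set of `ways` lines.  The `spared`
 * flag would be cleared whenever a line is refilled (not shown here). */
static size_t second_chance_victim(cache_line *set, size_t ways)
{
    /* Plain LRU victim first. */
    size_t victim = 0;
    for (size_t i = 1; i < ways; i++)
        if (set[i].last_used < set[victim].last_used)
            victim = i;

    if (set[victim].dirty && !set[victim].spared) {
        /* Look for the least recently used clean line instead. */
        size_t clean = ways;
        for (size_t i = 0; i < ways; i++)
            if (!set[i].dirty &&
                (clean == ways || set[i].last_used < set[clean].last_used))
                clean = i;
        if (clean != ways) {
            set[victim].spared = true;  /* second chance granted  */
            return clean;               /* no write-back needed   */
        }
    }
    return victim;  /* clean, already spared, or every line is dirty */
}

int main(void)
{
    /* First call spares dirty way 1 (the LRU victim) and evicts clean
     * way 0; the second call evicts way 1, which has used its chance. */
    cache_line set[4] = {
        { 0xa, 20, false, false }, { 0xb, 10, true,  false },
        { 0xc, 30, true,  false }, { 0xd, 40, false, false }
    };
    printf("evict way %zu\n", second_chance_victim(set, 4));
    printf("evict way %zu\n", second_chance_victim(set, 4));
    return 0;
}
\end{verbatim}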
What we would normally expect is that as we restrict the eviction of dirty lines (as our second chance algorithm does), the number of cache misses suffered by the program would rise. This is the case for most programs. However, for some programs, for example qsort.c and cellauto.java, both the miss rate and the number of dirty evictions fall, which is the best possible outcome. This suggests that the trade-off should be explored further on other benchmarks. Moreover, the gains from the second chance algorithm in the two favorable cases are barely perceptible, which suggests that algorithms that constrain the write-back of dirty lines more aggressively should be considered.