- Hotornot: This policy tries to take advantage of the case in which
there is a single hot element, i.e. an element that is accessed quite often,
in an otherwise cold line. In such cases, it is profitable to
move this single element to a separate hot set and not waste valuable
cache space by storing the whole line. As an example, this case occurs
when a program repeatedly accesses a set of array elements in strides greater
than the cache block size.
In our design, this hot set is 0.8% of the total cache size,
and thus incurs little space overhead.
We implement this scheme as a variant of LRU.
To keep track of the hot element, for every cache line we record, along with
its last-accessed information, the actual element that was accessed from the
line. The hot set itself uses LRU replacement. When a new address is
encountered, we first search for it in the hot set and, if found, update its
last-accessed information. If not, we identify the hot-set entry that LRU
would replace, and then search the main cache for the line containing the new
element. If that line is not present, we bring it in by evicting a line from
the main cache using LRU. While doing so, we also compare the last-accessed
information of the evicted line's hot element with that of the hot-set entry
chosen for replacement as per LRU; depending on this comparison, we may knock
that entry out of the hot set to accommodate the evicted line's hot element.
We have presently made the hot set fully associative. With increasing cache
sizes, this will substantially increase the cost of searching the hot set;
we would need either to limit the associativity of the hot set or to cap its
size. We have not yet investigated this aspect of the scheme.
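As a concrete illustration, here is a minimal sketch of the hot-set bookkeeping described above, written in C++. The structure, the names, and the admission test (admit the evicted line's hot element when it was used more recently than the hot set's own LRU victim) are our assumptions, not the exact implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>

struct HotSet {
    std::unordered_map<uint64_t, uint64_t> last_use;  // element address -> last-accessed time
    size_t capacity;                                   // ~0.8% of the total cache size

    // Hot-set hit: refresh the element's last-accessed information.
    bool access(uint64_t elem, uint64_t now) {
        auto it = last_use.find(elem);
        if (it == last_use.end()) return false;
        it->second = now;
        return true;
    }

    // LRU candidate inside the hot set: the element with the oldest timestamp.
    uint64_t lru_victim() const {
        uint64_t victim = 0, oldest = std::numeric_limits<uint64_t>::max();
        for (const auto& [elem, t] : last_use)
            if (t < oldest) { oldest = t; victim = elem; }
        return victim;
    }

    // Called when the main cache evicts a line: compare the evicted line's
    // hot element against the hot set's own LRU victim, and knock the
    // victim out if the evicted element was used more recently (assumed rule).
    void maybe_admit(uint64_t evicted_elem, uint64_t evicted_last_use) {
        if (last_use.size() < capacity) { last_use[evicted_elem] = evicted_last_use; return; }
        uint64_t victim = lru_victim();
        if (evicted_last_use > last_use[victim]) {
            last_use.erase(victim);
            last_use[evicted_elem] = evicted_last_use;
        }
    }
};
```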
- Firstnot: This scheme tries to exploit slight aberrations
in a program's data access pattern. For example, a program
may access a specific address that is not in its active set and
that it does not plan to access again in the future. This may occur as a result of
having to check for some error condition.
We base this scheme on LRU and LFU: for every cache line,
we store both its last-accessed information and an access count. Whenever
we need to evict a line (this is decided based on LRU), we look at the
count information of that line. If the count is greater
than 1, we do not make the replacement and reset the count to 1; otherwise, we do
the replacement.
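A minimal sketch of the Firstnot eviction test follows, under the assumption that the simulator asks a predicate whether the LRU victim should really be evicted; the names and interface are hypothetical.

```cpp
#include <cstdint>

struct LineMeta {
    uint64_t last_use;   // used by LRU to pick the victim
    uint64_t count;      // number of accesses since the line was filled
};

// Returns true if the LRU victim should actually be evicted. A line touched
// only once (e.g. a stray error-check access) goes immediately; a line with
// count > 1 is kept and its count reset to 1, so it cannot be spared twice
// in a row.
bool firstnot_should_evict(LineMeta& lru_victim) {
    if (lru_victim.count > 1) {
        lru_victim.count = 1;
        return false;
    }
    return true;
}
```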
- Victim: In this scheme, we have a victim cache that is
0.2% of the total cache size, is fully associative,
and follows a FIFO policy for replacements in both the main cache
and the victim cache.
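A minimal sketch of the victim-cache path, assuming a small FIFO buffer of evicted line addresses that is checked on a main-cache miss; all names are ours.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>

struct VictimCache {
    size_t capacity;              // ~0.2% of the total cache size
    std::deque<uint64_t> lines;   // line addresses, front = oldest (FIFO)

    // On a main-cache miss, check the victim cache and pull the line back if found.
    bool recall(uint64_t line_addr) {
        auto it = std::find(lines.begin(), lines.end(), line_addr);
        if (it == lines.end()) return false;
        lines.erase(it);
        return true;
    }

    // On a main-cache eviction, push the evicted line into the FIFO.
    void insert(uint64_t line_addr) {
        if (lines.size() == capacity) lines.pop_front();
        lines.push_back(line_addr);
    }
};
```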
- DiffBlock: In this scheme, we explore whether partitioning the
cache according to whether the accessed data is stack, heap, or global data
has any effect on the cache miss rates.
- LFU_over_LRU: In this scheme, we try to combine
global history (as captured by LFU) with local
history (as captured by LRU). For example, the simple LRU policy
simply evicts the line that was hot in the past but has not
been accessed during the LRU window; this scheme attempts to rectify that.
Here, every set is organized as a number of queues of cache lines:
LFU is followed among the queues, while the LRU policy is followed
inside each queue. Cache lines are evicted only
from the lowest-priority queue. By varying the number of
queues, the scheme lets us explore the full
spectrum of policies between LRU and LFU. As we have
implemented it, the scheme has too much computational overhead,
but it is nevertheless an attempt to combine the advantages of
both LRU and LFU. It also aided us in the debugging process,
because by varying the number of queues,
we could mimic both LRU and LFU.
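A minimal sketch of one LFU_over_LRU set follows; the rule that maps an access count to a queue index (a saturating count) is our assumption, and capacity checks are omitted for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

struct Line { uint64_t addr; uint64_t count; };

struct Set {
    std::vector<std::list<Line>> queues;   // queues[0] = lowest priority

    explicit Set(size_t num_queues) : queues(num_queues) {}

    // Map an access count to a queue index (saturating) -- assumed rule.
    size_t queue_of(uint64_t count) const {
        return static_cast<size_t>(std::min<uint64_t>(count, queues.size() - 1));
    }

    // Hit: bump the count and move the line to the MRU position of its
    // (possibly higher-priority) queue.
    void touch(uint64_t addr) {
        for (auto& q : queues) {
            auto it = std::find_if(q.begin(), q.end(),
                                   [&](const Line& l) { return l.addr == addr; });
            if (it == q.end()) continue;
            Line line = *it;
            q.erase(it);
            line.count++;
            queues[queue_of(line.count)].push_front(line);
            return;
        }
    }

    // Miss with a full set: evict the LRU line of the lowest-priority
    // non-empty queue, then insert the new line with a count of 1.
    void fill(uint64_t addr) {
        for (auto& q : queues)
            if (!q.empty()) { q.pop_back(); break; }
        queues[queue_of(1)].push_front(Line{addr, 1});
    }
};
```

With a single queue this degenerates to plain LRU, while with enough queues to separate every count it behaves like LFU, which matches how the scheme was used to mimic both policies during debugging.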
- LRU_over_LFU: This is just the reverse of the above scheme.
LRU is followed among queues, and LFU is followed inside a queue.
- LRU_dirty_secondchance: This was our attempt to explore the
trade-off between the number of cache misses and the number of dirty blocks
evicted and forced to be written back to memory.
Here, we give a dirty line an extra chance by not evicting it
the first time it comes up for replacement.
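A minimal sketch of the second-chance test, assuming a per-line flag that records whether the dirty line has already been spared once; the flag and the interface are our assumptions.

```cpp
#include <cstdint>

struct LineMeta {
    uint64_t last_use;
    bool dirty;
    bool spared;   // set once the dirty line has used its extra chance
};

// Returns true if the LRU victim should really be evicted now; a dirty line
// is spared the first time it comes up, and the caller moves on to the next
// LRU candidate.
bool dirty_secondchance_should_evict(LineMeta& lru_victim) {
    if (lru_victim.dirty && !lru_victim.spared) {
        lru_victim.spared = true;
        return false;
    }
    return true;
}
```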
- Prefetch: In this version, we augmented LRU in a simple way. For
example, LRU performs miserably for any program that loops over a
data set larger than what can fit in the cache; in fact, for this case,
MRU (Most Recently Used) replacement is optimal. We could avoid such misses
if we knew that the line we are about to evict contains the element that
will be accessed soonest.
To implement this, we keep track of past accesses, use this knowledge to
speculate on what will be accessed next, and, if that data is about to be
evicted, prevent its eviction.
A more general form of this would be to
proactively prefetch the data so that it is actually present
in the cache when needed, but we cannot do this in our constrained framework.
Currently, we detect only strides, based on the last two accesses the cache system has seen.
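A minimal sketch of the stride detection and the eviction veto, assuming the predictor keeps only the last two addresses seen by the cache system; the names and the protection interface are ours.

```cpp
#include <cstdint>

struct StridePredictor {
    uint64_t prev = 0, last = 0;
    int seen = 0;   // how many accesses have been observed (capped at 2)

    void observe(uint64_t addr) {
        prev = last;
        last = addr;
        if (seen < 2) seen++;
    }

    // Predicted next access = last + (last - prev); valid only once two
    // accesses have been seen.
    bool predict(uint64_t& next) const {
        if (seen < 2) return false;
        next = last + (last - prev);
        return true;
    }
};

// Veto the eviction of a line if it contains the predicted next address.
bool should_protect(const StridePredictor& sp, uint64_t line_base, uint64_t line_size) {
    uint64_t next;
    return sp.predict(next) && next >= line_base && next < line_base + line_size;
}
```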
- Side: This scheme tries to mimic LRU with much less overhead.
In this scheme, a single counter per set keeps track of which lines in the set
have been accessed recently and which are fit for replacement.
On every hit to a cache line, the line becomes part of the recently-accessed
portion; essentially, this means that
the counter moves to include the
line only if the line did not already belong to that portion. In case a
cache line has to be replaced, the line pointed to by the counter is evicted and the
counter is subsequently increased by 1. The scheme thus loses all history
information whenever the last cache line of a set registers a hit.
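A minimal sketch of one Side set under our reading of the counter: ways below the counter form the recently-accessed portion and ways at or above it are replacement candidates; this interpretation and the names are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SideSet {
    std::vector<uint64_t> lines;   // line addresses, indexed by way
    size_t counter = 0;            // ways [0, counter) are "recent",
                                   // ways [counter, assoc) are candidates

    // Hit: move the counter just past the hit way, but only if the way was
    // not already in the recent region. A hit on the last way wraps the
    // counter back to 0, i.e. all history is lost.
    void on_hit(size_t way) {
        if (way >= counter) counter = (way + 1) % lines.size();
    }

    // Replacement: evict the way the counter points to, then advance the
    // counter by one.
    size_t on_replace() {
        size_t victim = counter;
        counter = (counter + 1) % lines.size();
        return victim;
    }
};
```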