
Published March 2, 2016 | Version 10004328
Journal article | Open Access

Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Description

Treating data according to its location in memory has received much attention in recent years, because data in different regions has different properties that matter for cache utilization. Stack data and non-stack data may interfere with each other's locality in the data cache, and one important property of stack data is its high spatial and temporal locality. In this work, we simulate a non-unified cache design that splits the data cache into a stack cache and a non-stack cache, so that stack data and non-stack data are kept in separate caches. We observe that the overall hit rate of the non-unified design is sensitive to the size of the non-stack cache. We then investigate the stack-cache size and associativity needed to achieve a high hit ratio, especially when over 99% of accesses are directed to the stack cache. The results show that, on average, a stack-cache hit rate of more than 99% is achieved with 2 KB of capacity and 1-way (direct-mapped) associativity. Further, we analyze the improvement in hit rate when a small, fixed-size stack cache is added at level 1 to a unified cache architecture. The results show that adding a 1 KB stack cache improves the overall hit rate of the unified design by approximately 3.9% on average for the Rijndael benchmark. The stack cache is simulated using the SimpleScalar toolset.
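To make the split-cache idea concrete, the following is a minimal sketch in C (not the authors' SimpleScalar code) of a direct-mapped stack cache sitting beside a non-stack cache, with hit rates tracked separately for each. The 2 KB stack-cache capacity and 1-way associativity follow the abstract; the 16 KB non-stack size, 32-byte lines, the STACK_BASE address-range test used to classify accesses, and the toy access stream are assumptions made purely for illustration.

```c
/*
 * Minimal sketch of a split (non-unified) data cache: a direct-mapped
 * stack cache plus a separate non-stack cache, each counting its own
 * hits and accesses.  Sizes, line length, the STACK_BASE threshold,
 * and the synthetic access stream are illustrative assumptions only.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES     32u
#define STACK_KB        2u                       /* stack-cache size (from abstract) */
#define NONSTACK_KB    16u                       /* assumed non-stack cache size     */
#define STACK_LINES    (STACK_KB    * 1024u / LINE_BYTES)
#define NONSTACK_LINES (NONSTACK_KB * 1024u / LINE_BYTES)
#define STACK_BASE     0x7ff00000u               /* assumed start of stack region    */

typedef struct {
    uint32_t tag[8192];
    uint8_t  valid[8192];
    uint32_t lines;
    uint64_t hits, accesses;
} cache_t;

static void cache_init(cache_t *c, uint32_t lines)
{
    memset(c, 0, sizeof *c);
    c->lines = lines;
}

/* Direct-mapped (1-way) lookup: index by line number, compare tags. */
static void cache_access(cache_t *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx  = line % c->lines;
    uint32_t tag  = line / c->lines;

    c->accesses++;
    if (c->valid[idx] && c->tag[idx] == tag)
        c->hits++;
    else {                                       /* miss: fill the line */
        c->valid[idx] = 1;
        c->tag[idx]   = tag;
    }
}

/* Route an access by address: a simple address-range test stands in
 * for whatever stack/non-stack classification the simulator applies. */
static void split_access(cache_t *sc, cache_t *nsc, uint32_t addr)
{
    if (addr >= STACK_BASE)
        cache_access(sc, addr);
    else
        cache_access(nsc, addr);
}

int main(void)
{
    cache_t stack_c, nonstack_c;
    cache_init(&stack_c, STACK_LINES);
    cache_init(&nonstack_c, NONSTACK_LINES);

    /* Toy access stream: repeated touches of a small stack frame plus a
     * sweep over a heap array -- a stand-in for a real benchmark trace. */
    for (int iter = 0; iter < 1000; iter++) {
        for (uint32_t off = 0; off < 256; off += 4)          /* stack frame */
            split_access(&stack_c, &nonstack_c, STACK_BASE + off);
        for (uint32_t off = 0; off < 64 * 1024; off += 64)   /* heap sweep  */
            split_access(&stack_c, &nonstack_c, 0x10000000u + off);
    }

    printf("stack cache    : %.2f%% hits (%llu accesses)\n",
           100.0 * stack_c.hits / stack_c.accesses,
           (unsigned long long)stack_c.accesses);
    printf("non-stack cache: %.2f%% hits (%llu accesses)\n",
           100.0 * nonstack_c.hits / nonstack_c.accesses,
           (unsigned long long)nonstack_c.accesses);
    return 0;
}
```

Because the tight stack-frame loop fits easily in the small direct-mapped stack cache while the heap sweep thrashes the non-stack cache, the sketch reproduces the qualitative effect the paper studies: the stack cache's hit rate stays very high at a small capacity, while the overall hit rate remains sensitive to the non-stack cache size.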

Files

10004328.pdf (475.3 kB)
md5:4fe05f5afaf279c6e8cbdbecb73417f5
