Chatterjee, Krishnendu
Goharshady, Amir Kafshdar
Okati, Nastaran
Pavlogiannis, Andreas
2018-11-04
<p>There is a huge gap between the speeds of modern caches and main memories, and therefore cache misses<br>
account for a considerable loss of efficiency in programs. The predominant technique to address this issue<br>
has been Data Packing: data elements that are frequently accessed within time proximity are packed into the<br>
same cache block, thereby minimizing accesses to the main memory. We consider the algorithmic problem of<br>
Data Packing on a two-level memory system. Given a reference sequence R of accesses to data elements, the<br>
task is to partition the elements into cache blocks such that the number of cache misses on R is minimized.<br>
The problem is notoriously difficult: it is NP-hard even when the cache has size 1, and is hard to approximate<br>
for any cache size larger than 4. Therefore, all existing techniques for Data Packing are based on heuristics<br>
and lack theoretical guarantees.<br>
In this work, we present the first positive theoretical results for Data Packing, along with new and stronger<br>
negative results. We consider the problem under the lens of the underlying access hypergraphs, which are<br>
hypergraphs of affinities between the data elements, where the order of an access hypergraph corresponds to<br>
the size of the affinity group. We study the problem parameterized by the treewidth of access hypergraphs,<br>
which is a standard notion in graph theory to measure the closeness of a graph to a tree. Our main results<br>
are as follows: we show there is a number q∗ depending on the cache parameters such that (a) if the access<br>
hypergraph of order q∗ has constant treewidth, then there is a linear-time algorithm for Data Packing; (b) the<br>
Data Packing problem remains NP-hard even if the access hypergraph of order q∗ − 1 has constant treewidth.<br>
Thus, we establish a fine-grained dichotomy depending on a single parameter, namely, the highest order<br>
among access hypergraphs that have constant treewidth; and establish the optimal value q∗ of this parameter.<br>
Finally, we present an experimental evaluation of a prototype implementation of our algorithm. Our results<br>
demonstrate that, in practice, access hypergraphs of many commonly used algorithms have small treewidth.<br>
We compare our approach with several state-of-the-art heuristic-based algorithms and show that our algorithm<br>
leads to significantly fewer cache misses.</p>
https://doi.org/10.1145/3290366
oai:zenodo.org:1477607
eng
Zenodo
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
POPL, ACM Symposium on Principles of Programming Languages, Lisbon, Portugal, 13-19 January 2019
Efficient Parameterized Algorithms for Data Packing
info:eu-repo/semantics/conferencePaper