Published December 28, 2009 | Version 11678
Journal article Open

Analysis of Long-Term File System Activities on Cluster Systems

Description

I/O workload is a critical factor in analyzing I/O patterns and maximizing file system performance. However, measuring the I/O workload of a running distributed parallel file system is non-trivial because of collection overhead and the large volume of data. In this paper, we measured and analyzed file system activity on two large-scale cluster systems with TFlops-level high-performance computing resources. By comparing file system activity in 2009 with that in 2006, we analyzed how I/O workloads changed with advances in system performance and high-speed network technology.

Files

11678.pdf (166.4 kB)
md5:8e67b7a74f4daa2e22bae299bf1b2442
