Parallel Vector Processing Using Multi Level Orbital DATA
Creators
Description
Many applications use vector operations, applying a single
instruction to multiple data elements that map to different locations
in conventional memory. Transferring data from memory is limited
by access latency and bandwidth, which reduces the performance gain of
vector processing. We present a memory system that makes all of
its contents available to processors in time, so that processors need
not access the memory: each location is forced to be available to
all processors at a specific time. The data move in different orbits,
becoming available to other processors in higher orbits at different
times. We use this memory to apply parallel vector operations to data
streams at the first orbit level. Data processed in the first level move
to the upper orbit one data element at a time, allowing a processor in
that orbit to apply another vector operation. This deals with the serial
code limitations inherent in all parallel applications by interleaving
the serial work with the lower-level vector operations.
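
The two-level flow described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `orbit_schedule` and `run_two_level` are hypothetical, time is modeled as a simple loop counter, and the rule that location `t mod n` is visible at step `t` is an assumed schedule. At each step the first level applies an element-wise operation to the location currently exposed in every stream, and the result moves up one element at a time to a second-level processor that applies a serial operation (here, a running sum).

```python
from typing import Callable, List, Sequence, Tuple


def orbit_schedule(n_locations: int, n_steps: int) -> List[int]:
    """Assumed schedule: at time step t, location t mod n_locations is
    the one exposed to every processor."""
    return [t % n_locations for t in range(n_steps)]


def run_two_level(
    streams: Sequence[Sequence[float]],
    vector_op: Callable[..., float],
    serial_op: Callable[[float, float], float],
    initial: float,
) -> Tuple[List[float], float]:
    """Toy model of a two-level orbit (hypothetical helper).

    Level 1 applies vector_op to the exposed element of every stream.
    Each result then moves to level 2, one element per step, where
    serial_op folds it into an accumulator.
    """
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("all streams must have the same length")

    level1_results: List[float] = []
    accumulator = initial
    for location in orbit_schedule(n, n):
        # First orbit: this location is visible in every stream now,
        # so no processor has to issue a memory request for it.
        element = vector_op(*(s[location] for s in streams))
        level1_results.append(element)
        # Second orbit: the processed element arrives here and the
        # serial step is interleaved with the level-1 work.
        accumulator = serial_op(accumulator, element)
    return level1_results, accumulator


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    products, dot = run_two_level(
        [a, b], lambda x, y: x * y, lambda acc, e: acc + e, 0
    )
    print(products, dot)  # [5, 12, 21, 32] 70
```

In this model a dot product splits into an element-wise multiply at the first level and a serial accumulation at the second, so the reduction proceeds while later elements are still being multiplied rather than waiting for the whole vector.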
Files
10006477.pdf (128.2 kB), md5:5ef4eddc0b261b5c81eb7e575459fda9