Implementation of a multi-GPU version of the cuInspiral pipeline
“Moore’s Law is a violation of Murphy’s Law. Everything gets better and better”: this is how Gordon Moore commented in 2005 on the law that bears his name. Moore formulated the law from a simple observation. In 1965 he noted that the number of components in integrated circuits had doubled roughly every year since the invention of the integrated circuit in 1958, and he predicted that the trend would continue for at least ten years. In 1975 he revised the doubling period to two years; the widely quoted final formulation states that integrated circuits double in performance every 18 months.
Over the last 20 years, hardware manufacturers have introduced several architectural innovations to maintain the trend dictated by Moore’s law, which is now used in the semiconductor industry to guide long-term planning and to set targets for research and development. These innovations have moved toward implicit and explicit parallelism, as a way around the evident limits on miniaturization and on clock-frequency increases.
Since 2005, multi-core CPUs have become part of everyday computing architectures, in both embedded and standard systems. This solution implements multiprocessing in a single physical package by replicating the whole computing core. Current multi-core CPUs implement up to four or six cores per package. For a multi-core CPU, the performance gain depends strictly on how well the software is parallelized. This is expressed by Amdahl’s law, which relates the achievable speedup to the fraction of the software that can be parallelized to run on multiple cores simultaneously. The next obvious step in this direction is the many-core architecture, that is, computing units in which several tens of cores are connected together. The current state of the art in many-core architecture is represented by GPUs, where hundreds of computing cores are implemented within a single package.
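To make the role of Amdahl’s law concrete, the following minimal sketch evaluates its standard form, S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the run time that can be parallelized and N is the number of cores. The function name amdahl_speedup and its parameter names are illustrative choices, not part of the cuInspiral code, and the formula assumes an ideal machine with no communication or synchronization overhead.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the serial run time that can be
        parallelized (between 0 and 1). Illustrative parameter name.
    n_cores: number of cores running the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")

    # The serial part is unaffected by the number of cores.
    serial_fraction = 1.0 - parallel_fraction
    # The parallel part is assumed to scale perfectly with n_cores.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)
```

For example, calling amdahl_speedup(0.9, 4) models a code that is 90% parallelizable on a four-core CPU. However many cores are added, the speedup for p = 0.9 can never exceed 1 / (1 - p) = 10, which is why the quality of the parallelization matters more and more as the core count grows from a few CPU cores to the hundreds available on a GPU.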