ACCELERATING GNSS SOFTWARE

This paper addresses both the efficiency and the portability of a computer program in charge of the baseband signal processing of a GNSS receiver. Efficiency, in this context, refers to optimizing the speed and memory requirements of the software receiver. Specifically, the interest is in how fast the software receiver can process the incoming stream of raw signal samples and, in particular, whether signal processing up to the position fix can be executed in real time (and how many channels the host computer executing the receiver application can sustain in parallel). This is achieved by applying the concept of parallelization at different abstraction levels. The paper describes strategies based on task, data and instruction-level parallelism, as well as actual implementations released under an open source license and the results obtained with different commercially available computing platforms. At the same time, the proposed solution also addresses portability, understood as the usability of the same software in different computing environments.


Motivation
"The Rise of GNSS":
• GNSS scenario with + satellites, different GNSS open signal waveforms for civilian usage, belonging to different systems and broadcast at different frequency bands.
• New modulations require more bandwidth and more processing complexity on the receiver.
• The natural target is a multi-constellation, multi-band GNSS receiver operating in real-time.
Computing goes parallel:
• A typical desktop in : Pentium IV processor at .GHz, dual-core technology, with memory bandwidths ∼ GB/s.
• In , Intel is planning to release their Broadwell-E processor series, with a clock speed up to .GHz. Can house up to cores, with memory bandwidths ∼ GB/s.

Computing Ecosystem
In , desktop computers are not the dominant form factor anymore.
• Laptops, gaming consoles, mini PCs, tablets and smartphones have pushed into the market other kinds of processors, with low power consumption figures and specific features for multimedia content handling.
• Cloud computing paradigm.

This is a software design challenge
We need to address both efficiency and portability at the same time.

Multi-threading
To turn the potential performance gain of multi-core processors into an actual one, the software running on the platform must be written in such a way that it can spread its workload across multiple execution cores.

Data parallelism
Data parallelism relies on instructions that can be applied to multiple data elements in parallel. This computer architecture is known as Single Instruction Multiple Data (SIMD).

Kahn's model of process networks
A Kahn process network is a model of computation in which processes are connected by communication channels to form a network. Processes produce data elements, or tokens, and send them along a communication channel, where they are consumed by the waiting destination process.
Communication channels are the only method processes may use to exchange information.
[Figure: a very simple flow graph.]
Systems that obey Kahn's mathematical model are deterministic: the history of tokens produced on the communication channels does not depend on the execution order.
With a proper scheduling policy, it is possible to implement software-defined radio process networks holding two key properties:
• Non-termination: the flow graph runs indefinitely, without deadlock situations.
• Strict boundedness: the number of data elements buffered on the communication channels remains bounded for all possible execution orders.
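The process network model described above can be sketched with threads and bounded FIFO queues. This is only an illustration in Python, not GNU Radio's scheduler nor the GNSS-SDR implementation: the block names (`source`, `scale`, `sink`), the `run_flow_graph` helper and the end-of-stream sentinel are assumptions made for the example (a receiver flow graph is meant to run indefinitely, while the sentinel lets this finite demo stop).

```python
import queue
import threading

_END = object()  # hypothetical sentinel: marks the end of the finite demo stream


def source(out_q, samples):
    """Produce tokens and push them onto the output channel."""
    for s in samples:
        out_q.put(s)  # blocks while the bounded channel is full
    out_q.put(_END)


def scale(in_q, out_q, gain):
    """Consume one token, produce one token (stand-in for a processing block)."""
    while True:
        token = in_q.get()  # blocking read: channels are the only data path
        if token is _END:
            out_q.put(_END)
            return
        out_q.put(gain * token)


def sink(in_q, results):
    """Consume tokens and store them in arrival order."""
    while True:
        token = in_q.get()
        if token is _END:
            return
        results.append(token)


def run_flow_graph(samples, gain=2, capacity=4):
    """Connect source -> scale -> sink, one thread per block."""
    a = queue.Queue(maxsize=capacity)  # bounded channel: buffering stays bounded
    b = queue.Queue(maxsize=capacity)
    results = []
    threads = [
        threading.Thread(target=source, args=(a, samples)),
        threading.Thread(target=scale, args=(a, b, gain)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each block only communicates through blocking reads and writes on its channels, the sequence of tokens reaching the sink is the same regardless of how the operating system interleaves the three threads, and no channel ever holds more than `capacity` tokens.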

Baseband processing
At this level of abstraction, operations on the incoming signal are parallelized on a per-satellite-signal basis: the processing applied to each targeted satellite signal can run in parallel with the others.
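As a rough sketch of this per-satellite parallelism, the example below runs one correlation per targeted signal on the same block of input samples. It assumes a simplified signal model (carrier wipe-off with a known Doppler, multiplication by a local ±1 code replica, and accumulation); the function names, the `channels` dictionary layout and the short codes in the usage note are hypothetical and are not real GNSS spreading codes. In CPython, threads running pure Python loops do not execute simultaneously, so the sketch shows the structure of the computation rather than an actual speed-up.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor


def correlate(samples, code, doppler_hz, fs_hz):
    """Carrier wipe-off, code wipe-off and accumulation over one block."""
    acc = 0j
    for n, x in enumerate(samples):
        # Local carrier replica at the assumed Doppler frequency.
        carrier = cmath.exp(-2j * math.pi * doppler_hz * n / fs_hz)
        # Local code replica, repeated periodically over the block.
        acc += x * carrier * code[n % len(code)]
    return acc


def process_channels(samples, channels, fs_hz):
    """Run one correlation per targeted satellite signal on the same input.

    `channels` maps a channel name to a (code, doppler_hz) pair.
    """
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = {
            name: pool.submit(correlate, samples, code, doppler_hz, fs_hz)
            for name, (code, doppler_hz) in channels.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

For instance, with the hypothetical codes `a = [1, 1, 1, 1, -1, -1, -1, -1]` and `b = [1, -1, 1, -1, 1, -1, 1, -1]`, feeding `a` itself as the input block at zero Doppler produces a large correlation value on the channel using `a` and a null one on the channel using `b`, since the two sequences are orthogonal.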
[Equations not recovered: input signal; signal at the matched filter output.]
[Figure: typical receiver architecture.]

SIMD in software-defined GNSS receivers
In SIMD technologies, the same set of instructions is executed in parallel on different sets of data.
This reduces the amount of hardware control logic needed by a factor of N for the same amount of calculations, where N is the width of the SIMD unit.
Although SIMD extensions were designed for the acceleration of multimedia processing, they can also be used for GNSS signal processing.

Unsigned integer, bits: ranging from to 2^64 − 1.
A program runs all the implementations that can be executed on your computer and records which one is the fastest; that implementation is then selected at runtime when the function is called.
Parallel computers can work on several tasks at once: by parceling them out to the different processors, by executing multiple instruction streams in an interleaved way in a single processor (multithreading), or by a combination of both strategies.
Code available at https://github.com/gnss-sdr/gnss-sdr/
[Figure: GNU Radio signal processing block.]
Kahn's mathematical model is implemented by GNU Radio, an open source framework for software-defined radios (see "GNU Radio runtime operation," in Proc. GNU Radio Conference, Washington, DC, Aug. -.).
GNSS-SDR features a thread-per-block implementation, which scales well with the number of processors.
[...] a GNSS receiver have been implemented in an open source library: VOLK_GNSSSDR.
VOLK_GNSSSDR provides several implementations for each function:
• A generic, plain C implementation,
• other implementations making use of different SIMD technologies (SSE , SSE ., AVX, NEON, ...).
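The idea of keeping several implementations of each function, timing them on the host machine and then dispatching to the fastest one at runtime can be sketched as follows. This is a toy model in Python and not the VOLK_GNSSSDR API: the `Kernel` class, its `profile` method and the two dot-product variants are hypothetical names chosen for the example, with the second variant merely standing in for a SIMD-specific implementation.

```python
import operator
import time


def dot_generic(a, b):
    """Plain loop: stands in for the generic implementation."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    """Alternative variant: stands in for a SIMD-specific implementation."""
    return sum(map(operator.mul, a, b))


class Kernel:
    """Holds several implementations of one function and dispatches to one."""

    def __init__(self, implementations):
        self.implementations = dict(implementations)
        self.best = None  # name of the fastest implementation, once profiled

    def profile(self, *args, repeats=5):
        """Time every implementation on the given input and remember the fastest."""
        timings = {}
        for name, impl in self.implementations.items():
            start = time.perf_counter()
            for _ in range(repeats):
                impl(*args)
            timings[name] = time.perf_counter() - start
        self.best = min(timings, key=timings.get)
        return timings

    def __call__(self, *args):
        """Call the selected implementation (the first one if not profiled yet)."""
        name = self.best if self.best is not None else next(iter(self.implementations))
        return self.implementations[name](*args)
```

A caller would build `Kernel({"generic": dot_generic, "builtin": dot_builtin})`, run `profile` once with representative input, and from then on simply call the kernel object. Every variant must return the same result, so the choice only affects speed and the same source code remains usable on machines where some variants are unavailable.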

Processing platforms
Server: a Dell PowerEdge R server housing two Intel Xeon E -v CPUs at 2.4 GHz (8 cores, threads each) and an NVIDIA Tesla K GPU with x CUDA cores clocked at 745 MHz. The operating system during tests was GNU/Linux Ubuntu ., 64 bits, using GCC . . .
Platform # - Laptop: Apple's MacBook Pro Late , with an Intel Mobile Core i -U (quad-core) CPU at 2.4 GHz (active cores can be sped up to 3.8 GHz) with Hyper-Threading technology, which allows the system to recognize eight total "cores" or "threads" (four real and four virtual), plus an NVIDIA GeForce GT M GPU with 384 CUDA cores clocked at 967 MHz. The operating system during tests was Mac OS X ., using Apple LLVM / Clang version . . .
Platform # - Embedded development kit: NVIDIA's Jetson TK developer kit, equipped with a quad-core ARM Cortex-A CPU at 2.32 GHz and an NVIDIA Kepler GPU with 192 CUDA cores clocked at 950 MHz. The operating system during tests was GNU/Linux Ubuntu ., 32 bits, using GCC . . .
Platform # - Mini-computer: Raspberry Pi Model B, equipped with a Broadcom BCM CPU (64 bit, ARMv quad-core ARM Cortex A ) clocked at .GHz. The operating system used during tests was Raspbian GNU/Linux (jessie), 32 bits, using GCC . . .
[Figure: processing results of the GPU-offloaded correlator for GPS L C/A channels with different sampling rates.]

Conclusions
We described several parallelization techniques addressing computational efficiency at different abstraction layers. All those concepts were applied in a practical implementation available online under a free and open source software license. Building upon well-established open source frameworks and libraries, we showed that it is possible to achieve real-time operation in different computing environments.
Portability was demonstrated by building and executing the same source code on a wide range of computing platforms, from high-end servers to tiny and affordable computers, using different operating systems and compilers, and showing notable acceleration factors of key operations in all of them.
As a practical outcome of the presented work, this paper introduced, to the best of the authors' knowledge, the first free and open source software-defined GNSS receiver able to sustain real-time processing and to provide position fixes on ARM-based devices.

Thank you for your attention!
Find out more at:
Source code: https://github.com/gnss-sdr/gnss-sdr
Webpage: http://gnss-sdr.org/

Typical functions in a software-defined GNSS receiver: [...], [S 1], [S 2], ...

Table : Maximum number of real-time parallel channels for each platform using GPU accelerators.