Dataset: LoED: The LoRaWAN at the Edge Dataset

This paper presents the LoRaWAN at the Edge Dataset (LoED), an open LoRaWAN packet dataset collected at gateways. Real-world LoRaWAN datasets are important for repeatable sensor-network and communications research and evaluation because, when carefully collected, they provide realistic working assumptions. LoED data was collected from nine gateways over a four-month period in a dense urban environment. The dataset contains packet header information and all physical layer properties reported by gateways, such as the CRC, RSSI, SNR and spreading factor. Files are provided to analyse the data and compute aggregated statistics. The dataset is available at: doi.org/10.5281/zenodo.4121430


INTRODUCTION
LoRaWAN is a wireless, single-hop, long-range Low-Power Wide Area Network (LPWAN) technology. LoRaWAN has seen rapid adoption due to its communication coverage, ease of deployment, and simplified infrastructure management. Over 180 million LoRa-enabled devices (see footnote 1) are already used for a variety of smart city and rural applications.
A significant body of academic work on LoRa and LoRaWAN has proposed improvements to data rate control mechanisms [5], solutions to decrease collisions and increase channel goodput [8], scheduling for reliable and efficient data collection [2], and scaling [7]. The validity and performance of the proposed improvements and solutions have been extensively tested in lab-like testbeds and simulations, which have little in common with real-world environments and deployments. This is due to the absence of publicly available traces from the target environments.
The IoT research community has recognised the relevance of quantitative evidence from real-world environments and deployments and has made available a few LoRaWAN datasets [1,4,6]. The fingerprinting dataset [1] contains over 120,000 traces collected over a three-month period. Each trace stores the location of the device, the Received Signal Strength Indicator (RSSI) sampled by the gateways upon receiving a packet, the spreading factor used, and the time of the received packet. The fingerprinting dataset provides no gateway capacity, load, or deployment environment information. The LoRa underground link dataset [6] was collected in an agricultural environment. It contains data from a low-density LoRaWAN deployment with only five transmitters and two receiver base stations.
Footnote 1: https://www.semtech.com/lora
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in The 3rd International SenSys+BuildSys Workshop on Data: Acquisition to Analysis (DATA '20), November 16-19, 2020, Virtual Event, Japan, https://doi.org/10.1145/3419016.3431491.
In this paper we present LoED, the LoRaWAN at the Edge Dataset, which consists of traces gathered in central London in a mix of dense urban and park environments. Overall, 11,000,000 packets were collected at nine gateways during 2019 and 2020. We describe LoED in Section 2, and briefly discuss how the dataset can be exploited in Section 3.

DATASET
Setup. LoED was acquired from nine LoRaWAN gateways in central London. The gateway locations were representative of typical dense urban and park environments and covered different deployment conditions, as shown in Table 1.
Five outdoor gateways were deployed on the rooftops of large buildings, with a clear line-of-sight (LoS) to devices. Four indoor gateways were located near windows with limited LoS. One of the indoor gateways was placed on the ground floor of a college dormitory with no LoS to any device. Each gateway forwarded received packets to a multiplexer, which forwarded them to different Network Servers. Our server copied the packets and the gateway metadata to a time-series database. The gateway locations can be found in the dataset.
The LoED dataset is publicly available at [3] and includes: (i) all packets received at the nine gateways, (ii) one pre-processed CSV data file for every day of the collection campaign, saved in dd_mm_yyyy.csv format, (iii) a set of scripts for processing and plotting the data, and (iv) a preliminary analysis of the data.
Preliminary insights. LoED exposes insights into how LoRaWAN operates in real-world urban deployments and provides data such as: (i) the number of packets per day at a gateway, (ii) the total number of packets per node, (iii) the distribution of packet types at gateways, (iv) the distribution of frequencies used at gateways, (v) the distribution of spreading factors used at gateways, (vi) the distribution of RSSI values at a gateway, and (vii) the distribution of SNR values at a gateway.
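As an illustration of how such aggregates could be derived from the per-day CSV files, the following is a minimal Python sketch. It is not one of the scripts shipped with the dataset. The column names gateway and spreading_factor are assumptions made for this example and should be replaced with the headers actually used in the LoED files; only the dd_mm_yyyy.csv naming scheme is taken from the dataset description.

```python
import csv
from collections import Counter, defaultdict
from datetime import datetime
from pathlib import Path

# Hypothetical column names: check the header row of the real LoED CSV files.
GATEWAY_COL = "gateway"
SF_COL = "spreading_factor"


def parse_day(path):
    """Recover the collection date from a dd_mm_yyyy.csv file name."""
    return datetime.strptime(Path(path).stem, "%d_%m_%Y").date()


def summarise(paths):
    """Count packets per (day, gateway) and spreading-factor usage per gateway."""
    packets_per_day = Counter()
    sf_per_gateway = defaultdict(Counter)
    for path in paths:
        day = parse_day(path)
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                gateway = row[GATEWAY_COL]
                packets_per_day[(day, gateway)] += 1
                sf_per_gateway[gateway][int(row[SF_COL])] += 1
    return packets_per_day, sf_per_gateway


def sf_shares(sf_counts):
    """Turn raw spreading-factor counts into fractions of a gateway's traffic."""
    total = sum(sf_counts.values())
    return {sf: count / total for sf, count in sorted(sf_counts.items())}
```

Calling summarise on the list of daily files returns two mappings. Passing one gateway's counter to sf_shares gives the fraction of that gateway's packets sent at each spreading factor, which is the kind of quantity a per-gateway spreading factor plot would show.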
In Figure 1 we see the LoRa spreading factor usage at each gateway. Of note is the dominant use of spreading factors 7, 8 (at two of the gateways) and 12. This may increase the probability of collisions as the number of devices using the same spreading factor rises. As a consequence, the performance of the applications running on the devices in the network may decrease and require further, lower-level, investigation.
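One reason heavy use of spreading factor 12 matters for collisions is that a packet's time on air grows rapidly with the spreading factor. The sketch below evaluates the time-on-air formula from the Semtech SX127x datasheets. It is an illustration and not part of the LoED analysis. The parameters are assumptions: 125 kHz bandwidth, coding rate 4/5, an 8-symbol preamble, explicit header, CRC enabled, low data rate optimisation at spreading factors 11 and 12, and a hypothetical 20-byte payload.

```python
import math


def time_on_air_ms(sf, payload_bytes, bandwidth_hz=125_000, coding_rate=1,
                   preamble_symbols=8, explicit_header=True, crc=True):
    """LoRa packet time on air in milliseconds (Semtech SX127x formula).

    coding_rate is 1..4, meaning 4/5..4/8. All defaults are assumptions
    chosen for illustration, not values taken from the LoED dataset.
    """
    t_sym_ms = (2 ** sf) / bandwidth_hz * 1000.0
    # Assumed: low data rate optimisation is on for SF11 and SF12 at 125 kHz.
    low_dr_opt = 1 if (bandwidth_hz == 125_000 and sf >= 11) else 0
    implicit_header = 0 if explicit_header else 1
    numerator = (8 * payload_bytes - 4 * sf + 28
                 + 16 * int(crc) - 20 * implicit_header)
    coded_blocks = math.ceil(numerator / (4 * (sf - 2 * low_dr_opt)))
    payload_symbols = 8 + max(coded_blocks * (coding_rate + 4), 0)
    # The preamble adds 4.25 symbols of synchronisation overhead.
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym_ms


if __name__ == "__main__":
    for sf in (7, 8, 12):
        print(f"SF{sf}: {time_on_air_ms(sf, 20):.1f} ms")
```

Under these assumptions, a 20-byte payload occupies the channel for roughly 57 ms at spreading factor 7 and roughly 1.3 s at spreading factor 12, so each SF12 transmission is exposed to overlapping transmissions for far longer.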

DISCUSSION
The LoED dataset can be used by the LoRaWAN community for many purposes. It can be used to characterise parameter usage and highlight long-term trends for LoRaWAN applications and devices in an urban environment. LoED can provide test data to inform the design of scheduling algorithms and protocols, to ensure they are well-suited to the target applications and environments. Further, LoED can be plugged into capacity planning systems, which use different statistical and machine learning techniques to derive optimal parameters that improve network throughput, reduce interference, or determine locations for new gateways to improve coverage.