Published October 11, 2022 | Version v1
Lesson | Open

Enabling Scalable and Reliable Real Time Data Services for Sensors and Devices in StreamCI

Description

Rapid advances in technology over the past decade have enabled the collection of large amounts of data, in particular through data streams from sensors and devices. Effective use of such data has been hampered by the lack of ready-to-use resources that let data providers manage the data and let data consumers access it through easy-to-use APIs. In this paper, we present StreamCI, a scalable cloud-based sensor data collection and analysis system that enables researchers to easily collect, process, store, and access large volumes of heterogeneous sensor data. StreamCI provides a web portal through which users and administrators can register new data sources and monitor the status of data ingestion pipelines. The back end of StreamCI provides real-time data ingestion and query APIs, data access control, and data processing pipelines built on an open-source software stack that includes RabbitMQ, Node.js, MongoDB, Certbot, Grafana, and HUBzero. Containerizing the services and orchestrating them with Kubernetes improves scalability, as our experimental results demonstrate. StreamCI has been used in multiple research and education domains, including the collection and processing of plant health sensor data by plant phenotyping researchers, the collection of real-world air quality sensor data and its use in data analysis coursework for ecology students, and the collection and analysis of advanced manufacturing data for cybersecurity research.
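The ingestion flow described above (register a data source, push its readings through a message queue, persist them, and query them back with access control) can be illustrated with a toy, in-memory sketch. This is not StreamCI's code or API: `ToyIngestPipeline`, `register_source`, `ingest`, `drain`, and `query` are hypothetical names, `queue.Queue` stands in for a RabbitMQ queue, a dictionary stands in for a MongoDB collection, and per-source tokens are an assumed, simplified form of access control.

```python
import queue
import secrets
from collections import defaultdict


class ToyIngestPipeline:
    """Hypothetical in-memory stand-in for a broker -> consumer -> store pipeline.

    queue.Queue plays the role of a message-broker queue (e.g. RabbitMQ) and a
    dict of lists plays the role of a document store (e.g. MongoDB).
    None of these names come from StreamCI.
    """

    def __init__(self):
        self._tokens = {}                # source_id -> write token
        self._queue = queue.Queue()      # readings waiting to be consumed
        self._store = defaultdict(list)  # source_id -> persisted readings

    def register_source(self, source_id):
        # Issue a random token that the data provider must present on every write.
        token = secrets.token_hex(16)
        self._tokens[source_id] = token
        return token

    def ingest(self, source_id, token, reading):
        # Access control: reject unregistered sources and wrong tokens.
        expected = self._tokens.get(source_id)
        if expected is None or not secrets.compare_digest(expected, str(token)):
            raise PermissionError(f"invalid token for {source_id!r}")
        self._queue.put((source_id, reading))

    def drain(self):
        # Consumer side: move every queued reading into the store.
        moved = 0
        while True:
            try:
                source_id, reading = self._queue.get_nowait()
            except queue.Empty:
                return moved
            self._store[source_id].append(reading)
            moved += 1

    def query(self, source_id, since=0.0):
        # Query side: readings from one source with timestamp "ts" >= since.
        return [r for r in self._store.get(source_id, []) if r["ts"] >= since]
```

For example, registering an air quality sensor, ingesting two readings, and calling `drain()` moves both into the store; `query("aq-sensor-1", since=1.5)` then returns only the reading with `ts` 2.0. Decoupling `ingest` from `drain` mirrors the reason a broker sits in front of the database: producers are acknowledged quickly while consumers persist data at their own pace.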

Files

Gateways2022_paper_8125.pdf (1.2 MB)
md5:cd67d7c5c7f805a24bc724a32bce0531