Mobility Networked Time Series Benchmark Datasets
Contributors
Data curator:
Data manager:
Project leader:
Project members:
Description
Overview
Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, existing research often neglects integrating comprehensive perspectives on spatial dynamics, temporal trends, and other contextual views due to the limitations of existing mobility datasets. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries and cover both transportation and epidemic domains at the administrative area level. Our experiments with nine baseline methods reveal the significant impact of different model backbones on the proposed six datasets. We provide a valuable resource for advancing urban mobility research, and our dataset collection is available at DOI 10.5281/zenodo.14590709.
Benchmark Code
Go to Github: https://github.com/kaist-dmlab/MOBINS
Benchmark Baseline List
- Linear-based: DLinear, NLinear
- RNN-based: SegRNN
- Transformer-based: Informer, Reformer, PatchTST
- CNN-based: TimesNet
- GNN-based: STGCN, MPNNLSTM
Detailed Benchmark Results
The detailed benchmark results of MOBINS, reported with MAE, MSE, and standard deviation, are available in MOBINS_Results.pdf in the GitHub repository.
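As a reminder of what these metrics measure, here is a minimal Python sketch of MAE and MSE, together with a mean and standard deviation taken over several runs. The function names and the toy numbers are illustrative, and the assumption that the reported standard deviation is taken across repeated runs is ours; the benchmark code may compute it differently.

```python
from statistics import mean, stdev

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

# Toy ground truth and predictions from three hypothetical runs (not MOBINS results).
y_true = [1.0, 2.0, 3.0]
runs = [[1.0, 2.0, 4.0], [2.0, 2.0, 3.0], [1.0, 1.0, 1.0]]

mae_per_run = [mae(y_true, run) for run in runs]
mse_per_run = [mse(y_true, run) for run in runs]

# Assumption: summary is mean and sample standard deviation across runs.
mae_mean, mae_std = mean(mae_per_run), stdev(mae_per_run)
mse_mean, mse_std = mean(mse_per_run), stdev(mse_per_run)
```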
Code Licence
- Our code implementation is released under the MIT License
Code Reference
- DLinear: https://github.com/cure-lab/LTSF-Linear
- NLinear: https://github.com/cure-lab/LTSF-Linear
- SegRNN: https://github.com/lss-1138/SegRNN
- Informer: https://github.com/zhouhaoyi/Informer2020
- Reformer: https://github.com/lucidrains/reformer-pytorch
- PatchTST: https://github.com/yuqinie98/PatchTST
- TimesNet: https://github.com/thuml/TimesNet
- STGCN: https://github.com/hazdzz/STGCN
- MPNNLSTM: https://github.com/geopanag/pandemic_tgnn
Benchmark Datasets
Dataset Descriptions
Domain | Location | Nodes | Edges | Spatial node unit | Daily movements | Daily amounts | Time interval | Time range | Frames | Target dimension |
---|---|---|---|---|---|---|---|---|---|---|
Transportation | Seoul | 128 | 290 | Station-based administrative area | SmartCard: 2.68M | In/Out-flow: 4.02M | 1 hour | 01/01/2022-12/31/2023 | 17520 | 16640 |
Transportation | Busan | 60 | 121 | Station-based administrative area | SmartCard: 0.63M | In/Out-flow: 0.75M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3720 |
Transportation | Daegu | 61 | 123 | Station-based administrative area | SmartCard: 0.10M | In/Out-flow: 0.34M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3843 |
Transportation | NYC | 5 | 12 | Borough | Taxi: 0.10M | Ridership: 3.03M | 1 hour | 02/01/2022-03/31/2024 | 17280 | 30 |
Epidemic | Korea | 16 | 45 | City&Province | SmartCards: 13.41M | Infection: 25834 | 1 day | 01/20/2020-08/31/2023 | 1320 | 272 |
Epidemic | NYC | 5 | 12 | Borough | Taxi: 2418 | Infection: 2038 | 1 day | 03/01/2020-12/31/2023 | 1401 | 30 |
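The target dimensions in the table appear consistent with counting one forecast target per OD pair plus one per node variable, that is, n * n + n * d for n nodes and d node variables. This rule is inferred from the table values and the file formats, not stated explicitly here; the helper name below is hypothetical. A minimal Python check:

```python
# Hypothetical helper: the n*n + n*d rule is an inference from the table,
# not a formula given by the dataset authors.
def target_dimension(n_nodes: int, n_node_variables: int) -> int:
    """OD pairs (n * n) plus per-node time-series variables (n * d)."""
    return n_nodes * n_nodes + n_nodes * n_node_variables

# name: (nodes, node variables, target dimension reported in the table)
# Node variables follow the VARIABLE_NAME list: INFLOW/OUTFLOW (2),
# RIDERSHIP (1), INFECTION (1).
TABLE = {
    "Transportation-Seoul": (128, 2, 16640),
    "Transportation-Busan": (60, 2, 3720),
    "Transportation-Daegu": (61, 2, 3843),
    "Transportation-NYC": (5, 1, 30),
    "Epidemic-Korea": (16, 1, 272),
    "Epidemic-NYC": (5, 1, 30),
}

mismatches = [
    name
    for name, (n, d, reported) in TABLE.items()
    if target_dimension(n, d) != reported
]
```

With the values above, `mismatches` is empty, which is what suggests the rule.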
Formats of datasets (MOBINS.zip)
csv format
Datasets in every environment: each dataset has three components.
- SPATIAL_NETWORK.csv: ( n * n ), where n = # of nodes
  - Column name list: INDEX, N0, N1, …, Nn
  - INDEX list: N0, N1, …, Nn
- NODE_TIME_SERIES_FEATURES.csv: ( t * p ) * ( n * d ), where t = # of timestamps in a day, p = total period, and d = # of variables from time series
  - Column name list: datetime, N0_{VARIABLE_NAME}, N1_{VARIABLE_NAME}, …, Nn_{VARIABLE_NAME}
  - VARIABLE_NAME list: Transportation-[Seoul, Busan, Daegu] datasets (INFLOW, OUTFLOW), Transportation-NYC dataset (RIDERSHIP), Epidemic-[Korea, NYC] datasets (INFECTION)
- OD_MOVEMENTS.csv: ( t * p ) * ( n * n )
  - Column name list: N0_N0, N0_N1, N0_N2, …, Nn_Nn−1, Nn_Nn
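The column layout described above can be sketched in Python as follows. This is an illustration under several assumptions that should be checked against the actual files: column names use a plain underscore with no spaces, node labels run from N0 to N(n-1), feature columns are ordered node by node, and OD columns are ordered origin first. All function names are hypothetical.

```python
from typing import List, Sequence

def node_names(n_nodes: int) -> List[str]:
    """Node labels N0 .. N(n-1); zero-based labelling is an assumption."""
    return [f"N{i}" for i in range(n_nodes)]

def feature_columns(n_nodes: int, variables: Sequence[str]) -> List[str]:
    """Expected header of NODE_TIME_SERIES_FEATURES.csv (node-major order assumed)."""
    return ["datetime"] + [
        f"{node}_{var}" for node in node_names(n_nodes) for var in variables
    ]

def od_columns(n_nodes: int) -> List[str]:
    """Expected OD pair columns of OD_MOVEMENTS.csv, origin-major: N0_N0, N0_N1, ..."""
    names = node_names(n_nodes)
    return [f"{origin}_{dest}" for origin in names for dest in names]

def od_row_to_matrix(row: Sequence[float], n_nodes: int) -> List[List[float]]:
    """Reshape one flattened OD row into an n x n matrix (rows are origins)."""
    if len(row) != n_nodes * n_nodes:
        raise ValueError("row length must equal n_nodes * n_nodes")
    return [list(row[i * n_nodes:(i + 1) * n_nodes]) for i in range(n_nodes)]

# Toy example with two nodes and the transportation variables.
example_features = feature_columns(2, ["INFLOW", "OUTFLOW"])
example_od = od_columns(2)
example_matrix = od_row_to_matrix([1, 2, 3, 4], 2)
```

Comparing these generated names with the header of a loaded file is a quick way to confirm, or correct, the ordering assumptions before reshaping OD rows into matrices.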
Meta datasets
The metadata is provided in MOBINS_Meta.pdf in the GitHub repository.
Metadata for Transportation Datasets
Each file contains information about a single node or a node pair, which is abstracted for simplicity by describing only the i-th node. We omit the detailed description in metadata for Transportation-[Busan, Daegu] because the CSV file structures are identical to the metadata for Transportation_Seoul, differing only in the number of nodes, which is unique to each dataset. Transportation_NYC follows a similar structure, with the exception of the variable for node time-series features (ridership).
Metadata for Epidemic Datasets
Each file contains information about a single node or a node pair, which is abstracted for simplicity by describing only the i-th node. Both datasets share a consistent structure in terms of node time-series features, OD movements, and spatial networks.
Data Licence
- The Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-NYC datasets are released under a CC BY-NC 4.0 International License.
- The Epidemic-Korea datasets are released under a CC BY-NC-ND 4.0 International License.
How to Curate MOBINS
Composition
The MOBINS dataset collection consists of mobility networked time-series data for forecasting tasks in two domains: Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-[Korea, NYC]. Each dataset comprises three key components: (1) OD movements, (2) a spatial network, and (3) time series. These datasets capture the temporal evolution of OD movements and time series within a fixed spatial network. OD movements represent the volume of movements between pairs of nodes, while time series denotes the time-varying features within each node. These datasets provide a comprehensive understanding of mobility patterns, exhibiting high correlation and synergy between OD movements and time series.
Collection Process
All datasets in MOBINS are collected from reliable sources, including government agencies, local governments, public transportation operators, and smart card companies. These sources provide publicly accessible data downloads based on their administrative systems. The source data from smart transit card information systems is accessed through API calls at the administrative area level, such as neighborhoods or provinces, to align the spatial resolution of the time series.
The use of data available on the Korea Public Data Portal is either unrestricted or covered by the CC BY license. For sources without a specific license indication, we obtained responses about the uses for research through inquiries via phone or email. Additionally, data from the Korea Disease Control and Prevention Agency was used without numerical value modifications after obtaining permission.
Preprocessing/Cleaning/Labeling
Each dataset in the MOBINS collection is derived from different sources for OD movements and time series. To ensure consistent spatial and temporal resolution, we align these two sources using Python. In the Transportation-[Seoul, Busan, Daegu] datasets, we use 'station-based administrative areas' as spatial node units, treating stations within the same administrative area as a single node. For the Transportation-NYC dataset, we use boroughs as spatial node units to align the spatial resolution between taxi zones and stations. In the Epidemic-Korea dataset, the source infection case data is collected at the city and province levels. Hence, we use OD movements based on the city and province levels to match spatial resolution. Similarly, for the Epidemic-NYC dataset, we use corresponding OD movements at the borough level to maintain consistent spatial node units. After the spatial resolutions are determined, we generate the spatial network based on these resolutions.
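Treating stations in the same administrative area as a single node amounts to summing station-level movements over a station-to-area mapping. The Python sketch below illustrates the idea only; the mapping, station IDs, and function name are hypothetical, and the actual preprocessing code may differ.

```python
from collections import defaultdict
from typing import Dict, Tuple

def aggregate_od_by_area(
    station_od: Dict[Tuple[str, str], int],
    station_to_area: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Sum station-to-station movements into area-to-area movements."""
    area_od: Dict[Tuple[str, str], int] = defaultdict(int)
    for (origin, dest), count in station_od.items():
        area_pair = (station_to_area[origin], station_to_area[dest])
        area_od[area_pair] += count
    return dict(area_od)

# Hypothetical toy data: stations S1 and S2 lie in area A, station S3 in area B.
station_to_area = {"S1": "A", "S2": "A", "S3": "B"}
station_od = {
    ("S1", "S3"): 10,
    ("S2", "S3"): 5,
    ("S3", "S1"): 7,
    ("S1", "S2"): 2,  # both stations in area A, so this becomes an A-to-A movement
}
area_od = aggregate_od_by_area(station_od, station_to_area)
```

Movements between two stations of the same area end up on the diagonal of the area-level OD matrix in this sketch; whether the released data keeps or drops such within-node movements is not stated here.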
Regarding the temporal aspect, although the source interval of OD movements from Transportation-[Busan, Daegu, NYC] is less than 15 minutes, we set the interval to 1 hour in MOBINS to match the time-series data frequency. This integration of two sources with positive or negative correlations enables the interpretation and forecasting of data from various contextual perspectives.
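Bringing sub-hourly OD records to a 1-hour interval can be done by flooring each timestamp to the hour and summing counts per origin-destination pair. The following Python sketch assumes summation is the aggregation used and assumes a simple record layout; both are illustrative choices, not a description of the authors' code.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Assumed record layout: (timestamp, origin node, destination node, movement count).
Record = Tuple[datetime, str, str, int]

def to_hourly(records: Iterable[Record]) -> Counter:
    """Sum movement counts per (hour, origin, destination)."""
    hourly: Counter = Counter()
    for timestamp, origin, dest, count in records:
        # Floor the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        hourly[(hour, origin, dest)] += count
    return hourly

# Toy records, not taken from the datasets.
records = [
    (datetime(2022, 1, 1, 8, 5), "N0", "N1", 3),
    (datetime(2022, 1, 1, 8, 50), "N0", "N1", 4),
    (datetime(2022, 1, 1, 9, 10), "N0", "N1", 1),
]
hourly = to_hourly(records)
```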
Among our dataset collection, the source OD movements of the Transportation-Seoul dataset have 14 missing days (07/01/2022 to 07/06/2022, 07/13/2022, 07/20/2022, 08/06/2022, 08/07/2022, 09/13/2022, 10/31/2022, 11/01/2022, and 12/04/2022) in the Korea Public Data Portal. These missing days are filled with additional OD movement information from the smart transit card information system. Meanwhile, source OD movements from the NYC taxi dataset contain abnormal taxi records. To provide clean NYC OD movements, we remove a taxi record if the difference between its drop-off and pick-up timestamps is less than 0 seconds or more than 6 hours. To facilitate future data updates, we maintain backups of the raw source data.
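The duration rule for NYC taxi records can be expressed as a simple filter. The Python sketch below keeps a trip only when its duration lies between 0 seconds and 6 hours inclusive; the record layout and function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

MAX_TRIP_DURATION = timedelta(hours=6)

# Assumed record layout: (pick-up timestamp, drop-off timestamp).
Trip = Tuple[datetime, datetime]

def is_valid_trip(pickup: datetime, dropoff: datetime) -> bool:
    """A trip is abnormal if its duration is < 0 seconds or > 6 hours."""
    duration = dropoff - pickup
    return timedelta(0) <= duration <= MAX_TRIP_DURATION

def remove_abnormal_trips(trips: Iterable[Trip]) -> List[Trip]:
    """Keep only trips that pass the duration rule."""
    return [(p, d) for p, d in trips if is_valid_trip(p, d)]

# Toy trips, not taken from the TLC data.
trips = [
    (datetime(2022, 2, 1, 8, 0), datetime(2022, 2, 1, 8, 30)),   # 30 minutes: kept
    (datetime(2022, 2, 1, 9, 0), datetime(2022, 2, 1, 8, 59)),   # negative: removed
    (datetime(2022, 2, 1, 10, 0), datetime(2022, 2, 1, 17, 0)),  # 7 hours: removed
    (datetime(2022, 2, 1, 11, 0), datetime(2022, 2, 1, 17, 0)),  # exactly 6 hours: kept
]
clean_trips = remove_abnormal_trips(trips)
```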
Data Reference
- References of Origin-Destination Movements
  - Transportation-Seoul: Korea Public Data Portal and Smart Transit Card Information System
  - Transportation-[Busan,Daegu]: Smart Transit Card Information System
  - Transportation-NYC: NYC Taxi and Limousine Commission (TLC)
  - Epidemic-Korea: Smart Transit Card Information System
  - Epidemic-NYC: NYC Taxi and Limousine Commission (TLC)
- References of Time Series
  - Transportation-Seoul: Korea Public Data Portal (Seoul subway line 1-8 and line 9)
  - Transportation-[Busan,Daegu]: Korea Public Data Portal (Busan and Daegu)
  - Transportation-NYC: NYC Data Portal
  - Epidemic-Korea: Korea Disease Control and Prevention Agency
  - Epidemic-NYC: NYC Health
[note] All source websites provide an official English version except Smart Transit Card Information System and Korea Disease Control and Prevention Agency. Therefore, we describe how to contact or use these two sources.
- Uses of Smart Transit Card Information System: Please contact this email (stcis@kotsa.or.kr).
- Time Series of Epidemic-Korea: direct download link. If you want to contact the reference, please use this official English link.
We implemented our benchmark code based on Time Series Library (TSLib).
- DLinear: https://github.com/cure-lab/LTSF-Linear
- NLinear: https://github.com/cure-lab/LTSF-Linear
- SegRNN: https://github.com/lss-1138/SegRNN
- Informer: https://github.com/zhouhaoyi/Informer2020
- Reformer: https://github.com/lucidrains/reformer-pytorch
- PatchTST: https://github.com/yuqinie98/PatchTST
- TimesNet: https://github.com/thuml/TimesNet
- STGCN: https://github.com/hazdzz/STGCN
- MPNNLSTM: https://github.com/geopanag/pandemic_tgnn
@inproceedings{na2025mobility,
title={Mobility Networked Time Series Benchmark Datasets},
author={Na, Jihye and Nam, Youngeun and Yoon, Susik and Song, Hwanjun and Lee, Byung Suk and Lee, Jae-Gil},
booktitle={ICWSM},
year={2025},
}
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2023R1A2C2003690).
Files
Name | Size | md5 |
---|---|---|
MOBINS.zip | 253.7 MB | 605bb95e35029299a18569941cc2b822 |