High-Frequency Water Quality Time-Series Dataset for WAWQI Forecasting

1. Overview
This dataset contains high-frequency, multivariate time-series data collected from an active freshwater aquaculture lake at Telkom University, Bandung, Indonesia. The data was recorded using a multi-sensor monitoring node deployed over a four-day period.

The primary purpose of this dataset is to facilitate research in short-term temporal forecasting of the Weighted Arithmetic Water Quality Index (WAWQI) using machine learning algorithms.

2. Dataset Structure
The file "aquaculture_wawqi_dataset.csv" contains 23,502 rows (excluding the header) and 8 columns. Data was recorded continuously at 15-second intervals.
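
The row count and sampling interval above can be checked against the four-day deployment with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that sampling was evenly spaced with no rows dropped, which this README does not state explicitly.

```python
# Figures taken from this README.
N_ROWS = 23_502          # data rows, excluding the header
INTERVAL_SECONDS = 15    # nominal sampling interval

SECONDS_PER_DAY = 86_400


def span_days(n_rows: int, interval_seconds: int) -> float:
    """Time between the first and last sample, in days.

    Assumes evenly spaced samples, so n_rows samples cover
    (n_rows - 1) intervals.
    """
    return (n_rows - 1) * interval_seconds / SECONDS_PER_DAY


def elapsed_seconds(row_index: int, interval_seconds: int = INTERVAL_SECONDS) -> int:
    """Offset of a zero-based row from the first sample, in seconds."""
    return row_index * interval_seconds


if __name__ == "__main__":
    # Roughly 4.08 days, consistent with the four-day deployment.
    print(f"{span_days(N_ROWS, INTERVAL_SECONDS):.2f} days")
```

Because all 8 columns are measurement or index columns, a relative time axis such as elapsed_seconds may be useful when a forecasting method needs explicit timestamps.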

Sensor Variables:
1. Temperature_C: Water temperature in degrees Celsius (°C).
2. pH: Potential of hydrogen, a dimensionless measure of acidity or alkalinity.
3. DO_mgL: Dissolved Oxygen in milligrams per liter (mg/L).
4. Turbidity_NTU: Water turbidity in Nephelometric Turbidity Units (NTU).
5. EC_mScm: Electrical Conductivity in milliSiemens per centimeter (mS/cm).
6. TDS_mgL: Total Dissolved Solids in milligrams per liter (mg/L).
7. ORP_mV: Oxidation-Reduction Potential in millivolts (mV).
8. WAWQI_Score: The pre-computed Weighted Arithmetic Water Quality Index score, derived from the seven physicochemical parameters listed above and provided for direct use as a forecasting target.
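
A minimal loading sketch for the columns listed above, using only the Python standard library. It assumes that the CSV header names match the variable names in this list exactly and that every cell is numeric; neither has been verified against the file. The function name load_rows is hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Column names as listed in this README (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "Temperature_C", "pH", "DO_mgL", "Turbidity_NTU",
    "EC_mScm", "TDS_mgL", "ORP_mV", "WAWQI_Score",
]


def load_rows(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse CSV text into a list of {column: float} records.

    Raises ValueError if any expected column is missing from the header.
    """
    reader = csv.DictReader(lines)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: float(row[c]) for c in EXPECTED_COLUMNS} for row in reader]


# Usage (not executed here, since it needs the dataset file on disk):
#
#     with open("aquaculture_wawqi_dataset.csv", newline="") as handle:
#         rows = load_rows(handle)
#     features = [[r[c] for c in EXPECTED_COLUMNS[:-1]] for r in rows]
#     target = [r["WAWQI_Score"] for r in rows]
```

Keeping WAWQI_Score separate from the seven sensor columns, as in the usage comment, matches its stated role as the forecasting target.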

3. Preprocessing Notes for Reproducibility
This dataset is provided in a semi-raw format. The only preprocessing applied is linear interpolation, used to fill missing values caused by transient transmission failures.
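
For readers unfamiliar with the technique, the sketch below illustrates linear interpolation of gaps in a single sensor series. It is not the code used to prepare this dataset; the function name and the use of None to mark missing samples are assumptions made for illustration.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None on a straight line between valid neighbours.

    Leading and trailing None values are left untouched, because they have
    only one neighbour to interpolate from.
    """
    filled = list(values)
    last_valid = None  # index of the most recent non-missing sample
    for i, value in enumerate(values):
        if value is None:
            continue
        if last_valid is not None and i - last_valid > 1:
            start = values[last_valid]
            step = (value - start) / (i - last_valid)
            for j in range(last_valid + 1, i):
                filled[j] = start + step * (j - last_valid)
        last_valid = i
    return filled
```

For example, a series of 1.0, missing, missing, 4.0 is filled to 1.0, 2.0, 3.0, 4.0.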

To reproduce the exact 23,453-row dataset used in our baseline predictive modeling research (23,502 - 36 - 13 = 23,453), apply the following temporal trimming steps, which remove sensor stabilization and retrieval artifacts:
- Drop the first 36 rows (sensor stabilization phase).
- Drop the last 13 rows (sensor retrieval phase).
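
The two trimming steps above can be sketched as a single slice. The function name trim_artifacts is hypothetical, and rows is assumed to be a list of data records in file order with the header already removed.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

STABILIZATION_ROWS = 36  # dropped from the start (sensor stabilization phase)
RETRIEVAL_ROWS = 13      # dropped from the end (sensor retrieval phase)


def trim_artifacts(
    rows: Sequence[T],
    head: int = STABILIZATION_ROWS,
    tail: int = RETRIEVAL_ROWS,
) -> List[T]:
    """Return rows without the first `head` and the last `tail` records."""
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    if head + tail >= len(rows):
        raise ValueError("trimming would remove every row")
    # Slice with an explicit end index so that tail == 0 also works.
    return list(rows[head:len(rows) - tail])
```

Applied to the full 23,502-row file, this leaves 23,453 rows. Checking that count after trimming is a quick way to confirm that the header was not counted as a data row.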

4. Usage and Citation
This dataset is open access and distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. If you use this dataset in your research, please cite the corresponding paper:
"High-Frequency WAWQI Forecasting Using Bayesian-Optimized Machine Learning and Hybrid Time-Series Features"
