Published January 21, 2026 | Version v1
Software Open

Code for: Realistic Multi-Fault Diagnostics of Millions-Scale Li-ion batteries with Rapid Unsupervised Learning

Authors/Creators

  • 1. Harbin Institute of Technology

Description

Abstract 

 
The rapid deployment of battery swapping stations necessitates scalable and reliable fault diagnosis, yet massive, sparse operational data and scarce labeled samples make this challenging. Here, we report a rapid unsupervised learning framework for realistic multi-fault diagnosis in million-scale battery fleets. Our approach employs a double-layer mechanism. First, we rapidly screen for abnormal devices by extracting features from voltage-envelope sequences. Subsequently, we pinpoint faulty cells and types using an enhanced two-stage unsupervised clustering combined with rule-based fault tracing. The framework is validated on a production dataset of over 128,000 devices, achieving 97.33% device-layer and 99.66% cell-layer accuracy. Laboratory tests on recalled batteries further confirm the detection of low-capacity and micro-short-circuit faults. These results demonstrate scalability and robustness under sparse-data conditions, enabling reliable operations for large-scale energy storage systems.
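
To make the device-layer screening idea concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that the "voltage envelope" is the highest and lowest cell voltage at each time step, and it swaps in a simple robust z-score (median and MAD) in place of the paper's feature extraction and clustering. The function names, the input layout, and the threshold are all hypothetical.

```python
from statistics import mean, median


def envelope_features(voltages):
    """Summarise one device (illustrative only).

    voltages: list of time steps, each a list of per-cell voltages in volts.
    Assumption: the envelope is the max and min cell voltage per time step.
    """
    spreads = [max(step) - min(step) for step in voltages]
    return {"mean_spread": mean(spreads), "max_spread": max(spreads)}


def screen_devices(fleet, threshold=3.5):
    """Flag devices whose mean envelope spread is a fleet-level outlier.

    fleet: dict mapping device id -> voltages (see envelope_features).
    Uses a modified z-score based on the median absolute deviation (MAD);
    this is a stand-in technique, not the method described in the paper.
    """
    feats = {dev: envelope_features(v)["mean_spread"] for dev, v in fleet.items()}
    med = median(feats.values())
    mad = median(abs(x - med) for x in feats.values())
    if mad == 0:
        # No spread across the fleet, so the score cannot be scaled.
        return []
    return [
        dev for dev, x in feats.items()
        if 0.6745 * abs(x - med) / mad > threshold
    ]
```

Calling `screen_devices(fleet)` on a dictionary of per-device voltage sequences returns the ids of devices that would be passed on to a cell-layer analysis in a two-layer design like the one described above.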

Description

This project provides an implementation of a diagnostic framework, organized as follows:
1. Data: Partial raw data samples are stored in the `data/` folder.
2. Processed data: Cleaned and transformed data are placed in `processedData/`.
3. Code: All scripts and notebooks for model construction, parameter tuning, performance evaluation, and result analysis are in `code/`.
4. Results: Outputs from the framework are saved in `Result/`.

Usage

1. Navigate to the `code/` folder.  
2. Run the `dataProcess.ipynb` and `dataReorganization.ipynb` notebooks, in that order, to transform the raw data into processed data.
3. To generate results, execute `processPredefinedDtaset.ipynb` for the predefined dataset or `processFullDataset.ipynb` for the full dataset.
4. Find outputs in the `Result/` folder.  
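
The steps above can also be run without opening each notebook by hand. The following is a sketch only: it assumes the `jupyter nbconvert` command-line tool is installed (it is not listed in the requirements) and that the script is started from the repository root. The helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess

# Notebook names are taken from the usage steps above.
PREPROCESSING = ["dataProcess.ipynb", "dataReorganization.ipynb"]
DIAGNOSIS = {
    "predefined": "processPredefinedDtaset.ipynb",
    "full": "processFullDataset.ipynb",
}


def build_commands(dataset="predefined"):
    """Return one nbconvert command per notebook, in execution order."""
    notebooks = PREPROCESSING + [DIAGNOSIS[dataset]]
    return [
        ["jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", notebook]
        for notebook in notebooks
    ]


def run_pipeline(dataset="predefined", code_dir="code"):
    """Execute the notebooks in order; stop at the first failure."""
    for command in build_commands(dataset):
        subprocess.run(command, cwd=code_dir, check=True)
```

For example, `run_pipeline("full")` would execute the two preprocessing notebooks and then `processFullDataset.ipynb` inside `code/`. Because `--inplace` overwrites each notebook with its executed version, work on a copy if the original outputs should be preserved.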

Requirements

- Python 3.8+  
- Common libraries: `numpy`, `pandas`, `scikit-learn`, `matplotlib`, `seaborn` (add others as needed).
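
A quick way to confirm the environment before running the notebooks is sketched below. It checks only the Python version and the libraries listed above. The helper names are hypothetical, and the list may need extending if the notebooks import other packages. Note that `scikit-learn` is imported under the module name `sklearn`.

```python
import sys
from importlib.util import find_spec

# Maps the package name used for installation to its import name.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be found."""
    return [name for name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8 or newer is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed libraries are available.")
```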


Files

readme.md

Files (857.3 kB)