Published November 18, 2025 | Version v1
Software
Open
Efficient Construction of Heterogeneous Scientific Battery Databases via a Distilled Dual-Model Framework
Description
This Python script automates the extraction of battery experimental data from PDF files using AI models. It employs a two-stage approach: first classifying whether a document is battery-related, then extracting detailed experimental conditions.
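The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `process_document`, both prompt strings, and the YES/NO answer convention are assumptions made for this example. The two model calls are passed in as plain callables so the gating logic can be read (and run) without any API access.

```python
import json
from typing import Any, Callable

# Hypothetical prompts; the prompts used by the real script are not given here.
CLASSIFY_PROMPT = "Answer YES or NO: does this document report battery experiments?\n\n"
EXTRACT_PROMPT = "Return the experimental conditions in this document as a JSON object.\n\n"


def process_document(
    text: str,
    classify: Callable[[str], str],
    extract: Callable[[str], str],
) -> Any:
    """Run stage 1 (classification) and, only if it passes, stage 2 (extraction).

    `classify` and `extract` each take a prompt and return the model's reply.
    Returns None for documents judged not battery-related.
    """
    verdict = classify(CLASSIFY_PROMPT + text).strip().upper()
    if not verdict.startswith("YES"):
        return None  # stage 2 is skipped, saving an extraction call

    raw = extract(EXTRACT_PROMPT + text)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Keep the unparsed reply so a malformed answer is not silently lost.
        return {"raw_output": raw}


if __name__ == "__main__":
    # Stub models standing in for real API calls.
    result = process_document(
        "Cells were cycled at 25 C.",
        classify=lambda prompt: "YES",
        extract=lambda prompt: '{"temperature_C": 25}',
    )
    print(result)
```

In practice each callable would wrap a request to the chosen OpenAI-compatible endpoint. Running the cheap classification call first means the extraction model only sees documents that are likely to contain battery data.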
- Automated PDF Processing: Supports both PyPDF2 and PyMuPDF for robust PDF text extraction
- Two-Stage AI Pipeline:
  - Classification model that identifies battery-related documents
  - Extraction model that pulls detailed experimental conditions from those documents
- OpenAI-Compatible API: Works with DashScope and ModelScope APIs
- Comprehensive Statistics: Tracks processing time, token usage, and error rates
- JSON Output: Structured data output for easy integration
- Retry Mechanism: Automatic retry for failed API calls
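The retry and statistics features listed above might be combined along these lines. This is a sketch under stated assumptions: `RunStats`, `call_with_retry`, the exponential backoff schedule, and the `(text, tokens_used)` return shape are all hypothetical and chosen for illustration. The source only states that failed API calls are retried automatically and that time, token usage, and error rates are tracked.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunStats:
    """Running totals across all API calls (hypothetical structure)."""
    calls: int = 0
    failures: int = 0
    total_tokens: int = 0
    elapsed_s: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.failures / self.calls if self.calls else 0.0


def call_with_retry(
    fn: Callable[[], Tuple[str, int]],
    stats: RunStats,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fn` until it succeeds or `max_attempts` is reached.

    `fn` is assumed to return (reply_text, tokens_used) and to raise on failure.
    The delay doubles after each failed attempt (an assumed backoff policy).
    """
    for attempt in range(1, max_attempts + 1):
        stats.calls += 1
        start = time.perf_counter()
        try:
            text, tokens = fn()
        except Exception:
            stats.failures += 1
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
        else:
            stats.total_tokens += tokens
            return text
        finally:
            # Count only time spent in the call itself, not the backoff sleep.
            stats.elapsed_s += time.perf_counter() - start
        sleep(base_delay * 2 ** (attempt - 1))
    return None
```

The `sleep` parameter is injectable so the backoff can be replaced with a no-op in tests. A production version would likely catch only the transient error types of the API client in use rather than every `Exception`.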
Files (42.1 kB)
README.md
| Checksum | Size |
|---|---|
| md5:20d3393030b3b154d39e24d9c91f8795 | 37.4 kB |
| md5:9a12c662da895018d0e297a755756320 | 4.7 kB |
Additional details
Dates
- Other: 2025-11-18