Efficient Construction of Heterogeneous Scientific Battery Databases via a Distilled Dual-Model Framework

Wang, Yiduo

doi:10.5281/zenodo.17638679

Published November 18, 2025 | Version v1

Resource type: Software | Access: Open

Contributors

Editor: Wang, Yiduo

Battery Data Extraction from PDF Literature

This Python script (dual_model.py) automates the extraction of experimental battery data from PDF files using AI models. It takes a two-stage approach: it first classifies whether a document is battery-related, then extracts detailed experimental conditions from the documents that pass this check.
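The two-stage flow can be sketched as follows. This is a minimal illustration, not the code in dual_model.py: the function names `process_document` and `process_corpus`, the record fields, and the idea of passing the two model calls in as plain callables are all assumptions made so the gating logic can be shown and tested without network access.

```python
import json
from typing import Callable, Dict

# Hypothetical stage signatures; dual_model.py may organise its calls differently.
Classifier = Callable[[str], bool]   # stage 1: is this text battery-related?
Extractor = Callable[[str], dict]    # stage 2: return experimental conditions


def process_document(name: str, text: str,
                     classify: Classifier, extract: Extractor) -> dict:
    """Run the two-stage pipeline on the extracted text of one PDF."""
    record = {"file": name, "battery_related": False, "conditions": None}
    if not text.strip():
        # Nothing to send to either model (e.g. a scanned PDF with no text layer).
        record["error"] = "empty text"
        return record
    # Stage 1: the classification call gates the extraction call.
    if not classify(text):
        return record
    record["battery_related"] = True
    # Stage 2: only documents that pass stage 1 reach the extraction model.
    record["conditions"] = extract(text)
    return record


def process_corpus(docs: Dict[str, str],
                   classify: Classifier, extract: Extractor) -> str:
    """Process {filename: text} pairs and serialise all records as JSON."""
    records = [process_document(name, text, classify, extract)
               for name, text in docs.items()]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

In a real run, `classify` and `extract` would wrap requests to the two models. Putting the classifier in front means non-battery documents never reach the extraction stage, which is presumably where the pipeline saves most of its tokens.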

Features

- Automated PDF Processing: Supports both PyPDF2 and PyMuPDF for robust PDF text extraction
- Two-Stage AI Pipeline:
  - Classification model to identify battery-related documents
  - Extraction model to pull out detailed experimental conditions
- OpenAI-Compatible API: Works with the DashScope and ModelScope APIs
- Comprehensive Statistics: Tracks processing time, token usage, and error rates
- JSON Output: Structured data output for easy integration
- Retry Mechanism: Automatic retry for failed API calls
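The retry and statistics features could look roughly like the sketch below. It assumes exponential backoff, which the feature list does not specify, and the names `RunStats`, `call_with_retry`, `max_retries`, and `base_delay` are hypothetical. The `sleep` parameter is injectable only so the backoff schedule can be checked without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class RunStats:
    """Hypothetical counters for the statistics the feature list mentions."""
    attempts: int = 0
    failures: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    elapsed_s: float = 0.0

    def record_usage(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Accumulate token counts reported by the API for one response."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def error_rate(self) -> float:
        """Fraction of API attempts that raised an exception."""
        return self.failures / self.attempts if self.attempts else 0.0


def call_with_retry(fn: Callable[[], T], stats: RunStats,
                    max_retries: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying up to max_retries times with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    start = time.perf_counter()
    try:
        for attempt in range(max_retries + 1):
            stats.attempts += 1
            try:
                return fn()
            except Exception:  # a real client should catch only transient API errors
                stats.failures += 1
                if attempt == max_retries:
                    raise  # retries exhausted: surface the last error
                sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x ... base_delay
    finally:
        # Elapsed wall-clock time, including any backoff sleeps.
        stats.elapsed_s += time.perf_counter() - start
```

With the `openai` Python SDK pointed at an OpenAI-compatible endpoint, a request might be wrapped as `call_with_retry(lambda: client.chat.completions.create(model=..., messages=...), stats)`. The returned response's `usage.prompt_tokens` and `usage.completion_tokens` are then natural inputs to `record_usage`.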

Files

README.md

Files (42.1 kB)

Name	Size	MD5 checksum
dual_model.py	37.4 kB	20d3393030b3b154d39e24d9c91f8795
README.md	4.7 kB	9a12c662da895018d0e297a755756320

Additional details

Dates

Other: 2025-11-18
