Published November 18, 2025 | Version v1
Software Open

Efficient Construction of Heterogeneous Scientific Battery Databases via a Distilled Dual-Model Framework

Description

Battery Data Extraction from PDF Literature

This Python script automates the extraction of battery experimental data from PDF literature using AI models. It works in two stages: a classification model first decides whether a document is battery-related, and an extraction model then extracts the detailed experimental conditions.
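
The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the prompts, the function name `process_document`, and the assumption that extraction is skipped for documents the classifier rejects are all hypothetical. The two models are passed in as plain callables so the sketch runs without an API key.

```python
import json
from typing import Callable, Optional

# Hypothetical prompts: the script's real prompts are not given in this description.
CLASSIFY_PROMPT = "Answer YES or NO. Does this text report battery experiments?\n\n"
EXTRACT_PROMPT = "Return the experimental conditions in this text as a JSON object.\n\n"


def process_document(text: str,
                     classify: Callable[[str], str],
                     extract: Callable[[str], str]) -> Optional[dict]:
    """Run stage 1 (classification) and, only if it passes, stage 2 (extraction)."""
    verdict = classify(CLASSIFY_PROMPT + text).strip().upper()
    if not verdict.startswith("YES"):
        return None  # assumed behavior: skip extraction for unrelated documents
    raw = extract(EXTRACT_PROMPT + text)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    return {"raw_response": raw}  # keep unparsable output for later inspection


# Stand-in models so the sketch runs offline; real calls would go to the API.
def fake_classifier(prompt: str) -> str:
    return "YES" if "cathode" in prompt.lower() else "NO"


def fake_extractor(prompt: str) -> str:
    return '{"c_rate": "0.5 C"}'


battery_result = process_document(
    "The LiFePO4 cathode was cycled at 0.5 C.", fake_classifier, fake_extractor)
other_result = process_document(
    "We survey protein folding methods.", fake_classifier, fake_extractor)
```

Gating the second stage on the first means the more expensive extraction call is made only for documents the classifier accepts, which is one plausible reason for splitting the work across two models.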

Features

  • Automated PDF Processing: Supports both PyPDF2 and PyMuPDF for robust PDF text extraction
  • Two-Stage AI Pipeline:
    • Classification model to identify battery-related documents
    • Extraction model to capture detailed experimental conditions
  • OpenAI-Compatible API: Works with DashScope and ModelScope APIs
  • Comprehensive Statistics: Tracks processing time, token usage, and error rates
  • JSON Output: Structured data output for easy integration
  • Retry Mechanism: Automatic retry for failed API calls
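
The retry and statistics features listed above might look roughly like the sketch below. It is an assumption-laden illustration rather than the script's implementation: `RunStats`, `call_with_retry`, the exponential backoff schedule, and the default of three attempts are all hypothetical, and token counting is left out because it depends on the API response format.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class RunStats:
    """Hypothetical counters; the script's real statistics fields are not listed here."""
    calls: int = 0
    failures: int = 0
    elapsed_s: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.failures / self.calls if self.calls else 0.0


def call_with_retry(fn: Callable[[], T], stats: RunStats,
                    max_attempts: int = 3, base_delay_s: float = 1.0) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        stats.calls += 1
        start = time.perf_counter()
        try:
            result = fn()
        except Exception:
            stats.failures += 1
            stats.elapsed_s += time.perf_counter() - start
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay_s * 2 ** (attempt - 1))
        else:
            stats.elapsed_s += time.perf_counter() - start
            return result
    raise ValueError("max_attempts must be at least 1")


# Usage: a stand-in API call that fails twice, then succeeds.
attempts_seen = []


def flaky_call() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


stats = RunStats()
result = call_with_retry(flaky_call, stats, max_attempts=3, base_delay_s=0.0)
```

Timing only the call itself, and not the backoff sleep, keeps `elapsed_s` comparable across runs with different retry settings.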

Files

README.md

Files (42.1 kB)

md5:20d3393030b3b154d39e24d9c91f8795 (37.4 kB)
md5:9a12c662da895018d0e297a755756320 (4.7 kB)

Additional details

Dates

Other
2025-11-18