BiblioBlitz v4.1.0: Software for Automated Bibliographic Data Retrieval and Processing

Poddar, Ayanava; Bhattacharjee, Subhrajyoti

doi:10.5281/zenodo.20573666

Published June 6, 2026 | Version v4.1

Resource type: Software | Access: Restricted


Release Notes — BiblioBlitz v4.1

Release date: June 2026
Type: Feature release + bug fixes

What's New in v4.1

Multi-Country Query Fan-Out

BiblioBlitz now accepts multiple countries in a single search. Each selected country generates a separate API query, and the results are merged and deduplicated before the download pipeline runs. This improves recall for cross-national or global literature searches, where a query restricted to one country would miss relevant records.
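The fan-out and merge step can be sketched as follows. This is a minimal illustration under stated assumptions: the release notes do not say which key is used for deduplication, so the sketch assumes a normalized DOI with a title fallback, and `fetch`, `fan_out`, and `dedup_key` are hypothetical names standing in for the real per-country API call and merge logic.

```python
from typing import Callable, Iterable, Optional

DOI_PREFIX = "https://doi.org/"


def dedup_key(record: dict) -> Optional[str]:
    """Key on a normalized DOI, falling back to a normalized title (assumed policy)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    if doi:
        return "doi:" + doi
    title = " ".join((record.get("title") or "").split()).casefold()
    return "title:" + title if title else None  # no key: never treated as a duplicate


def fan_out(countries: Iterable[str], fetch: Callable[[str], list]) -> list:
    """Run one query per country and merge results, keeping the first copy of each record."""
    seen: set = set()
    merged: list = []
    for country in countries:
        for record in fetch(country):  # fetch stands in for one per-country API call
            key = dedup_key(record)
            if key is not None:
                if key in seen:
                    continue
                seen.add(key)
            merged.append(record)
    return merged
```

Keeping the first copy means the order of the selected countries decides which duplicate survives; a real implementation might instead merge metadata fields from both copies.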

State/Region Sub-Filter

A cascading state selector becomes available once a country is selected. Administrative divisions are loaded from a local states.csv file; if the selected country is not found there, the app falls back to a live web lookup. The selected state names are appended as title-search terms to narrow results geographically.
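The local-first lookup with a web fallback can be sketched like this. The `country` and `state` column names for states.csv are assumptions (the notes do not give the file's schema), and `load_states`, `states_for`, `with_state_terms`, and the `web_lookup` callable are hypothetical names; how each API encodes title terms is not shown.

```python
import csv
from collections import defaultdict
from typing import Callable, Iterable


def load_states(lines: Iterable[str]) -> dict:
    """Parse states.csv-style rows into {casefolded country: [state, ...]}.

    The 'country' and 'state' column names are assumptions, not the real schema.
    """
    table = defaultdict(list)
    for row in csv.DictReader(lines):
        country = (row.get("country") or "").strip()
        state = (row.get("state") or "").strip()
        if country and state:
            table[country.casefold()].append(state)
    return dict(table)


def states_for(country: str, table: dict, web_lookup: Callable[[str], list]) -> list:
    """Use the local table first; call the (hypothetical) web lookup only on a miss."""
    local = table.get(country.casefold())
    return local if local else web_lookup(country)


def with_state_terms(title_terms: list, selected_states: list) -> list:
    """Append selected state names as extra title-search terms, skipping duplicates."""
    return title_terms + [s for s in selected_states if s not in title_terms]
```

In practice the file would be opened with `open("states.csv", newline="", encoding="utf-8")` and passed to `load_states`, since `csv.DictReader` accepts any iterable of lines.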

Three-Tier PDF Resolution

The download pipeline now attempts PDF resolution in a defined priority order:

1. Unpaywall: queried first for every record with a DOI; returns verified, publisher-permitted PDF URLs.
2. Semantic Scholar: queried as a secondary fallback when Unpaywall returns no PDF URL.
3. Direct source URLs: used only for OpenAlex and CORE records, whose links reliably point to actual PDFs. CrossRef direct links are excluded because they frequently redirect to paywalled HTML pages.

This replaces the previous behavior, in which CrossRef's link[] URLs were tried first. Those links often returned HTML pages that were saved under a .pdf name, producing a high rate of corrupted downloads.
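The priority order above can be sketched as a resolver that tries each tier in turn. The JSON field names used here (`best_oa_location.url_for_pdf` from the Unpaywall v2 API and `openAccessPdf.url` from the Semantic Scholar Graph API paper endpoint) are the public response fields of those services; the record keys `doi`, `source`, and `pdf_url`, the two `fetch_*` callables, and the function names are hypothetical, and the HTTP calls themselves are left out so the logic can be read on its own.

```python
from typing import Callable, Optional, Tuple

# Record sources whose direct links are trusted; CrossRef is excluded on purpose.
DIRECT_LINK_SOURCES = {"openalex", "core"}


def unpaywall_pdf(payload: dict) -> Optional[str]:
    """PDF URL from an Unpaywall /v2/{doi} response (best_oa_location.url_for_pdf)."""
    best = payload.get("best_oa_location") or {}
    return best.get("url_for_pdf")


def semantic_scholar_pdf(payload: dict) -> Optional[str]:
    """PDF URL from a Semantic Scholar Graph API paper response (fields=openAccessPdf)."""
    oa = payload.get("openAccessPdf") or {}
    return oa.get("url")


def resolve_pdf_url(
    record: dict,
    fetch_unpaywall: Callable[[str], Optional[dict]],
    fetch_semantic_scholar: Callable[[str], Optional[dict]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (pdf_url, tier), trying the three tiers in priority order."""
    doi = record.get("doi")
    if doi:
        url = unpaywall_pdf(fetch_unpaywall(doi) or {})
        if url:
            return url, "unpaywall"
        url = semantic_scholar_pdf(fetch_semantic_scholar(doi) or {})
        if url:
            return url, "semantic_scholar"
    source = (record.get("source") or "").lower()
    if source in DIRECT_LINK_SOURCES and record.get("pdf_url"):
        return record["pdf_url"], "direct"
    return None, None
```

A real `fetch_unpaywall` would GET `https://api.unpaywall.org/v2/{doi}?email=...` (Unpaywall requires the `email` parameter), and a real `fetch_semantic_scholar` would GET `https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf`, each returning the parsed JSON or `None` on failure.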

Post-Download Integrity Sweep

After every download session, BiblioBlitz scans all .pdf files in the output directory, checks that each file begins with the %PDF magic bytes and is above a minimum size threshold, and permanently removes any file that fails. The log reports the count of verified and deleted files.
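The sweep can be sketched with `pathlib`. The notes do not state the minimum size, so the 1024-byte `MIN_PDF_BYTES` default below is a hypothetical placeholder, as are the function names; the magic-byte check mirrors the description (the file must start with `%PDF`).

```python
from pathlib import Path
from typing import Tuple

MIN_PDF_BYTES = 1024  # hypothetical threshold; the real value is not documented


def is_valid_pdf(path: Path, min_bytes: int = MIN_PDF_BYTES) -> bool:
    """True if the file is at least min_bytes long and starts with the %PDF magic bytes."""
    try:
        if path.stat().st_size < min_bytes:
            return False
        with path.open("rb") as fh:
            return fh.read(4) == b"%PDF"
    except OSError:
        return False  # unreadable files are treated as failures


def integrity_sweep(out_dir, min_bytes: int = MIN_PDF_BYTES) -> Tuple[int, int]:
    """Delete every *.pdf in out_dir that fails the check; return (verified, deleted)."""
    verified = deleted = 0
    for path in sorted(Path(out_dir).glob("*.pdf")):  # materialize before deleting
        if is_valid_pdf(path, min_bytes):
            verified += 1
        else:
            path.unlink(missing_ok=True)
            deleted += 1
    print(f"Integrity sweep: {verified} verified, {deleted} deleted")
    return verified, deleted
```

An HTML paywall page saved as `paper.pdf` starts with something like `<!DOCTYPE html>` rather than `%PDF`, so it fails the first check regardless of size.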

Journal Filter: Substring Matching

The journal name filter previously required an exact string match between the selected venue name and the API-returned journal field. This caused most records to be silently dropped when the same journal appeared under slightly different names across APIs (e.g. "Catena" vs "CATENA (Elsevier)"). The filter now uses bidirectional substring matching, which correctly handles these variants.
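A bidirectional substring match can be sketched as below. The example in the text ("Catena" vs "CATENA (Elsevier)") only matches if the comparison ignores case, so this sketch assumes case-folding and whitespace normalization; `normalize_venue`, `journal_matches`, `filter_by_journals`, and the `journal` record key are hypothetical names.

```python
from typing import Iterable, List


def normalize_venue(name: str) -> str:
    """Casefold and collapse whitespace (assumed normalization)."""
    return " ".join(name.casefold().split())


def journal_matches(selected: str, returned: str) -> bool:
    """True if either normalized name contains the other."""
    a, b = normalize_venue(selected), normalize_venue(returned)
    if not a or not b:
        return False  # an empty name would otherwise match everything
    return a in b or b in a


def filter_by_journals(records: List[dict], selected: Iterable[str]) -> List[dict]:
    """Keep records whose journal field matches any selected venue."""
    selected = list(selected)
    return [
        r for r in records
        if any(journal_matches(s, r.get("journal") or "") for s in selected)
    ]
```

The trade-off of substring matching is that short names can over-match: "Nature" also matches "Nature Communications", so the filter favors recall over precision.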

Statistics Tab: Thread-Safe Chart Rendering

Chart generation in Tab 2 (Live Trend Analysis) now correctly separates data fetching (worker thread) from chart rendering (main UI thread). Previously, FigureCanvasTkAgg was being instantiated inside a background thread, causing UI freezes and occasional crashes. Charts are now handed off to the main thread via self.after(0, ...).
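The fetch-on-worker, render-on-main-thread split can be sketched without tying it to a particular widget. In the app, `schedule` would be the widget's own `after` method (as in `self.after(0, ...)`) and `render` would be the code that builds the `FigureCanvasTkAgg`; `fetch_then_render` and its parameters are hypothetical names, and error handling is omitted for brevity.

```python
import threading
from typing import Any, Callable


def fetch_then_render(
    fetch: Callable[[], Any],
    render: Callable[[Any], None],
    schedule: Callable[[int, Callable[[], None]], Any],
) -> threading.Thread:
    """Run fetch() on a worker thread, then hand its result to render() via schedule.

    Pass schedule=widget.after so render(), which creates Tk/matplotlib widgets,
    is queued onto the Tk event loop instead of running on the worker thread.
    """
    def worker() -> None:
        data = fetch()  # network/API work only; no widget calls here
        schedule(0, lambda: render(data))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

An alternative that keeps the worker from touching Tk at all is to put results on a `queue.Queue` and have the main thread poll it with `after()`.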

Journal Selector: Debounced Filter

The journal selection dialog previously rebuilt every checkbox widget on each keypress in the filter box. With large journal lists (500+ venues), this caused noticeable lag. The filter now debounces input by 200 ms and skips re-rendering if the query string has not changed since the last render.
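The debounce-and-skip behavior can be sketched as a small class. Its `after` and `after_cancel` parameters follow the signatures of the Tk widget methods with those names (`after(ms, callback)` returns an id that `after_cancel(id)` accepts), so in the dialog they would be the widget's own methods; `DebouncedFilter` and `render` are hypothetical names.

```python
from typing import Any, Callable, Optional


class DebouncedFilter:
    """Re-render only after input pauses for delay_ms, and only if the query changed."""

    def __init__(
        self,
        after: Callable[[int, Callable[[], None]], Any],
        after_cancel: Callable[[Any], None],
        render: Callable[[str], None],
        delay_ms: int = 200,
    ) -> None:
        self._after = after
        self._after_cancel = after_cancel
        self._render = render
        self._delay_ms = delay_ms
        self._pending: Optional[Any] = None
        self._last_query: Optional[str] = None

    def on_keypress(self, query: str) -> None:
        """Cancel any scheduled render and schedule a fresh one."""
        if self._pending is not None:
            self._after_cancel(self._pending)
        self._pending = self._after(self._delay_ms, lambda: self._fire(query))

    def _fire(self, query: str) -> None:
        self._pending = None
        if query == self._last_query:
            return  # same query as the last render: skip the widget rebuild
        self._last_query = query
        self._render(query)
```

Typing "cat" quickly therefore produces one render for "cat" rather than three renders for "c", "ca", and "cat".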

Flat Journal Display

The journal selector now displays venues as flat Publisher :: Journal entries rather than a two-level publisher/journal tree. This makes keyboard filtering faster and the selection more predictable.
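Flattening a publisher/journal tree into `Publisher :: Journal` strings can be sketched as follows. The dictionary input shape and the names `flatten_venues`, `filter_entries`, and `journal_of` are assumptions for illustration; the `" :: "` separator is taken from the description.

```python
from typing import Dict, Iterable, List

SEPARATOR = " :: "


def flatten_venues(tree: Dict[str, List[str]]) -> List[str]:
    """Turn {publisher: [journal, ...]} into sorted 'Publisher :: Journal' entries."""
    entries = (f"{pub}{SEPARATOR}{journal}" for pub, journals in tree.items() for journal in journals)
    return sorted(entries, key=str.casefold)


def filter_entries(entries: Iterable[str], query: str) -> List[str]:
    """Case-insensitive substring filter over the full entry (publisher and journal)."""
    q = query.strip().casefold()
    return [e for e in entries if q in e.casefold()] if q else list(entries)


def journal_of(entry: str) -> str:
    """Recover the journal part of an entry; entries without a separator are returned as-is."""
    _, sep, journal = entry.partition(SEPARATOR)
    return journal if sep else entry
```

Because each entry carries both names, typing a publisher name such as "springer" lists all of that publisher's journals without expanding a tree level.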

Bug Fixes

Fixed corrupted PDF files caused by CrossRef link[] URLs returning HTML paywall pages instead of PDF bytes
Fixed low download counts caused by exact-match journal filtering dropping valid records with variant name formats
Fixed UI freeze when trend charts were rendered from a background thread
Fixed journal selector lag caused by full widget rebuild on every keypress
Fixed app icon not appearing in the Windows taskbar (now uses wm_iconbitmap with .ico format and a deferred after(250) call)

Download Statistics Context

The number of records fetched by the API layer (often 20,000+) is much larger than the number that reach the download pipeline. This is expected: the journal filter, deduplication, and the max_results cap together reduce the pool to the most relevant targets. The max_results slider in the UI sets the ceiling for this final set.
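The narrowing can be pictured as a three-stage funnel that reports a count at each stage. This is a minimal illustration: the stage order (filter, then deduplicate, then cap), the DOI-only dedup key, and the `download_targets` name are assumptions, and truncating with `[:max_results]` assumes the records are already sorted by relevance.

```python
from typing import Callable, Dict, List, Tuple


def download_targets(
    records: List[dict],
    keep: Callable[[dict], bool],
    max_results: int,
) -> Tuple[List[dict], Dict[str, int]]:
    """Filter, deduplicate by DOI, then cap; return the targets and per-stage counts."""
    filtered = [r for r in records if keep(r)]

    seen = set()
    unique = []
    for r in filtered:
        doi = (r.get("doi") or "").strip().lower()
        if doi and doi in seen:
            continue  # duplicate DOI: drop
        if doi:
            seen.add(doi)
        unique.append(r)  # records without a DOI are always kept

    targets = unique[:max_results]
    counts = {
        "fetched": len(records),
        "after_journal_filter": len(filtered),
        "after_dedup": len(unique),
        "targets": len(targets),
    }
    return targets, counts
```

Logging the `counts` dictionary after each run makes it clear which stage removed most records.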

The proportion of records where a PDF is successfully downloaded depends on open-access availability of the literature, which varies significantly by field and publication venue. Metadata for all records (including paywalled ones) is preserved in download_log.csv.
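Writing one log row per record, whether or not a PDF was obtained, can be sketched with `csv.DictWriter`. The column names in `LOG_FIELDS` and status values such as `no_oa_copy` are hypothetical; the real layout of download_log.csv is not documented here.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column layout for download_log.csv.
LOG_FIELDS = ["doi", "title", "journal", "year", "source", "pdf_status", "pdf_path"]


def write_download_log(fh: TextIO, rows: Iterable[dict]) -> int:
    """Write every record (downloaded or not) to fh; return the number of rows written."""
    writer = csv.DictWriter(fh, fieldnames=LOG_FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)  # missing columns become "", unknown keys are ignored
        count += 1
    return count
```

A caller would open the file with `open("download_log.csv", "w", newline="", encoding="utf-8")`, as the `csv` module expects `newline=""`.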

Known Limitations

CORE API access without an API key is rate-limited and may return incomplete results for high-volume queries
PubMed DOI coverage is inconsistent; some records lack DOIs and cannot be resolved through Unpaywall or Semantic Scholar
The states.csv file must be present for reliable sub-national filtering; the web fallback returns limited data for many countries

Dependencies

customtkinter
matplotlib

Standard library only otherwise (urllib, csv, re, threading, tkinter).

Files

The record is publicly accessible, but files are restricted.

Additional details

Is supplement to: Software: https://github.com/Ayanava-23556003/BiblioBlitz/tree/v4.1

Repository URL: https://github.com/Ayanava-23556003/BiblioBlitz

