Published November 14, 2025 | Version v1
Dataset Open

MajinBook: An open catalogue of digital world literature with likes

  • 1. École Normale Supérieure - PSL
  • 2. Langues, Textes, Traitements Informatiques, Cognition
  • 3. Institut Prairie
  • 4. CNRS

Description

Preprint: arxiv.org/abs/2511.11412

Data documentation: github.com/mazieres/MajinBook

Abstract

MajinBook is an open catalogue designed to facilitate the use of shadow libraries, such as Library Genesis and Z-Library, for computational social science and cultural analytics. By linking metadata from these vast, crowd-sourced archives with structured bibliographic data from Goodreads, we create a high-precision corpus of over 539,000 references to English-language books spanning three centuries, enriched with first publication dates, genres, and popularity metrics such as ratings and reviews. Our methodology prioritizes natively digital EPUB files to ensure machine-readable quality and addresses biases in traditional corpora such as HathiTrust. The release also includes secondary datasets for French, German, and Spanish. We evaluate the linkage strategy for accuracy, release all underlying data openly, and discuss the project's legal permissibility under EU and US frameworks for text and data mining in research.

 

Files (6.6 GB)

Checksum                              Size
md5:d22503af812b9c5112e4f8e17190c3b5  90.6 MB
md5:6d4574c43429b7ee478b6d018d3a6402  1.3 GB
md5:657dd22617880f0224942d7cb607f413  262.9 MB
md5:8feb79c2e1e73042b71fbc42fc000c40  187.6 MB
md5:01c65eee8a332856245787c33753c807  4.1 MB
md5:d3ff32bb3b52de49213c83256e8fe7c5  67.7 MB
md5:361326e0180c051c21eb5721db2eae89  6.1 MB
md5:ae987c4a54c6f0477ae78aecdf5f7742  4.7 MB
md5:f9eaa2770d895c5b807cc871ddeac593  4.3 GB
md5:5876dc0cb3ef4be79a35fbaf59d5bcbe  287.4 MB
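
Because several of these files are large, it can be worth checking a download against its published MD5 checksum before using it. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `matches_checksum` and the file name in the usage comment are hypothetical, since file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a checksum given as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local file name; substitute the file you downloaded):
# ok = matches_checksum("downloaded_file.bin", "md5:f9eaa2770d895c5b807cc871ddeac593")
```

Reading in fixed-size chunks keeps memory use flat even for the multi-gigabyte files. A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.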

Additional details

Related works

Is described by
Preprint: arXiv:2511.11412 (arXiv)

Funding

Agence Nationale de la Recherche
PRAIRIE-PSL ANR-22-CMAS-0007