
Published July 21, 2025 | Version WikiTextGraph - JORS v1.0.0
Software | Open access

WikiTextGraph: A Python Tool for Parsing Multilingual Wikipedia Text and Graph Extraction

  • 1. Donostia International Physics Center
  • 2. CulturePlex Lab, Western University, London, Ontario, Canada
  • 3. Consejo Superior de Investigaciones Científicas

Description

WikiTextGraph is an open-source Python package that extracts and processes text from Wikipedia dumps and constructs internal link networks across multiple language editions. It combines efficient parsing, redirect resolution, and multilingual graph-building techniques to address the challenges posed by Wikipedia's scale, structure, and inherent noise. Its modular architecture and simple graphical user interface (GUI) make it suitable for both technical and non-technical users. Built for scalability and reproducibility, WikiTextGraph supports interdisciplinary research in network science, computational linguistics, and digital humanities, and its flexible design makes it easy to adapt to studies of low-resource languages or cross-lingual comparisons.[1]
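The pipeline described above (stream a dump, extract internal links, resolve redirects, build a link network) can be illustrated with a small standard-library sketch. This is not WikiTextGraph's own API or implementation: the function names (`parse_dump`, `resolve_redirect`, `build_graph`), the English-only prefix filter, the rough title normalization, and the tiny sample dump are all hypothetical, chosen only to show the general technique on MediaWiki XML export data.

```python
import io
import re
import xml.etree.ElementTree as ET

# Captures the target of [[Target]], [[Target|label]] and [[Target#Section|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Hypothetical, English-only list of non-article link prefixes to ignore;
# other language editions use localized names (e.g. "Categoría:").
NON_ARTICLE_PREFIXES = ("Category:", "File:", "Image:")


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def normalize_title(title):
    """Rough MediaWiki-style normalization (assumption): underscores to
    spaces, trimmed whitespace, upper-cased first letter."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def parse_dump(stream):
    """Stream <page> elements from a MediaWiki XML export and return
    (links, redirects) for main-namespace (ns 0) pages."""
    links, redirects = {}, {}
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {local_name(child.tag): child for child in elem.iter()}
        ns = fields.get("ns")
        if ns is None or ns.text == "0":
            title = normalize_title(fields["title"].text)
            if "redirect" in fields:
                redirects[title] = normalize_title(fields["redirect"].get("title"))
            else:
                text_elem = fields.get("text")
                text = (text_elem.text if text_elem is not None else None) or ""
                targets = (normalize_title(t) for t in LINK_RE.findall(text))
                links[title] = [t for t in targets if not t.startswith(NON_ARTICLE_PREFIXES)]
        elem.clear()  # discard the page's contents to keep memory use low
    return links, redirects


def resolve_redirect(title, redirects, max_hops=10):
    """Follow redirect chains, stopping on cycles or after max_hops."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def build_graph(links, redirects):
    """Directed adjacency sets (article -> linked articles), with redirects
    resolved and links to missing pages or self-loops dropped."""
    graph = {title: set() for title in links}
    for source, targets in links.items():
        for target in targets:
            target = resolve_redirect(target, redirects)
            if target in graph and target != source:
                graph[source].add(target)
    return graph


# Hypothetical miniature dump in the MediaWiki export format.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Graph theory</title><ns>0</ns>
    <revision><text>Studies [[graph (discrete mathematics)|graphs]] and [[Complex networks]]. [[Category:Mathematics]]</text></revision>
  </page>
  <page><title>Graph (discrete mathematics)</title><ns>0</ns>
    <revision><text>See [[graph theory#History|history]].</text></revision>
  </page>
  <page><title>Complex networks</title><ns>0</ns>
    <redirect title="Network science" />
    <revision><text>#REDIRECT [[Network science]]</text></revision>
  </page>
  <page><title>Network science</title><ns>0</ns>
    <revision><text>Builds on [[Graph theory]] and [[Missing page]].</text></revision>
  </page>
</mediawiki>"""

links, redirects = parse_dump(io.BytesIO(SAMPLE_DUMP.encode("utf-8")))
graph = build_graph(links, redirects)
for source in sorted(graph):
    print(source, "->", sorted(graph[source]))
# Graph (discrete mathematics) -> ['Graph theory']
# Graph theory -> ['Graph (discrete mathematics)', 'Network science']
# Network science -> ['Graph theory']
```

The link from "Graph theory" to the redirect page "Complex networks" ends up pointing at "Network science", while the category link and the link to a nonexistent page are dropped. Streaming with `iterparse` rather than loading the whole file is what makes this approach workable on full-size dumps; a real multilingual tool would also need localized namespace prefixes and per-wiki title rules.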

Files

README.md

Files (58.2 kB)

  • md5:e22b744f4a9b09da09d4be511c3b8714, 2.3 kB
  • md5:afd478c6482908ef5de2a8649b3acab1, 7.3 kB
  • md5:21b9eea6c4f8aa9fae8d766c643bd61a, 8.5 kB
  • md5:933d35cc0c8f79bbfa1330ffc19033cb, 6.5 kB
  • md5:86d3f3a95c324c9479bd8986968f4327, 11.4 kB
  • md5:9546c0faab48f5e80cdc78181744ba2d, 7.3 kB
  • md5:5bfc01dc9efde70c33b63a3cbed868d4, 7.0 kB
  • md5:009d00c94160a431388a6cf0e4a05757, 649 bytes
  • md5:ca80d0b9a9f6b6db91e087293b8bba9a, 4.6 kB
  • md5:c5c1f3345d8bbb5cc491e88848b2630d, 2.7 kB

Additional details

Software

Repository URL: https://github.com/PaschalisAg/WikiTextGraph
Programming language: Python
Development status: Active