Complete List of Mathematical Expressions in all Wikimedia Projects, including Wikipedia
Authors/Creators
Description
This dataset contains a deduplicated list of all mathematical expressions used in all Wikimedia projects. The data is provided as a JSON file in which each key is the MD5 hash of an input. An input is the string that was extracted from the wikitext sources. The extraction was done as follows:
- All current dumps were filtered for the math tag (see https://doi.org/10.5281/zenodo.15107679 for details).
- The filtered dumps were imported into a MediaWiki installation with the MathSearch extension, using one database per wiki.
- The data from all mathlog tables was combined into one table, which was exported to a JSON file. The JSON file contains key-value pairs in which the keys are the MD5 hashes of the inputs.
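The combination and deduplication step can be illustrated with a short Python sketch. It is not the published SQL script (allFormulae.sql) and only mimics the idea, assuming that each key is the MD5 hex digest of the UTF-8 encoded input. The per-wiki input lists and the helper names input_hash and combine_inputs are hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable


def input_hash(tex: str) -> str:
    # Assumption: the key is the MD5 hex digest of the UTF-8 encoded input.
    return hashlib.md5(tex.encode("utf-8")).hexdigest()


def combine_inputs(per_wiki_inputs: Iterable[Iterable[str]]) -> Dict[str, str]:
    # Merge the inputs of several wikis into one hash -> input mapping.
    # An input that occurs in more than one wiki produces the same key,
    # so it is stored only once.
    combined: Dict[str, str] = {}
    for wiki_inputs in per_wiki_inputs:
        for tex in wiki_inputs:
            combined.setdefault(input_hash(tex), tex)
    return combined


if __name__ == "__main__":
    # Hypothetical inputs from two wikis; "E = mc^2" occurs in both.
    enwiki = ["E = mc^2", r"\frac{a}{b}"]
    dewiki = ["E = mc^2", r"\sqrt{2}"]
    print(json.dumps(combine_inputs([enwiki, dewiki]), indent=2))
```

The printed object has three entries rather than four, because the input shared by both wikis maps to a single key. Keying on the hash makes deduplication a side effect of the data structure rather than a separate pass.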
The scripts are available from Software Heritage:
swh:1:cnt:faec2206a154db5a2711791f4211097e36bf1413; origin=https://github.com/MaRDI4NFDI/wikiFilter; visit=swh:1:snp:28ed43d0e16ca3d6ce4bad1b484cec9d1124cd48; anchor=swh:1:rev:855735a5c90a0db3ccfd20c3899af4c82bc6704f; path=/wmcloud/allFormulae.sql
Example: The Wikipedia article on mass-energy equivalence contains the following wikitext:
<math qid=Q35875>E = mc^2</math>
From it, the MathSearch extension extracts the user input:
E = mc^2
The MD5 hash of this input is:
281a70c20b16a38d7781189936e1ac9f
Thus the entry
"281a70c20b16a38d7781189936e1ac9f": "E = mc^2",
in the JSON file corresponds to that input.
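The hash computation in this example can be reproduced with Python's hashlib. This is a sketch that assumes the key is the MD5 hex digest of the UTF-8 bytes of the input, taken exactly as extracted with no trimming or normalization; the helper names md5_key, json_row, and lookup are illustrative and not part of MathSearch. If that assumption holds, md5_key("E = mc^2") should return the hash quoted above.

```python
import hashlib
import json
from typing import Dict, Optional


def md5_key(user_input: str) -> str:
    # Assumption: the key is the MD5 hex digest of the UTF-8 bytes of the
    # user input exactly as extracted (no trimming or normalization).
    return hashlib.md5(user_input.encode("utf-8")).hexdigest()


def json_row(user_input: str) -> str:
    # Render one key-value pair in JSON syntax, like the rows in the file;
    # json.dumps adds the quotes and escapes backslashes and quotes.
    return f"{json.dumps(md5_key(user_input))}: {json.dumps(user_input)},"


def lookup(formulae: Dict[str, str], user_input: str) -> Optional[str]:
    # Return the stored input if its hash is a key of the mapping, else None.
    return formulae.get(md5_key(user_input))


if __name__ == "__main__":
    print(md5_key("E = mc^2"))
    print(json_row("E = mc^2"))
```

To query the real file, one could open wmf_texvc_inputs.json with encoding="utf-8", pass it to json.load (assuming the top level is a single JSON object), and call lookup on the result; note that the whole 322.1 MB file is then held in memory. Because JSON escapes backslashes, a LaTeX input such as \frac{a}{b} appears as \\frac{a}{b} in the raw file text.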
Notes (English)
Files
| Name | Size | Checksum |
|---|---|---|
| wmf_texvc_inputs.json | 322.1 MB | md5:d1813da95a6915ea75bc1f614b9eb846 |
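The published checksum can be used to verify a download. The following Python sketch hashes the file in 1 MiB chunks so that the 322.1 MB file never has to be held in memory at once. The function names and the default chunk size are illustrative choices, and the md5: prefix shown in the table is dropped before comparison.

```python
import hashlib
from typing import BinaryIO

# Checksum listed for wmf_texvc_inputs.json, without the "md5:" prefix.
EXPECTED_MD5 = "d1813da95a6915ea75bc1f614b9eb846"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    # Feed the stream to MD5 in fixed-size chunks until read() returns b"".
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    # Return True if the file at `path` matches the published MD5 checksum.
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected.lower()
```

For example, verify_download("wmf_texvc_inputs.json") returns True if the file was saved under its published name in the working directory and arrived intact.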
Additional details
Related works
- Is derived from
  - Dataset: 10.5281/zenodo.15107679 (DOI)
Funding
- Deutsche Forschungsgemeinschaft
  - MaRDI 460135501