Wikidata beginners’ workshop

Abstract

The world’s knowledge at your fingertips! This beginners’ workshop introduces ways of gathering information using Wikidata and includes a brief overview of the Wikimedia Foundation. The hands-on part starts with a general introduction to subject-predicate-object statements (triples) and how to query them, as implemented in the Resource Description Framework (RDF). RDF lies at the core of Wikidata and is widely used in academic contexts. It is the standard Web technology on which Linked Data is built. In short, RDF is the established standard for describing data with metadata.

In this workshop, you will learn to use the RDF query language SPARQL and perform semantic queries. We will retrieve information about various consortia of Germany’s National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur, NFDI). One of the goals of NFDI is to become the backbone of research data management in Germany. Consequently, the structures and services of NFDI have to be Findable, Accessible, Interoperable and Re-usable (FAIR) in a sustainable way. Hence, more and more information relating to NFDI is being added to Wikidata. In this workshop you will find out how to gather this information and how to join the Wikidata community. While we will not go into much detail about becoming a Wikidata contributor, a clear picture will emerge of which data on NFDI are already there and which are missing.

A hallmark of this workshop is its use of Donald E. Knuth’s “Literate Programming” approach from the 1980s. Accordingly, we leave aside the graphical user interface of Wikidata’s query builder and focus on actually writing (typing) the code, without assuming any prior experience with programming languages. Our method of teaching with two trainers and a split screen was born out of the need to teach remotely and has been developed and tested with different programming environments during the recent pandemic (DOI: 10.17192/bfdm.2021.3.8336).

Moreover, Wikidata’s query builder is limited when it comes to generating geographical maps or visualizing lists of images, such as the logos of different consortia. Writing the queries yourself therefore comes with the added bonus of getting to know a wide range of possible visualizations. Visualizations from Wikidata can easily be exported to other platforms or embedded in websites, where they stay up to date as new data are added.

Last, but not least, the data added to Wikidata by its community is what makes this workshop possible in the first place. Wikidata is not built on a conventional database. Conventional, or more traditional, databases are relational and are queried with the Structured Query Language (SQL). Wikidata, by contrast, offers linked datasets that together form a knowledge graph. While SPARQL may initially look like SQL, there are important differences because the data is linked: a SPARQL query matches graph patterns instead of applying SQL’s relational operations to tables. With SPARQL, one can even run a distributed or federated query across multiple databases with a single query statement. Enjoy!

License: CC-BY 4.0

Introduction

by Évariste Demandt

Outline

After today’s workshop you will …

Introduction “The Wikimedia Foundation”


How is knowledge (in Wikidata) organised?

Imagine a world in which you are given only the following list of information:

- Judy has a cat, (and she) lives in a flat, which is small.
- Melinda is friends with Jack's friends.
- Tom lives in a house, which is in the city.
- Peter is the son of Mary.
- Robert is the father of Melinda, (and he) is married to Mary.
- The flat is in the city.
- The dog is colored black.
- Jack has a dog.
- Judy is the aunt of Peter.
- Tom's hair is colored black.
- Peter is a friend of Jack.
- Jack is a friend of Melinda.
- The cat is a pet.

Task

Discuss and answer the following questions using the information given above.

Besides finding the answers to the questions, also discuss:

Graph(ic) data

Examples

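[Figure: the facts above drawn as a graph. People (Mary, Robert, Peter, Melinda, Judy, Jack, Tom), animals (cat, dog, both pets) and places (flat, house, city) are nodes; relations such as “married”, “son of”, “mother of”, “father of”, “daughter of”, “aunt of”, “nephew of”, “friend of”, “has a”, “lives in”, “black hair” and “small” are labelled edges.]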

Conclusion

General structure: a subject is linked to an object by a predicate (subject → predicate → object).

The concept behind this is called Resource Description Framework (RDF).
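The idea can be tried out in a few lines of Python, used here purely for illustration (Wikidata itself is queried with SPARQL, as shown later). This minimal sketch stores some of the facts from the list as (subject, predicate, object) triples and answers questions by filtering them; the function names `objects_of` and `subjects_with` are our own, not part of any RDF library.

```python
# Some facts from the list above, written as (subject, predicate, object) triples.
triples = {
    ("Judy", "has a", "cat"),
    ("Judy", "lives in", "flat"),
    ("flat", "is", "small"),
    ("flat", "is in", "city"),
    ("Jack", "has a", "dog"),
    ("dog", "is colored", "black"),
    ("cat", "is a", "pet"),
    ("Peter", "is the son of", "Mary"),
    ("Judy", "is the aunt of", "Peter"),
}


def objects_of(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects_with(predicate, obj):
    """Return every subject linked to `obj` via `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


print(objects_of("Judy", "has a"))        # ['cat']
print(subjects_with("lives in", "flat"))  # ['Judy']

# Following two links in a row: which of Judy's animals is a pet?
judys_pets = [a for a in objects_of("Judy", "has a") if "pet" in objects_of(a, "is a")]
print(judys_pets)  # ['cat']
```

Answering “Does Judy have a pet?” requires following two triples (Judy → has a → cat, cat → is a → pet); this chaining of links is exactly what graph queries are good at.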

Wikidata

How is Wikidata structured?

Taking Douglas Adams as an example:

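[Figure: the Wikidata data model for the item Douglas Adams, with the item, its properties, the items used as values, and a qualifier labelled.]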

How do we put data into Wikidata?

Let’s add ORCID iDs to people in Wikidata.

Everyone picks one person from https://pad.otc.coscine.dev/sCLyqrk2T0yA_KTGv3DWFg?view (write your name in the column).

  1. Search for the person
  2. Go to the identifiers section
  3. Choose “ORCID” as the property and insert the ORCID iD
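Before pasting an iD into Wikidata, it can help to check that it is well formed. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The Python function below is a small sketch of that check (the name `orcid_is_valid` is our own); it only tests the format and checksum and cannot tell whether the iD actually belongs to the person. The example 0000-0002-1825-0097 is ORCID’s fictitious test record (Josiah Carberry).

```python
import re

# Four blocks of four characters; the last one may be the check character X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the hyphenated format and the MOD 11-2 check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


print(orcid_is_valid("0000-0002-1825-0097"))  # True
print(orcid_is_valid("0000-0002-1825-0098"))  # False (wrong check digit)
```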

How do we get information from Wikidata?

Let’s look at an entry: National Research Data Infrastructure

Formulate the sentence:

Which consortium (who) is part of the NFDI?

Who? → is part of → NFDI

Coding with SPARQL

To get results from an RDF knowledge-graph database, we use the SPARQL Protocol and RDF Query Language (SPARQL).

SELECT ?x
WHERE {
  # subject predicate object
  ?x p q .
}

The SELECT clause lists the variables to be returned.
The WHERE clause contains the restrictions (triple patterns) that every result must match.
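To see what matching a triple pattern means, here is a toy Python sketch (not how Wikidata’s query engine works internally): terms starting with `?` are variables, all other terms must match exactly, and every matching triple yields one set of variable bindings. The triples and the names `a`, `b`, `c`, `p`, `q`, `r` are placeholders in the spirit of the template above; `match_pattern` is our own helper.

```python
def match_pattern(pattern, triples):
    """Yield one dict of variable bindings per triple that fits `pattern`.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


triples = [
    ("a", "p", "q"),
    ("b", "p", "q"),
    ("c", "p", "r"),
]

# SELECT ?x WHERE { ?x p q . }
result = [b["?x"] for b in match_pattern(("?x", "p", "q"), triples)]
print(result)  # ['a', 'b']
```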

→ Switch to the Wikidata Query Service!
Let’s try to write the query from above using SPARQL and the Wikidata Query Service.

SELECT ?consortium
WHERE {
  # who? part of NFDI
  ?consortium wdt:P361 wd:Q61658497 .
}

The result may not be satisfying: it only lists item IDs, not names.

Adding a column with the label:

SELECT ?consortium ?consortiumLabel
WHERE {
  # who? part of NFDI
  ?consortium wdt:P361 wd:Q61658497 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Hint: type SERV, press Ctrl+Space and select the auto-completion.
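The label service fills ?consortiumLabel with each item’s label in the requested language. The language parameter can also be a comma-separated fallback list such as "de,en", and if no label exists in any of the listed languages, the item ID is returned instead. The Python function below is a rough emulation of that lookup rule, not the service’s actual code; the function name `resolve_label`, the placeholder ID `wd:EXAMPLE` and its labels are invented for illustration.

```python
def resolve_label(item_id: str, labels: dict[str, str], languages: str) -> str:
    """Emulate the label lookup: first label found in `languages`, else the ID.

    `labels` maps language codes to label strings; `languages` is a
    comma-separated fallback list such as "de,en".
    """
    for lang in languages.split(","):
        lang = lang.strip()
        if lang in labels:
            return labels[lang]
    # No label in any requested language: fall back to the item ID.
    return item_id


# Invented labels for a hypothetical item.
labels = {"de": "Beispielkonsortium", "en": "Example consortium"}
print(resolve_label("wd:EXAMPLE", labels, "fr,en"))  # Example consortium
print(resolve_label("wd:EXAMPLE", labels, "fr"))     # wd:EXAMPLE
```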

Do we agree with this list?
Do we actually need to code?

Using the SPARQL Query Helper

SELECT ?consortium ?consortiumLabel
WHERE {
  # who? part of NFDI
  ?consortium wdt:P361 wd:Q61658497 .
  # who? instance of accepted NFDI consortium
  ?consortium wdt:P31 wd:Q98270496 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
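The two triple patterns share the variable ?consortium, so an item appears in the result only if it satisfies both, much like a join (or, for a single variable, a set intersection). The Python sketch below mimics this with made-up data: `ex:A`, `ex:B` and `ex:C` are hypothetical items, while the predicates wdt:P361 (part of), wdt:P31 (instance of) and the two Q-IDs are the ones used in the query.

```python
NFDI = "wd:Q61658497"
ACCEPTED_CONSORTIUM = "wd:Q98270496"

# Hypothetical triples: A and B are part of NFDI, but only A is also an
# instance of "accepted NFDI consortium"; C is accepted but not part of NFDI.
triples = {
    ("ex:A", "wdt:P361", NFDI),
    ("ex:A", "wdt:P31", ACCEPTED_CONSORTIUM),
    ("ex:B", "wdt:P361", NFDI),
    ("ex:C", "wdt:P31", ACCEPTED_CONSORTIUM),
}


def subjects(predicate, obj):
    """Bindings for ?s in the single pattern `?s predicate obj`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# ?consortium wdt:P361 wd:Q61658497 .
part_of_nfdi = subjects("wdt:P361", NFDI)
# ?consortium wdt:P31 wd:Q98270496 .
accepted = subjects("wdt:P31", ACCEPTED_CONSORTIUM)

# Both patterns use ?consortium, so only items matching both remain.
print(sorted(part_of_nfdi & accepted))  # ['ex:A']
```

This is why adding the second line shortens the list: items that are merely “part of” NFDI, without being accepted consortia, drop out.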

Creating different exports

Coordinates on a map

But do we know where they are actually located?
Let’s add some coordinates:

#defaultView:Map
SELECT ?consortium ?consortiumLabel ?affiliation ?affiliationLabel ?geo
WHERE {
  # part of NFDI
  ?consortium wdt:P361 wd:Q61658497 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  # instance of accepted NFDI consortium
  ?consortium wdt:P31 wd:Q98270496 .
  # affiliation of the consortium, if any
  OPTIONAL { ?consortium wdt:P1416 ?affiliation . }
  # coordinate location of the affiliation, if any
  OPTIONAL { ?affiliation wdt:P625 ?geo . }
}
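The OPTIONAL blocks keep a consortium in the result even when it has no affiliation, or when its affiliation has no coordinates; the corresponding columns simply stay empty. In relational terms this behaves like a left outer join. The small Python sketch below uses invented data (all names and coordinates are hypothetical) to show the effect; `None` stands for an empty cell.

```python
consortia = ["Consortium A", "Consortium B", "Consortium C"]

# Hypothetical affiliations (P1416) and coordinates (P625, as WKT points).
affiliations = {"Consortium A": ["Uni X", "Uni Y"], "Consortium B": ["Uni Z"]}
coordinates = {"Uni X": "Point(10.0 50.0)", "Uni Y": "Point(11.0 51.0)"}

rows = []
for consortium in consortia:
    # OPTIONAL { ?consortium wdt:P1416 ?affiliation . }
    for affiliation in affiliations.get(consortium, [None]):
        # OPTIONAL { ?affiliation wdt:P625 ?geo . }
        geo = coordinates.get(affiliation)
        rows.append((consortium, affiliation, geo))

for row in rows:
    print(row)
```

Consortium A appears twice (one row per affiliation), Consortium B keeps its affiliation without coordinates, and Consortium C still shows up with both optional columns empty. Without OPTIONAL, only the two rows for Consortium A would remain.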

Now everyone can export the result of the map query and insert it into an iframe in the blank/helper pad. Alternatively, you can use the short URL and embed it in the iframe:

<iframe width=100% height=400pt src="https://w.wiki/5ug4"> </iframe>
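The short URL is generated by the Query Service. If you prefer to build an embed link yourself, the embeddable view appears to take the URL-encoded query after `https://query.wikidata.org/embed.html#`; treat this as an assumption based on the links the service produces and compare it with the “Embed result” output before relying on it. The helper name `embed_iframe` is our own.

```python
from urllib.parse import quote, unquote

# Assumed URL scheme of the Query Service's embed view.
EMBED_BASE = "https://query.wikidata.org/embed.html#"


def embed_iframe(query: str, width: str = "100%", height: str = "400") -> str:
    """Return an <iframe> snippet that shows `query` in the embed view."""
    # Encode every reserved character (including #, ? and spaces).
    src = EMBED_BASE + quote(query, safe="")
    return f'<iframe width="{width}" height="{height}" src="{src}"></iframe>'


query = "SELECT ?consortium WHERE { ?consortium wdt:P361 wd:Q61658497 . }"
snippet = embed_iframe(query)
print(snippet)
```

Unlike the short URL, such a link contains the full query, so it can be regenerated from a script whenever the query changes.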

network visualisation

Show the result of the map query above as a graph with colored labels.

#defaultView:Graph
SELECT ?consortium ?consortiumLabel ("EC0000" AS ?rgb) ?affiliation ?affiliationLabel ?geo
WHERE {
  ?consortium wdt:P361 wd:Q61658497 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  ?consortium wdt:P31 wd:Q98270496 .
  OPTIONAL { ?consortium wdt:P1416 ?affiliation . }
  OPTIONAL { ?affiliation wdt:P625 ?geo . }
}

Fancy features

export results as code

You can export the result list from Wikidata to, for example, this pad.

Choosing the “Embed result” option gives you code like this:

<iframe style="width: 40vw; height: 50vh; border: none;" src="https://query.wikidata.org/embed.html#…" referrerpolicy="origin" sandbox="allow-scripts allow-same-origin allow-popups"></iframe>

In the pad, this is displayed as the live query result.

embedding queries

To query Wikidata, you do not need to use the query builder; you can also run queries from your own program or script.

Click on “</> Code” in the bar above the result area.

You can then choose from various programming languages to implement the query in your workflow; below is the example for R.
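A comparable sketch in Python, using only the standard library, sends the consortium query to the public SPARQL endpoint `https://query.wikidata.org/sparql` and asks for JSON results via the `Accept` header. The User-Agent string is a placeholder to adapt (Wikimedia asks clients to identify themselves), and English labels are requested explicitly instead of `[AUTO_LANGUAGE]`, to be safe outside the web interface. The function names are our own.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?consortium ?consortiumLabel WHERE {
  ?consortium wdt:P361 wd:Q61658497 .
  ?consortium wdt:P31 wd:Q98270496 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(query: str) -> urllib.request.Request:
    """Prepare a GET request that asks the endpoint for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder: describe your tool and give a way to contact you.
        "User-Agent": "nfdi-workshop-example/0.1 (contact: you@example.org)",
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query: str) -> list[dict]:
    """Send the query and return each result row as {variable: value}."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        data = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Usage (needs network access):
# for row in run_query(QUERY):
#     print(row["consortiumLabel"], row["consortium"])
```

Each row comes back as a plain dictionary, so the results can be passed on to any further processing in your own workflow.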

Useful links about today’s topic

Tools

Acknowledgement

The authors would like to thank the Wikidata community and, particularly, Ceren Yildiz and Amar Suljkić for adding data about NFDI to Wikidata. This workshop builds on the joint efforts of WikiProject NFDI.

Lukas C. Bossert thanks CRC1382 for being the pilot group testing this Wikidata workshop. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 403224013 – SFB 1382.

Évariste Demandt thanks the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.