The world’s knowledge at your fingertips! This beginners’ workshop introduces ways of gathering information using Wikidata and includes a brief overview of the Wikimedia Foundation. The hands-on part starts with a general introduction to subject-predicate-object queries as implemented in the Resource Description Framework (RDF). RDF lies at the core of Wikidata and is widely used in academic contexts. It is the standard Web technology on which the concept of Linked Data is built. In short, RDF is the state-of-the-art way to describe data with metadata.
In this workshop, you will learn to use the RDF query language SPARQL and perform semantic queries. We will retrieve information about various consortia of Germany’s National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur, NFDI). One of the goals of NFDI is to become Germany’s backbone in research data management. Consequently, the structures and services of NFDI have to be Findable, Accessible, Interoperable and Reusable (FAIR) in a sustainable way. Hence, more and more information relating to NFDI is being added to Wikidata. In this workshop you can find out how to gather it and how to join the Wikidata community. While we will not delve too deeply into becoming a contributor to Wikidata, a clear picture will emerge of what kind of data on NFDI is already there and what is missing.
A hallmark of this workshop is its use of Donald E. Knuth’s “Literate Programming” approach from the 1980s. In doing so, we leave aside the graphical user interface of Wikidata’s query builder and focus on actually writing down - typing - the code without assuming any prior experience with programming languages. Our method of teaching with two trainers and a split screen is born out of the necessities of teaching at a distance and has been developed and tested with different programming environments during the recent pandemic (DOI: 10.17192/bfdm.2021.3.8336).
As a matter of fact, Wikidata’s query builder is limited when it comes to generating geographical maps or visualizing lists of image items such as logos of different consortia. Therefore, writing the queries directly with your fingertips comes with the added bonus of getting to know a wide range of possible visualizations. Visualizations in Wikidata can be easily exported to other platforms and can also be put on websites, where they keep updating as new data is added.
Last, but not least, adding data to Wikidata is what makes this workshop possible in the first place. Wikidata is not built on a common database. Common, or more traditional, databases are relational and are queried with the Structured Query Language (SQL). Wikidata, by contrast, offers linked datasets and allows for creating a knowledge graph. While SPARQL may initially look like SQL, there are important differences because the data is linked. With SPARQL, your query matches graph patterns instead of SQL’s relational matching operations. Using this language, one can perform a distributed or federated query across multiple databases with a single query statement. Enjoy!
License: CC-BY 4.0
by Évariste Demandt
After today’s workshop you will …
Imagine a world where this list is all the information you are given:
- Judy has a cat,
(and she) lives in a flat, which is small.
- Melinda is friends with Jack's friends.
- Tom lives in a house, which is in the city.
- Peter is the son of Mary.
- Robert is the father of Melinda,
(and he) is married to Mary.
- The flat is in the city.
- The dog is colored black.
- Jack has a dog.
- Judy is the aunt of Peter.
- Tom's hair is colored black.
- Peter is a friend of Jack.
- Jack is a friend of Melinda.
- The cat is a pet.
Discuss and answer the following questions with the information given above.
- Who is a friend of Melinda?
- Who lives in the city?
- Who are the children of Mary?
- Who has a pet?
- Who is the grandfather of Tom?
- What are the colors of the pets?
Besides finding the answers to the questions, discuss as well:
- What are potential problems with the list of information above?
General structure:
The concept behind this is called Resource Description Framework (RDF).
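To make the subject-predicate-object idea concrete, here is a minimal Python sketch that stores part of the list above as triples and looks things up by pattern. The predicate wording (`"has"`, `"is a"`, and so on) and the helper name `match` are our own choices for illustration; they are not part of RDF or Wikidata.

```python
# Each statement from the list becomes one (subject, predicate, object) triple.
# The predicate names are our own labels, chosen for illustration only.
TRIPLES = [
    ("Judy", "has", "cat"),
    ("Judy", "lives in", "flat"),
    ("flat", "is", "small"),
    ("Tom", "lives in", "house"),
    ("house", "is in", "city"),
    ("flat", "is in", "city"),
    ("Jack", "has", "dog"),
    ("dog", "is colored", "black"),
    ("cat", "is a", "pet"),
    ("Peter", "is a friend of", "Jack"),
    ("Jack", "is a friend of", "Melinda"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given parts; None means 'anything'."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Who is a friend of Melinda?
friends_of_melinda = [s for (s, _, _) in match(predicate="is a friend of", obj="Melinda")]

# Who has a pet? Two steps: find what counts as a pet, then find who has one.
pets = [s for (s, _, _) in match(predicate="is a", obj="pet")]
pet_owners = [s for pet in pets for (s, _, _) in match(predicate="has", obj=pet)]

print(friends_of_melinda)
print(pet_owners)
```

Jack does not show up as a pet owner in this sketch, because nothing in the list states that the dog is a pet. This is one of the potential problems the discussion question asks about.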
Taking the example of Douglas Adams
Let’s add some ORCID iDs to people in Wikidata.
Everyone picks one person from https://pad.otc.coscine.dev/sCLyqrk2T0yA_KTGv3DWFg?view (write your name in the column).
Let’s look at an entry: National Research Data Infrastructure
Formulate the sentence:
Which consortium (who) is part of the NFDI?
To get results from an RDF knowledge-graph database, we use SPARQL (SPARQL Protocol and RDF Query Language).
SELECT ?x
WHERE
{
# subject predicate object
?x p q.
}
The SELECT clause lists variables.
The WHERE clause contains restrictions.
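How a pattern such as `?x p q` turns into result rows can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not how a real SPARQL engine is implemented: the graph is made up from the family example, and `where` and `select` are hypothetical helper names.

```python
# A tiny, made-up graph; real Wikidata triples use Q and P numbers instead.
GRAPH = [
    ("Peter", "son of", "Mary"),
    ("Robert", "father of", "Melinda"),
    ("Robert", "married to", "Mary"),
    ("Judy", "aunt of", "Peter"),
]


def is_variable(term):
    """Terms starting with '?' are variables, everything else is fixed."""
    return term.startswith("?")


def where(pattern, graph=GRAPH):
    """Match one (subject, predicate, object) pattern; return the variable bindings."""
    solutions = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if is_variable(term):
                # A variable used twice in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No break happened: every part of the pattern fits this triple.
            solutions.append(binding)
    return solutions


def select(variables, pattern, graph=GRAPH):
    """Keep only the listed variables from each solution, like the SELECT clause."""
    return [tuple(b[v] for v in variables) for b in where(pattern, graph)]


# SELECT ?x WHERE { ?x "married to" "Mary" . }
rows = select(["?x"], ("?x", "married to", "Mary"))
print(rows)
```

The WHERE part restricts which triples fit; the SELECT part only decides which of the bound variables end up as columns.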
- wd: (Q number)
- wdt: (P number)
-> Switch to Wikidata Query Service!
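The two prefixes are shorthand for full identifiers. As a small Python sketch (the function name `expand` is ours, for illustration), this is how a prefixed name maps to the IRI it stands for:

```python
# Namespaces behind the two prefixes as used by the Wikidata Query Service.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",        # items (Q numbers)
    "wdt": "http://www.wikidata.org/prop/direct/",  # direct properties (P numbers)
}


def expand(prefixed_name):
    """Turn e.g. 'wd:Q42' into the full IRI it abbreviates."""
    prefix, local_id = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_id


print(expand("wd:Q42"))   # the item for Douglas Adams
print(expand("wdt:P31"))  # the property "instance of"
```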
Let’s try to write the query from above using SPARQL and the Wikidata Query Service.
SELECT ?consortium
WHERE {
#who? part of NFDI
?consortium wdt:P361 wd:Q61658497.
}
The result may not be satisfying…
Adding a column with the label:
SELECT ?consortium ?consortiumLabel
WHERE {
#who? part of NFDI
?consortium wdt:P361 wd:Q61658497.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Hint: Type SERV press CTRL+SPACE and select an auto-completion.
Do we agree with this list?
Do we actually need to code?
Using the SPARQL Query Helper
SELECT ?consortium ?consortiumLabel
WHERE {
#who? part of NFDI
?consortium wdt:P361 wd:Q61658497.
#who? instance of accepted NFDI consortium
?consortium wdt:P31 wd:Q98270496.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
But do we know where they are actually located?
Now adding some coordinates:
#defaultView:Map
SELECT ?consortium ?consortiumLabel ?affiliation ?affiliationLabel ?geo WHERE {
?consortium wdt:P361 wd:Q61658497.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
?consortium wdt:P31 wd:Q98270496.
OPTIONAL { ?consortium wdt:P1416 ?affiliation. }
OPTIONAL { ?affiliation wdt:P625 ?geo .}
}
Now, everyone can export the result and insert it into an iframe of the blank/helper pad. As an aside: You may choose the short URL and embed this into the iframe:
<iframe
width="100%"
height="400"
src="https://w.wiki/5ug4">
</iframe>
Show the map from above as a graph with colored labels.
#defaultView:Graph
SELECT ?consortium ?consortiumLabel ("EC0000" AS ?rgb) ?affiliation ?affiliationLabel ?geo WHERE {
?consortium wdt:P361 wd:Q61658497.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
?consortium wdt:P31 wd:Q98270496.
OPTIONAL { ?consortium wdt:P1416 ?affiliation. }
OPTIONAL { ?affiliation wdt:P625 ?geo .}
}
You can export the result list from Wikidata to e.g. this pad:

Choosing the “embed result” option gives you code that looks like this:
<iframe style="width: 40vw; height: 50vh; border: none;"
src="
"
referrerpolicy="origin"
sandbox="allow-scripts allow-same-origin allow-popups" >
</iframe>
Which will be displayed as
To query Wikidata, you do not need to use the query builder; you can also do that from your own program or script.
Click on </> code in the line above the result area.

Then you can select from various languages/scripts to implement the query in your workflow; below is the example for R.
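As a minimal Python sketch of what querying from a script involves (this is not the code the Query Service generates): build the request URL for the public endpoint, send it, and flatten the JSON answer into rows. The helper names, the `User-Agent` string and the `SAMPLE` response are made up for illustration. The label language is set to `"en"` explicitly, on the assumption that `[AUTO_LANGUAGE]` is only filled in by the web interface.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The consortium query from above, with the label language fixed to English.
QUERY = """
SELECT ?consortium ?consortiumLabel WHERE {
  ?consortium wdt:P361 wd:Q61658497.
  ?consortium wdt:P31 wd:Q98270496.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_url(query):
    """URL-encode the query and ask for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def fetch(query):
    """Send the query over HTTP (needs network access; not called in this sketch)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "wikidata-workshop-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows(result):
    """Flatten the SPARQL JSON result format into plain dictionaries."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


# Made-up response in the SPARQL JSON shape, so the parsing step can be tried offline.
SAMPLE = {
    "head": {"vars": ["consortium", "consortiumLabel"]},
    "results": {"bindings": [
        {
            "consortium": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
            "consortiumLabel": {"type": "literal", "value": "Example consortium"},
        }
    ]},
}

print(build_url(QUERY))
print(rows(SAMPLE))
```

With network access, `rows(fetch(QUERY))` would return the live list instead of the made-up sample.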

Query Builder (GUI): For those who prefer a graphical interface for writing queries (limited functionality!). Some queries are not possible with the Query Builder, especially when you want a map or a list of images as the result.
List of properties in Wikidata: Search for a property
The authors would like to thank the Wikidata community and, particularly, Ceren Yildiz and Amar Suljkić for adding data about NFDI to Wikidata. This workshop builds on the joint efforts of WikiProject NFDI.
Lukas C. Bossert thanks CRC1382 for being the pilot group testing this Wikidata workshop. Funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 403224013 – SFB 1382.
Évariste Demandt thanks the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.