SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles
Description
SciHyp is a dataset that supports researchers in understanding and identifying hypotheses in scientific literature. It captures how hypotheses are formulated and structured, making it a resource for researchers across scientific disciplines.
This repository contains the ontology and datasets described in our paper, SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles.
🚨 Important Update: The ontology and RDF data files have been updated and are now available on GitLab. The latest version will be uploaded here soon.
crowd.ttl: Data curated with the SciHyp pipeline, which uses the Hybrid-LLM-Crowd methodology described in the paper.
expert.ttl: In contrast to the crowd-sourced data, this file contains data curated through expert annotation. It reflects a more specialized and precise perspective grounded in expert knowledge and analysis.
scihyp_VoiD.ttl: A metadata file containing a VoID/DCAT description (Vocabulary of Interlinked Datasets / Data Catalog Vocabulary).
CrowdAlytics_7.4.owl: The ontology underlying the dataset. It defines the structure and relationships within the SciHyp data.
You can find a detailed description of the data and other resources here.
SPARQL Endpoint
To query the current SciHyp dataset, use our SPARQL endpoint, which lets you run SPARQL queries to explore and extract data interactively.
SPARQL Endpoint URL: https://crowdalytics.ifi.uzh.ch/sparql/dataset.html
Below is an example query to retrieve some annotations.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ca: <http://ddis.ifi.uzh.ch/ontologies/2021/crowdalytics#>
PREFIX disk: <http://disk-project.org/ontology/disk#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sqo: <https://w3id.org/sqo#>

SELECT ?hypothesis
       (GROUP_CONCAT(DISTINCT ?independentVariableText; separator=" ") AS ?independentVar)
       (GROUP_CONCAT(DISTINCT ?dependentVariableText; separator=" ") AS ?dependentVar)
       ?leftGroup ?rightGroup ?relation_Operator
WHERE {
  ?TriggeredLineOFInquiry disk:hasLineOfInquiry ?LineOfInquiryURI .
  ?LineOfInquiryURI disk:hasHypothesisQuery ?hypothesis .
  ?LineOfInquiryURI disk:hasQuestion ?question .
  OPTIONAL {
    ?question sqo:hasQuestionVariable ?questionVariable .
    ?questionVariable ca:typeOfVariable ?vartype .
    ?questionVariable ca:hasHypothesisVariableText ?vartext .
    ?LineOfInquiryURI ca:hasRelation ?relation_Operator .
    BIND(IF(STR(?vartype) = "Independent Variable", ?vartext, "") AS ?independentVariableText)
    BIND(IF(STR(?vartype) = "Dependent Variable", ?vartext, "") AS ?dependentVariableText)
  }
  OPTIONAL {
    ?question ca:hasGroupPair ?groupPair .
    ?groupPair ca:hasLeftGroup ?leftGroup .
    ?groupPair ca:hasRightGroup ?rightGroup .
    ?LineOfInquiryURI ca:hasOperator ?relation_Operator .
  }
}
GROUP BY ?hypothesis ?leftGroup ?rightGroup ?relation_Operator
Note: You can find more example queries here.
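For programmatic access, a query can also be sent over HTTP following the SPARQL 1.1 Protocol. The sketch below is a minimal illustration using only the Python standard library. The ENDPOINT value is a placeholder and an assumption: the URL given above points to the interactive query page, and the path of the machine-readable query service is not stated here, so replace it with the actual one. The function names are hypothetical helpers, and the short query reuses only the disk:hasHypothesisQuery property from the example above.

```python
import json
import urllib.parse
import urllib.request

# Placeholder (assumption): the machine-readable query service URL is not
# given in this description. Replace it with the real endpoint before use.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX disk: <http://disk-project.org/ontology/disk#>
SELECT ?hypothesis WHERE {
  ?lineOfInquiry disk:hasHypothesisQuery ?hypothesis .
} LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the flattened rows (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return bindings_to_rows(json.load(response))
```

Calling run_query(ENDPOINT, QUERY) against a working endpoint would return one dictionary per result row, keyed by variable name. Variables left unbound by an OPTIONAL block are simply absent from that row's dictionary.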
Files
Total size: 1.3 MB
Checksum | Size |
---|---|
md5:2ea98266da133ab86508f59c615ab7fc | 761.7 kB |
md5:33f3dda2485b5853281203c7e515dd6e | 33.3 kB |
md5:2be1a3c7db16749d76584dc768fb74d8 | 483.7 kB |
md5:2064916a68b0b9e4462a9ce5e7001602 | 1.2 kB |
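The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_file and matches_checksum are hypothetical, and which checksum belongs to which file is not stated here, so the pairing has to be taken from the download page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum(Path("crowd.ttl"), "md5:<checksum from the listing>") would return True only if the local copy is byte-identical to the published file. MD5 is adequate here for detecting corrupted downloads, not for protecting against deliberate tampering.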
Additional details
Funding
- Swiss National Science Foundation
- CrowdAlytics: Large-Scale Human-Machine Systems for Data Science (grant 184994)
Software
- Repository URL
- https://gitlab.ifi.uzh.ch/DDIS-Public/scihyp/
- Development Status
- Active