Published April 10, 2024 | Version V2

Dataset Open

SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

1. University of Zurich

SciHyp is a dataset that helps researchers understand and identify hypotheses in scientific literature. It offers insight into how hypotheses and their components are formulated and structured, which makes it a useful resource across scientific disciplines.

This repository contains the ontology and datasets described in our paper, SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles.

🚨 Important Update: The ontology and RDF data files have been updated and are now available on GitLab. The latest version will be uploaded here soon.

crowd.ttl: Data curated with the SciHyp pipeline, which uses the Hybrid-LLM-Crowd methodology described in the paper.

expert.ttl: In contrast to the crowdsourced data, this file contains data curated through expert annotation, offering a more specialized and precise perspective grounded in expert knowledge and analysis.

scihyp_VoiD.ttl: A metadata file containing a VoID/DCAT description of the dataset (Vocabulary of Interlinked Datasets / Data Catalog Vocabulary).

CrowdAlytics_7.4.owl: The ontology that forms the backbone of the dataset, defining the structure and relationships within the SciHyp data.

You can find a detailed description of the data and other resources here.

SPARQL Endpoint

To query the current SciHyp dataset, use our SPARQL endpoint, which lets you run SPARQL queries interactively to explore and extract data from the dataset.

SPARQL Endpoint URL: https://crowdalytics.ifi.uzh.ch/sparql/dataset.html
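For programmatic access, a query can be sent following the SPARQL 1.1 Protocol (query via GET, results as SPARQL JSON). The minimal Python sketch below assumes the service accepts GET requests with a `query` parameter; the URL above appears to point to a web page, so the `endpoint` argument and the `https://example.org/sparql` placeholder are assumptions you must replace with the actual query service address. The helpers `build_sparql_request` and `run_query` are hypothetical, not part of the SciHyp tooling.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request.

    `endpoint` must be the query service URL (an assumption; it is not given here).
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for results in the SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urlopen(build_sparql_request(endpoint, query), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the actual query service URL.
    ENDPOINT = "https://example.org/sparql"
    req = build_sparql_request(ENDPOINT, "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
    print(req.full_url)
```

For a SELECT query, `run_query` returns a dict with `head` and `results` keys. Very long queries can exceed URL length limits; the protocol also allows sending the query in a POST body instead.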

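The example query below retrieves each hypothesis together with the texts of its independent and dependent variables, the compared left and right groups, and its relation or operator.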

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ca: <http://ddis.ifi.uzh.ch/ontologies/2021/crowdalytics#>
PREFIX disk: <http://disk-project.org/ontology/disk#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sqo: <https://w3id.org/sqo#>

SELECT ?hypothesis
       (GROUP_CONCAT(DISTINCT ?independentVariableText; separator=" ") AS ?independentVar)
       (GROUP_CONCAT(DISTINCT ?dependentVariableText; separator=" ") AS ?dependentVar)
       ?leftGroup ?rightGroup ?relation_Operator
WHERE {
  # Link each triggered line of inquiry to its hypothesis and question.
  ?triggeredLineOfInquiry disk:hasLineOfInquiry ?LineOfInquiryURI .
  ?LineOfInquiryURI disk:hasHypothesisQuery ?hypothesis .
  ?LineOfInquiryURI disk:hasQuestion ?question .

  # Question variables: route each variable's text into the independent
  # or dependent column according to its type.
  OPTIONAL {
    ?question sqo:hasQuestionVariable ?questionVariable .
    ?questionVariable ca:typeOfVariable ?vartype .
    ?questionVariable ca:hasHypothesisVariableText ?vartext .
    ?LineOfInquiryURI ca:hasRelation ?relation_Operator .

    BIND(IF(STR(?vartype) = "Independent Variable", ?vartext, "") AS ?independentVariableText)
    BIND(IF(STR(?vartype) = "Dependent Variable", ?vartext, "") AS ?dependentVariableText)
  }

  # Compared groups and the operator linking them. ?relation_Operator is
  # shared with the block above, so this block only matches when it is
  # unbound there or equal to the operator.
  OPTIONAL {
    ?question ca:hasGroupPair ?groupPair .
    ?groupPair ca:hasLeftGroup ?leftGroup .
    ?groupPair ca:hasRightGroup ?rightGroup .
    ?LineOfInquiryURI ca:hasOperator ?relation_Operator .
  }
}
GROUP BY ?hypothesis ?leftGroup ?rightGroup ?relation_Operator
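The two BIND(IF(...)) lines and the GROUP_CONCAT(DISTINCT ...) aggregates together split each hypothesis's variable texts into an independent and a dependent column. The Python sketch below mirrors that logic on hypothetical in-memory rows to make the grouping explicit; `split_variables` and the (hypothesis, type, text) row shape are illustrative assumptions. Two differences from the query: the query also concatenates the empty string produced by the IF branch, which can leave a stray space, whereas the sketch skips non-matching rows; and SPARQL leaves GROUP_CONCAT order unspecified, whereas the sketch keeps first-seen order.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# Variable-type strings tested in the query's BIND(IF(...)) lines,
# mapped to the output column they feed.
COLUMN_FOR_TYPE = {
    "Independent Variable": "independent",
    "Dependent Variable": "dependent",
}


def split_variables(rows: Iterable[tuple[str, str, str]]) -> dict[str, dict[str, str]]:
    """Group (hypothesis, var_type, var_text) rows the way the query does."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"independent": [], "dependent": []}
    )
    for hypothesis, var_type, var_text in rows:
        columns = grouped[hypothesis]  # every hypothesis appears in the output
        column = COLUMN_FOR_TYPE.get(var_type)
        if column is None:
            continue  # other types feed neither column
        if var_text not in columns[column]:  # GROUP_CONCAT(DISTINCT ...)
            columns[column].append(var_text)
    # separator=" " as in the query
    return {h: {col: " ".join(texts) for col, texts in cols.items()} for h, cols in grouped.items()}
```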

Note: You can find more example queries here.
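Responses in the SPARQL 1.1 Query Results JSON format list the projected variables under `head.vars` and one object per solution under `results.bindings`, leaving out variables that are unbound in that solution. A minimal sketch for turning such a response into plain rows; `bindings_to_rows` is a hypothetical helper, and stripping whitespace from every value is a choice made here to tidy the concatenated variable texts.

```python
from __future__ import annotations

from typing import Optional


def bindings_to_rows(results: dict) -> list[dict[str, Optional[str]]]:
    """Flatten SPARQL 1.1 Query Results JSON into one plain dict per solution."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row: dict[str, Optional[str]] = {}
        for var in variables:
            term = binding.get(var)
            # Unbound variables (e.g. from an unmatched OPTIONAL) are absent
            # from the binding object, so represent them as None. Otherwise
            # strip stray spaces that GROUP_CONCAT can leave around joined texts.
            row[var] = term["value"].strip() if term is not None else None
        rows.append(row)
    return rows
```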

Files

Total size: 1.3 MB

Name	MD5 checksum	Size
crowd.ttl	2ea98266da133ab86508f59c615ab7fc	761.7 kB
CrowdAlytics_7.4.owl	33f3dda2485b5853281203c7e515dd6e	33.3 kB
expert.ttl	2be1a3c7db16749d76584dc768fb74d8	483.7 kB
scihyp_VoiD.ttl	2064916a68b0b9e4462a9ce5e7001602	1.2 kB
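After downloading, the files can be checked against the MD5 checksums listed in the table. A minimal sketch using only the Python standard library; the helper names are hypothetical, and the checksums apply to the files in this version of the record, so the updated files on GitLab may not match them.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# MD5 checksums for this version of the record, copied from the file table.
EXPECTED_MD5 = {
    "crowd.ttl": "2ea98266da133ab86508f59c615ab7fc",
    "CrowdAlytics_7.4.owl": "33f3dda2485b5853281203c7e515dd6e",
    "expert.ttl": "2be1a3c7db16749d76584dc768fb74d8",
    "scihyp_VoiD.ttl": "2064916a68b0b9e4462a9ce5e7001602",
}


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict[str, bool]:
    """Return {filename: checksum_matches}; missing files count as False."""
    return {
        name: (directory / name).is_file() and md5_of(directory / name) == expected
        for name, expected in EXPECTED_MD5.items()
    }
```

For example, `verify_downloads(Path("scihyp"))` reports which of the four files in a local `scihyp` folder match.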

Additional details

Funding: Swiss National Science Foundation, CrowdAlytics: Large-Scale Human-Machine Systems for Data Science (award 184994)

Repository URL: https://gitlab.ifi.uzh.ch/DDIS-Public/scihyp/
Development Status: Active


Statistics

Metric	All versions	This version
Views	532	228
Downloads	139	28
Data volume	54.1 MB	9.7 MB


DOI

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: April 10, 2024
Modified: June 7, 2024
