Published October 6, 2024 | Version V3
Dataset Open

SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

Description

SciHyp is a dataset that helps researchers understand and identify hypotheses in scientific literature. It offers insight into how hypotheses are formulated and structured, making it a resource for researchers across scientific disciplines.

This repository contains the ontology and datasets described in our paper, SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles.

🚨 Latest Version Update: All information, data files, and SPARQL queries outlined here are based on the latest version of our paper, "SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles." This includes updates to the ontology and RDF data files available on GitLab. We encourage users to refer to this most recent version to ensure compatibility with their research and analysis.

crowd.ttl: Data curated with the SciHyp pipeline, which uses the Hybrid-LLM-Crowd methodology described in the paper.

expert.ttl: In contrast to the crowd-sourced data, this file contains data curated through expert annotation. It reflects a more specialized and precise perspective, grounded in expert knowledge and analysis.

scihyp_VoiD.ttl: A metadata file containing a VoID/DCAT description (Vocabulary of Interlinked Datasets / Data Catalog Vocabulary) of the dataset.

CrowdAlytics_7.5.owl: The ontology underlying the dataset. It defines the structure and relationships within the SciHyp data.

You can find a detailed description of the data and other resources here.

SPARQL Endpoint

For querying the current SciHyp dataset, you can use our SPARQL endpoint. This endpoint allows you to execute SPARQL queries to explore and extract data from the SciHyp dataset interactively.

SPARQL Endpoint URL: https://crowdalytics.ifi.uzh.ch/sparql/dataset.html


Below is an example query. It retrieves hypotheses together with their relation operator and the names of the two groups being compared, keeping only rows where the left group name contains "game" or the right group name contains "treatment", and where the operator label contains "similar" or "same".

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ca: <http://ddis.ifi.uzh.ch/ontologies/2021/crowdalytics#>
    PREFIX disk: <http://disk-project.org/ontology/disk#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX sqo: <https://w3id.org/sqo#>

    SELECT DISTINCT ?hypothesis ?relation_Operator ?leftGroup ?rightGroup
    WHERE {
      # Hypothesis attached to each line of inquiry
      ?TriggeredLineOFInquiry disk:hasLineOfInquiry ?LineOfInquiryURI .
      ?LineOfInquiryURI disk:hasHypothesisQuery ?hypothesis .

      # Names of the left and right groups
      ?LineOfInquiryURI ca:hasGroup ?groupPair .
      ?groupPair ca:hasLeftGroup ?left .
      ?left ca:hasGroupName ?leftGroup .
      ?groupPair ca:hasRightGroup ?right .
      ?right ca:hasGroupName ?rightGroup .

      # Label of the relation operator
      ?LineOfInquiryURI ca:hasOperator ?Operator .
      ?Operator rdfs:label ?relation_Operator .

      # Case-insensitive substring filters on group names and operator label
      FILTER(CONTAINS(LCASE(?leftGroup), "game") || CONTAINS(LCASE(?rightGroup), "treatment"))
      FILTER(CONTAINS(LCASE(?relation_Operator), "similar") || CONTAINS(LCASE(?relation_Operator), "same"))
    }
    GROUP BY ?hypothesis ?relation_Operator ?leftGroup ?rightGroup

 

Note: You can find more example queries here.
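
For readers who prefer to run queries from a script rather than the web form, the following is a minimal Python sketch using only the standard library. It assumes the service accepts SPARQL 1.1 Protocol GET requests and can return results as SPARQL JSON. The address in ENDPOINT_URL is a hypothetical placeholder, since the URL above points to an HTML query page and the machine-readable service address is not stated here. The helper names build_request, parse_bindings, and run_query are illustrative and not part of SciHyp.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the actual SPARQL service address.
ENDPOINT_URL = "https://example.org/sparql"

# A small query that reuses one property from the example query above.
QUERY = """
PREFIX disk: <http://disk-project.org/ontology/disk#>
SELECT DISTINCT ?hypothesis WHERE {
  ?loi disk:hasHypothesisQuery ?hypothesis .
} LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result document into one dict per result row."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query over HTTP and return the parsed rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Once ENDPOINT_URL points at a real service, run_query(ENDPOINT_URL, QUERY) should return a list of dictionaries keyed by variable name, for example one "hypothesis" entry per row. Variables that are unbound in a row are simply absent from that row's dictionary.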
 

Files

Files (1.6 MB)

Checksum                              Size
md5:1530fa5400c9157bcdcd880dbd8ba670  924.8 kB
md5:b16468409caaccc46fdeb697ce28b77c  56.3 kB
md5:5b52ba98f251248e10e5844b0effe143  635.3 kB
md5:2064916a68b0b9e4462a9ce5e7001602  1.2 kB
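
The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The following is a minimal Python sketch of that check using the standard library. The function names md5_of_file and matches_checksum are illustrative, and which checksum belongs to which file must be taken from the download page, since the listing here does not pair them with file names.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.strip().removeprefix("md5:").lower()
    return md5_of_file(path) == expected
```

A call such as matches_checksum(Path("downloaded_file.ttl"), "md5:...") returns True when the local copy matches the published digest. Here "downloaded_file.ttl" stands in for whichever file was downloaded. MD5 is suitable for detecting accidental corruption, not for guarding against deliberate tampering.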

Additional details

Funding

Swiss National Science Foundation
CrowdAlytics: Large-Scale Human-Machine Systems for Data Science 184994

Software