There is a newer version of the record available.

Published April 10, 2024 | Version V2
Dataset Open

SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

Description

SciHyp is a dataset that helps researchers understand and identify hypotheses in scientific literature. It captures how hypotheses are formulated and structured, down to their individual components, and is intended as a resource for researchers across scientific disciplines.

This repository contains the ontology and datasets described in our paper, SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles.

🚨 Important Update: The ontology and RDF data files have been updated and are now available on GitLab. The latest version will be uploaded here soon.

crowd.ttl: Data curated with the SciHyp pipeline, which uses the Hybrid-LLM-Crowd methodology described in the paper.

expert.ttl: In contrast to the crowd-sourced data, this file contains data curated through expert annotation. It offers a more specialized and precise perspective, grounded in expert knowledge and analysis.

scihyp_VoiD.ttl: A metadata file containing a VoID/DCAT description (Vocabulary of Interlinked Datasets / Data Catalog Vocabulary) of the dataset.

CrowdAlytics_7.4.owl: The underlying ontology, which defines the structure and relationships within the SciHyp data.

You can find a detailed description of the data and other resources here.

SPARQL Endpoint

To query the current SciHyp dataset, use our SPARQL endpoint. It lets you run SPARQL queries to explore and extract data from the dataset interactively.

SPARQL Endpoint URL: https://crowdalytics.ifi.uzh.ch/sparql/dataset.html
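
For programmatic access, a query can in principle be sent over HTTP following the SPARQL 1.1 Protocol. The sketch below only builds such a request with the Python standard library and does not send it. The URL `https://example.org/sparql` is a placeholder and an assumption: the address listed above ends in `dataset.html` and may be a web interface rather than the protocol endpoint itself, so the actual query URL should be confirmed before use. The function name `build_sparql_request` is likewise hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint_url: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query request (POST with a URL-encoded form body)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Ask for the standard SPARQL JSON results format.
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Placeholder URL (assumption): replace with the endpoint's real query address.
EXAMPLE_ENDPOINT = "https://example.org/sparql"
EXAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

To actually run the query, the request could be passed to `urllib.request.urlopen(request, timeout=30)` and the response body decoded as JSON, assuming the server honours the `Accept` header.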


Below is an example query to retrieve some annotations. 

   
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ca: <http://ddis.ifi.uzh.ch/ontologies/2021/crowdalytics#>
    PREFIX disk: <http://disk-project.org/ontology/disk#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX sqo: <https://w3id.org/sqo#>

    SELECT ?hypothesis
           (GROUP_CONCAT(DISTINCT ?independentVariableText; separator=" ") AS ?independentVar)
           (GROUP_CONCAT(DISTINCT ?dependentVariableText; separator=" ") AS ?dependentVar)
           ?leftGroup ?rightGroup ?relation_Operator
    WHERE {
      # Each line of inquiry links a hypothesis to its question.
      ?TriggeredLineOFInquiry disk:hasLineOfInquiry ?LineOfInquiryURI .
      ?LineOfInquiryURI disk:hasHypothesisQuery ?hypothesis .
      ?LineOfInquiryURI disk:hasQuestion ?question .

      # Optional: the question's variables, their types, and the relation.
      OPTIONAL {
        ?question sqo:hasQuestionVariable ?questionVariable .
        ?questionVariable ca:typeOfVariable ?vartype .
        ?questionVariable ca:hasHypothesisVariableText ?vartext .
        ?LineOfInquiryURI ca:hasRelation ?relation_Operator .

        # Route each variable text into the matching column; other types contribute "".
        BIND(IF(STR(?vartype) = "Independent Variable", ?vartext, "") AS ?independentVariableText)
        BIND(IF(STR(?vartype) = "Dependent Variable", ?vartext, "") AS ?dependentVariableText)
      }

      # Optional: the question's group pair (left and right group) and the operator.
      OPTIONAL {
        ?question ca:hasGroupPair ?groupPair .
        ?groupPair ca:hasLeftGroup ?leftGroup .
        ?groupPair ca:hasRightGroup ?rightGroup .
        ?LineOfInquiryURI ca:hasOperator ?relation_Operator .
      }
    }
    GROUP BY ?hypothesis ?leftGroup ?rightGroup ?relation_Operator

 

Note: You can find more example queries here.
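
When the example query above is run through a client that returns the standard SPARQL 1.1 JSON results format, its output needs a little cleanup. The two `BIND(IF(...))` expressions emit an empty string for variables of the other type, so the `GROUP_CONCAT` columns can carry a stray leading or trailing space, and the columns from the `OPTIONAL` blocks may be unbound. The following is a minimal sketch of flattening such a response. The function name `rows_from_sparql_json` and every value in `SAMPLE_RESPONSE` are made up for illustration and are not taken from the dataset.

```python
from typing import Any, Dict, List, Optional


def rows_from_sparql_json(document: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per result row."""
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        row = {}
        for name in variables:
            cell = binding.get(name)
            if cell is None:
                # Unbound variables are simply absent from the binding.
                row[name] = None
            else:
                # Collapse runs of whitespace and trim both ends.
                row[name] = " ".join(cell["value"].split())
        rows.append(row)
    return rows


# Hypothetical response, invented for illustration only.
SAMPLE_RESPONSE = {
    "head": {"vars": ["hypothesis", "independentVar", "dependentVar", "leftGroup"]},
    "results": {
        "bindings": [
            {
                "hypothesis": {"type": "uri", "value": "http://example.org/hypothesis/1"},
                "independentVar": {"type": "literal", "value": " sleep duration"},
                "dependentVar": {"type": "literal", "value": "memory recall "},
            }
        ]
    },
}

rows = rows_from_sparql_json(SAMPLE_RESPONSE)
```

Unbound columns come back as `None` rather than an empty string, which keeps "no group pair" distinguishable from an empty literal.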
 

Files

Files (1.3 MB)

Checksum | Size
md5:2ea98266da133ab86508f59c615ab7fc | 761.7 kB
md5:33f3dda2485b5853281203c7e515dd6e | 33.3 kB
md5:2be1a3c7db16749d76584dc768fb74d8 | 483.7 kB
md5:2064916a68b0b9e4462a9ce5e7001602 | 1.2 kB
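
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because this listing does not show which file name belongs to which checksum, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("crowd.ttl", "md5:...")` would be called with the checksum that the record page shows next to that file. MD5 is adequate here for detecting corrupted downloads, though it is not a safeguard against deliberate tampering.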

Additional details

Funding

Swiss National Science Foundation
CrowdAlytics: Large-Scale Human-Machine Systems for Data Science 184994

Software