Published October 6, 2024 | Version V3

Dataset Open

SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

1. University of Zurich

SciHyp is a dataset that helps researchers identify and understand hypotheses in scientific literature. It captures how hypotheses are formulated and structured, making it a resource for researchers across scientific disciplines.

This repository contains the ontology and datasets described in our paper, SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles.

🚨 Latest Version Update: All information, data files, and SPARQL queries described here are based on the latest version of our paper, including the updated ontology and RDF data files available on GitLab. Please refer to this most recent version to ensure compatibility in your research and analysis.

crowd.ttl: Data curated with the SciHyp pipeline, which uses the Hybrid-LLM-Crowd methodology described in the paper.

expert.ttl: In contrast to the crowd-sourced data, this file contains data curated through expert annotation. It offers a more specialized perspective grounded in expert knowledge and analysis.

scihyp_VoiD.ttl: A metadata file containing a VoID/DCAT description of the dataset (Vocabulary of Interlinked Datasets / Data Catalog Vocabulary).

CrowdAlytics_7.5.owl: The underlying ontology, which defines the structure and relationships within the SciHyp data.

You can find a detailed description of the data and other resources here.

SPARQL Endpoint

To query the current SciHyp dataset, use our SPARQL endpoint, which lets you run SPARQL queries interactively to explore and extract data.

SPARQL Endpoint URL: https://crowdalytics.ifi.uzh.ch/sparql/dataset.html

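The example query below retrieves hypotheses together with their relation operator and the names of the left and right groups. It keeps only results where the left group name contains "game" or the right group name contains "treatment", and where the operator label contains "similar" or "same".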

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ca: <http://ddis.ifi.uzh.ch/ontologies/2021/crowdalytics#>
PREFIX disk: <http://disk-project.org/ontology/disk#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sqo: <https://w3id.org/sqo#>

SELECT DISTINCT ?hypothesis ?relation_Operator ?leftGroup ?rightGroup
WHERE {
  ?TriggeredLineOFInquiry disk:hasLineOfInquiry ?LineOfInquiryURI .
  ?LineOfInquiryURI disk:hasHypothesisQuery ?hypothesis .
  ?LineOfInquiryURI ca:hasGroup ?groupPair .
  ?groupPair ca:hasLeftGroup ?left .
  ?left ca:hasGroupName ?leftGroup .
  ?groupPair ca:hasRightGroup ?right .
  ?right ca:hasGroupName ?rightGroup .
  ?LineOfInquiryURI ca:hasOperator ?Operator .
  ?Operator rdfs:label ?relation_Operator .
  FILTER(CONTAINS(LCASE(?leftGroup), "game") || CONTAINS(LCASE(?rightGroup), "treatment"))
  FILTER(CONTAINS(LCASE(?relation_Operator), "similar") || CONTAINS(LCASE(?relation_Operator), "same"))
}
GROUP BY ?hypothesis ?relation_Operator ?leftGroup ?rightGroup

Note: You can find more example queries here.
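For scripted access, a query like the one above could be sent over HTTP following the SPARQL 1.1 Protocol. The sketch below is only an illustration, not part of the SciHyp tooling: the helper names (build_sparql_request, parse_bindings, run_select) are hypothetical, and the machine-readable query URL is an assumption. The URL listed above may be a browser interface, in which case the actual query URL has to be looked up there and passed in as endpoint_url.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request (query passed as ?query=...)."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint_url}?{params}",
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_bindings(payload: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_select(endpoint_url: str, query: str, timeout: float = 30.0) -> list:
    """Send a SELECT query and return its rows.

    ASSUMPTION: endpoint_url accepts protocol-style GET requests; this is
    not confirmed by the dataset description.
    """
    request = build_sparql_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling run_select(endpoint_url, query) with the example query text would then return one dictionary per result row, keyed by hypothesis, relation_Operator, leftGroup, and rightGroup.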

Files

Files (1.6 MB)

Name	MD5	Size
crowd.ttl	1530fa5400c9157bcdcd880dbd8ba670	924.8 kB
CrowdAlytics_7.5.owl	b16468409caaccc46fdeb697ce28b77c	56.3 kB
expert.ttl	5b52ba98f251248e10e5844b0effe143	635.3 kB
scihyp_VoiD.ttl	2064916a68b0b9e4462a9ce5e7001602	1.2 kB
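After downloading, the MD5 checksums listed above can be used to confirm that the files arrived intact. The following is a minimal sketch using only the Python standard library; the function names md5_of and verify_files are illustrative, and the code assumes the four files sit in one local directory under their original names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "crowd.ttl": "1530fa5400c9157bcdcd880dbd8ba670",
    "CrowdAlytics_7.5.owl": "b16468409caaccc46fdeb697ce28b77c",
    "expert.ttl": "5b52ba98f251248e10e5844b0effe143",
    "scihyp_VoiD.ttl": "2064916a68b0b9e4462a9ce5e7001602",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory):
    """Map each expected file name to True if present with a matching MD5."""
    base = Path(directory)
    results = {}
    for name, expected in EXPECTED_MD5.items():
        target = base / name
        # A missing file counts as a failed check.
        results[name] = target.is_file() and md5_of(target) == expected
    return results
```

For example, verify_files(".") checks the current directory and returns a dictionary with one boolean per file; any False entry points to a missing or altered download.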

Additional details

Swiss National Science Foundation
CrowdAlytics: Large-Scale Human-Machine Systems for Data Science (grant no. 184994)

Repository URL: https://gitlab.ifi.uzh.ch/DDIS-Public/scihyp/
Development Status: Active


	All versions	This version
Views	572	122
Downloads	152	56
Data volume	58.6 MB	25.4 MB


DOI

Resource type

Dataset

Publisher

Zenodo

Conference

The 23rd International Semantic Web Conference (ISWC), Baltimore, 11-15 November

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: October 6, 2024
Modified: October 6, 2024
