S1000 corpus, large-scale tagging results and other supplementary files
Creators
- 1. Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
- 2. TurkuNLP Group, Department of Computing, University of Turku, Finland
- 3. Textmi, Tokyo, Japan
- 4. Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Greece
Description
Data associated with the S1000 corpus.
The tagger software with which the dictionary files in tagger-organisms-dictionary-S1000.tar.gz can be used is available here: https://github.com/larsjuhljensen/tagger
The online version of the annotation documentation can be found here: https://katnastou.github.io/s1000-corpus-annotation-guidelines/
The S1000 corpus, split into training, development and test sets, is provided in BRAT format in S1000-corpus.tar.gz and in CoNLL format in s1000-conll.tar.gz.
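BRAT format normally pairs each `.txt` document with a standoff `.ann` file whose entity lines have the form `ID<TAB>TYPE START END<TAB>TEXT`, with the fragments of a discontinuous mention separated by `;`. As a minimal sketch, assuming the S1000 `.ann` files follow that standard convention, the entity mentions can be read as shown here. `Mention` and `parse_ann` are hypothetical names, and the `Species` label in the sample is illustrative, not taken from the S1000 type inventory.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Mention:
    ann_id: str                    # e.g. "T1"
    label: str                     # entity type
    spans: List[Tuple[int, int]]   # character offsets into the matching .txt file
    text: str                      # covered text as stored in the .ann line


def parse_ann(lines: Iterable[str]) -> List[Mention]:
    """Collect text-bound (T) annotations from BRAT standoff .ann lines."""
    mentions = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("T"):
            continue  # relations (R), events (E), notes (#) etc. are skipped
        ann_id, label_and_offsets, text = line.split("\t", 2)
        label, offsets = label_and_offsets.split(" ", 1)
        spans = []
        for fragment in offsets.split(";"):  # discontinuous mentions use ';'
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        mentions.append(Mention(ann_id, label, spans, text))
    return mentions


sample = [
    "T1\tSpecies 10 22\tHomo sapiens\n",        # illustrative entity line
    "#1\tAnnotatorNotes T1\tfree-text note\n",  # ignored: not an entity
]
mentions = parse_ann(sample)
```

For a real file, pass an open handle, e.g. `parse_ann(open("doc.ann", encoding="utf-8"))`, where `doc.ann` is a placeholder name. The offsets index into the corresponding `.txt` file, so `txt[start:end]` should reproduce each fragment.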
The tagging results of the Jensenlab tagger on the S1000 test set are provided here: S1000-jensenlab-tagger.tar.gz
The results of the large-scale run of the Jensenlab tagger on all of PubMed and the PMC Open Access articles are provided here: Jensenlab_tagger_large_scale_matches_with_rank.tsv.gz
The model used for the large-scale run of the transformer-based method is provided here: S1000_Transformer_based_tagger_large_scale_model.tar.gz, and the results of the large-scale tagging here: Transformer_based_tagger_large_scale_matches_with_rank.tsv.zip
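Some of the files in this record are several gigabytes in size, so streaming the large-scale match files row by row is safer than loading them whole. The sketch here assumes only that the files are UTF-8, tab-separated text; their column layout is not described in this record, so rows are returned as plain lists of strings. For the `.zip` archive it also assumes that the first non-directory member is the TSV. `iter_tsv_rows` is a hypothetical helper name.

```python
import gzip
import io
import os
import tempfile
import zipfile
from typing import Iterator, List, TextIO


def _rows(fh: TextIO) -> Iterator[List[str]]:
    """Split each non-blank line of a text stream on tabs."""
    for line in fh:
        line = line.rstrip("\r\n")
        if line:
            yield line.split("\t")


def iter_tsv_rows(path: str) -> Iterator[List[str]]:
    """Lazily yield rows from a .tsv.gz, a .zip holding a TSV, or a plain TSV."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            yield from _rows(fh)
    elif path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            # Assumption: the first non-directory member is the TSV file.
            member = next(n for n in zf.namelist() if not n.endswith("/"))
            with zf.open(member) as raw:
                yield from _rows(io.TextIOWrapper(raw, encoding="utf-8"))
    else:
        with open(path, encoding="utf-8") as fh:
            yield from _rows(fh)


# Usage sketch on tiny throwaway files (the real files are far larger).
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "demo.tsv.gz")
    with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
        fh.write("a\tb\nc\td\n")
    zip_path = os.path.join(tmp, "demo.tsv.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("demo.tsv", "x\ty\n")
    gz_rows = list(iter_tsv_rows(gz_path))
    zip_rows = list(iter_tsv_rows(zip_path))
```

Counting the matches in one of the real files is then `sum(1 for _ in iter_tsv_rows(path))`, which keeps only one line in memory at a time.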
Files
Annotation guidelines for S1000 corpus.pdf
File checksums and sizes (19.3 GB in total)

MD5 checksum | Size |
---|---|
6992a62d91aebeca9100ee6bd46be3da | 158.0 kB |
8c1135fe8cb216f41f54061c43abc171 | 16.1 GB |
961aa1d006aefa8681c38c192983ccf9 | 585.3 kB |
06088fa7a73fc84f9df9662f6394714e | 272.0 kB |
7e812ffe2967f04526556e194af68618 | 275.1 kB |
b8cff89db39ea908ed7051d17a931e96 | 1.3 GB |
4e14d7396d7b68f531ea797112d620da | 208.2 MB |
a495adf984d2f9ce193113c0719bc87d | 1.7 GB |
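Downloads can be checked against the MD5 checksums listed above. A minimal sketch using Python's standard-library `hashlib` follows; it reads files in 1 MiB chunks so that multi-gigabyte archives are never held in memory. The helper names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(fh: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the MD5 hex digest of the file at `path`."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh)


def matches_checksum(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 equals the expected (case-insensitive) checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


example = md5_of_stream(io.BytesIO(b"abc"))  # MD5 of the three bytes "abc"
```

For a downloaded archive, call `matches_checksum(path, expected)` with the checksum listed for that file on the record page; a `False` result means the download is incomplete or corrupted.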
Additional details
Related works
- Is required by: Preprint 10.1101/2023.02.20.528934 (DOI)