Published May 23, 2024 | Version v2
Publication Open

CoNECo: A corpus for named entity recognition and normalization of protein complexes

  • 1. University of Copenhagen (Københavns Universitet)
  • 2. University of Turku

Description

Data associated with the CoNECo corpus

The dictionary files in tagger_dictionary_complex.tar.gz can be used with the tagger software available here: https://github.com/larsjuhljensen/tagger

The online version of the annotation documentation can be found here: https://katnastou.github.io/annodoc-CoNECo/ and a copy is here: Annodoc_CoNECo.pdf

The CoNECo corpus, split into training, development, and test sets, is available in BRAT format in CoNECo_corpus.tar.gz and in CoNLL format in CoNECo_corpus_conll.tar.gz
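For readers who want to load the BRAT version programmatically, the following is a minimal sketch of a parser for BRAT standoff `.ann` content. It assumes the archive contains standard BRAT text-bound annotations (`T` lines) and normalization annotations (`N` lines); the internal layout of the archive is not described here, and the entity label `Complex` and the identifier `DB:0000000` in the sample are hypothetical placeholders, not values taken from the corpus.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ann_id: str   # e.g. "T1"
    label: str    # entity type as written in the .ann file
    start: int    # character offset of the first fragment's start
    end: int      # character offset of the last fragment's end
    text: str     # surface string copied from the document


def parse_brat_entities(ann_text: str) -> list[Entity]:
    """Collect text-bound annotations (lines whose ID starts with 'T')."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines rather than guessing
        ann_id, type_and_offsets, text = parts[0], parts[1], parts[2]
        label, _, offsets = type_and_offsets.partition(" ")
        # Discontinuous spans are written as "s1 e1;s2 e2"; keep the outer bounds.
        fragments = offsets.split(";")
        start = int(fragments[0].split()[0])
        end = int(fragments[-1].split()[1])
        entities.append(Entity(ann_id, label, start, end, text))
    return entities


def parse_brat_normalizations(ann_text: str) -> dict[str, str]:
    """Map annotation IDs to external identifiers (lines whose ID starts with 'N')."""
    normalizations = {}
    for line in ann_text.splitlines():
        if not line.startswith("N"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        fields = parts[1].split()
        if len(fields) != 3:
            continue
        _, target_id, reference = fields  # e.g. "Reference T1 DB:0000000"
        normalizations[target_id] = reference
    return normalizations


# Hypothetical sample for the sentence "The cohesin complex is conserved."
SAMPLE_ANN = (
    "T1\tComplex 4 19\tcohesin complex\n"
    "N1\tReference T1 DB:0000000\tcohesin complex\n"
)

if __name__ == "__main__":
    print(parse_brat_entities(SAMPLE_ANN))
    print(parse_brat_normalizations(SAMPLE_ANN))
```

The two functions are kept separate so that entity spans can be evaluated on their own and joined with normalization identifiers only when needed.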

The tagging results of the Jensenlab tagger for the CoNECo test set are here: CoNECo_test_Jensenlab_tagger.tar.gz

The tagging results of the Transformer-based tagger for the CoNECo test set are here: CoNECo_test_Transformer_tagger.tar.gz

The results of the large-scale run of the Jensenlab tagger on all PubMed (as of January 2024) and PMC Open Access (as of November 2023) articles are provided here: Jensenlab_tagger_large_scale_matches.tar.gz

The model used for the large-scale run of the Transformer-based method is here: CoNECo_Transformer_based_tagger_large_scale_model.tar.gz and the results of the large-scale tagging of the literature are here: Transformer_based_tagger_large_scale_matches.tsv.gz
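Because the large-scale match files are sizeable, it can be convenient to inspect a gzipped TSV without decompressing it to disk first. The sketch below streams the first few rows; the column layout of Transformer_based_tagger_large_scale_matches.tsv.gz is not documented here, so the function (a hypothetical helper named `peek_tsv_gz`) returns raw fields and makes no assumption about what each column means.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path: str, n_rows: int = 5) -> list[list[str]]:
    """Return the first n_rows rows of a gzip-compressed TSV file as lists of fields."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary text, which is the
        # safer default for tab-separated text-mining output.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(islice(reader, n_rows))


if __name__ == "__main__":
    # Assumes the file has been downloaded into the working directory.
    for row in peek_tsv_gz("Transformer_based_tagger_large_scale_matches.tsv.gz"):
        print(row)
```

Reading through `islice` stops after the requested rows, so only the beginning of the archive is decompressed.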

The counts and names of the matches common to the two large-scale runs can be found here, while the counts of the matches unique to the Jensenlab tagger and unique to the Transformer-based tagger can be found here and here, respectively.

Files (2.5 GB)

Checksum                              Size
md5:9efbaf1f88212d51105952ce12fbdc3c  68.3 kB
md5:788916c5cf6f0b6e8076e6d034cc7b3d  941.3 kB
md5:d6dac15d7ca6836d8dc34b91d264401d  812.3 kB
md5:48f484ce7f0d0c26c0cba300aa924d43  173.0 kB
md5:8493ea8822256acddc50bbabedd27e6e  167.5 kB
md5:dfacf1d4b7a3347db11dc6bca3f6d518  1.3 GB
md5:0acc1a23bca2346dbbf0f2ed4af02a51  263.1 kB
md5:4af9932159c4f2d26a422df3a0558673  978.7 kB
md5:91053cbb150ada0cfae83e1d2a91ef7b  2.4 MB
md5:f7f212c2cc52b831a200d7c292fd8e96  1.1 GB
md5:ba1319ad4457dad72fb0d45a5513d577  2.3 MB
md5:296496b1056afba3656111c07f19074d  160.9 MB
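
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_checksum` are illustrative, and which checksum belongs to which file has to be taken from the record itself, since the listing above does not pair checksums with file names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for the multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("CoNECo_corpus.tar.gz", "md5:<value from the record>")` returns `True` only if the local file is byte-identical to the published one. `str.removeprefix` requires Python 3.9 or later.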

Additional details

Funding

European Commission
DeepTextNet – Deep learning-based text mining for interpretation of omics data 101023676