Published June 5, 2019 | Version v5
Dataset | Open access

Industry-scale Application and Evaluation of Deep Learning for Drug Target Prediction

  • 1. Clinical Pharmacology and Safety Science, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden
  • 2. LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Altenbergerstr 69, 4040 Linz, Austria
  • 3. Computational Biology, Discovery Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
  • 4. Computational Biology, Discovery Sciences, Janssen Pharmaceutica NV, 1400 McKean Rd, 19002, Spring House, Pennsylvania, USA
  • 5. Computational Biology, Discovery Sciences, Janssen Cilag SA, Calle Río Jarama, 71A, 45007, Toledo, Spain
  • 6. Ideaconsult Ltd., 4 Angel Kanchev Str., 1000 Sofia, Bulgaria
  • 7. Intel Corporation, Data Center Group, Veldkant 31, 2550 Kontich, Belgium
  • 8. IT4Innovations, VSB – Technical University of Ostrava, 17. Listopadu 15/2172, 70800, Ostrava-Poruba, Czech Republic
  • 9. Hit Discovery, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca, Pepparedsleden 1, 43183, Mölndal, Sweden

Description

Artificial intelligence (AI) is undergoing a revolution thanks to breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent work on publicly available pharmaceutical data has shown that AI methods are highly promising for drug target prediction. However, the quality of public data may differ from that of industry data because of differences in the reporting labs and measurement techniques, fewer samples, and less diverse and less specialized assays. As part of ExCAPE, a European Commission-funded project that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models trained on public data transfer to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed retain much of their predictive power when applied to industry data. Moreover, we observed that models derived by deep learning outperformed comparable models trained with other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study to evaluate the potential of machine learning, and especially deep learning, directly in industry-scale settings, and the first to investigate the transferability of target prediction models learned on public data to industrial bioactivity prediction pipelines.

Notes

Dataset format for reproducing the results reported in the manuscript.

Files (1.2 GB)

MD5 checksum                        Size
8dc6e1b81c1ef73c577d7434d9cc0132    190.9 MB
19410b01e9c5da2fbb77cd6449f2f5d5    189.9 MB
4e428221f720322083c24048b6fa4c34    11.9 MB
e26b290fbbd31d4ab27485ced96855e5    405.7 MB
b1f33641eda1a79aed6824c1fed0f6d3    166.5 MB
0848288b6f097c7ffd87dc54cfb2bad4    92.7 MB
a82bfca83dbf239b77bda17a9c638e7f    81.0 MB
69264fecb0318e3643f0330c93cbf5d5    78.1 MB
4d9f73df78012ac6a61d7219d35d5a4b    306.3 kB
4b309ea5ba664a1c725cdbe8c91b10a9    361.1 kB
da5357d64be5e4f78d407c8b5ad417f3    902 bytes
1ee902d003e3856282172971043407f3    16.5 MB
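Each file in the list above is accompanied by an MD5 checksum, which can be used to confirm that a download arrived intact. The following is a minimal Python sketch using only the standard library `hashlib` module. The names `LISTED_MD5`, `md5sum`, and `is_listed` are hypothetical helpers introduced for illustration. Because the sketch does not pair checksums with specific file names, it only checks whether a file's digest matches one of the listed values.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the Files list of this record.
# LISTED_MD5 is a hypothetical name chosen for this sketch.
LISTED_MD5 = frozenset({
    "8dc6e1b81c1ef73c577d7434d9cc0132",
    "19410b01e9c5da2fbb77cd6449f2f5d5",
    "4e428221f720322083c24048b6fa4c34",
    "e26b290fbbd31d4ab27485ced96855e5",
    "b1f33641eda1a79aed6824c1fed0f6d3",
    "0848288b6f097c7ffd87dc54cfb2bad4",
    "a82bfca83dbf239b77bda17a9c638e7f",
    "69264fecb0318e3643f0330c93cbf5d5",
    "4d9f73df78012ac6a61d7219d35d5a4b",
    "4b309ea5ba664a1c725cdbe8c91b10a9",
    "da5357d64be5e4f78d407c8b5ad417f3",
    "1ee902d003e3856282172971043407f3",
})


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks of `chunk_size` bytes so that large
    downloads (several hundred MB here) need not fit in memory at once.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches any checksum listed for this record."""
    return md5sum(path) in LISTED_MD5
```

For example, `is_listed("downloaded_file")` (a hypothetical path) returns `True` only for an unmodified copy of one of the listed files. Reading in fixed-size chunks keeps memory use constant even for the largest (405.7 MB) file. On Linux, the GNU coreutils `md5sum` command prints the same hexadecimal digest if a quick command-line check is preferred.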

Additional details

Related works

Is supplement to
Journal article: 10.1186/s13321-020-00428-5 (DOI)

Funding

ExCAPE – Exascale Compound Activity Prediction Engine (grant agreement no. 671555)
European Commission