Published September 18, 2019 | Version v1
Presentation Open

Data Curation: The Forgotten Practice in the Era of AI

  • 1. Simulations Plus, Inc.


Pankaj R. Daga of Simulations Plus visited the Mobley group at UC Irvine on September 13, 2019 and gave a talk in the OFF seminar series on the hazards that can arise when automating the mining of chemical and chemistry-related databases.

Abstract: The availability of large databases of chemical structures with associated experimental data provides a great opportunity to build robust, predictive QSAR/QSPR models for applications in various fields. The most common concern when using these databases is the quality of the chemical structures and the associated biological data. Working with correct chemical structures is essential, since an incorrect structure leads to errors in the calculated molecular descriptors, and incorrect biological data ultimately lead to meaningless results. This seminar discusses experiences in curating these bioactivity databases, with a focus on ADMET properties in drug discovery. It covers the various sources of these errors and the measures available to find and correct them.



Files (105.5 MB)

Two files: 98.7 MB and 6.8 MB.
