Published May 9, 2024 | Version v1
Presentation Open

So FAIR, So Clean: How the cleanventory Approach Provides Reliable Data for Chemical Structures Regulated in Global Trade Markets

  • 1. Norwegian Geotechnical Institute
  • 2. University of Luxembourg
  • 3. Swiss Federal Laboratories for Materials Science and Technology
  • 4. DVGW - Technologiezentrum Wasser

Description

With the number of regulated chemicals ever rising, there is a need for a harmonized information system on regulated chemicals, especially under the newly established concept of "one substance – one assessment". As part of the H2020 project ZeroPM (https://zeropm.eu), a fully reproducible and open-source global chemical inventory, the "cleanventory", is being developed. A modern database infrastructure that strictly follows the FAIR principles (Findable, Accessible, Interoperable, Reusable) will facilitate widespread use of the database. The database will be publicly available and include features for programmatic access. The public, legislators, and industry stakeholders will also be able to review all the code and build their own "cleanventory" from scratch.

So far, over 990,000 inventory entries with over 225,000 unique CAS Registry Numbers and over 410,000 unique chemical names have been integrated. This amount of information could be considered "big data", but we also put considerable effort into data quality, i.e., into ensuring "good data". Chemical structures are identified from the CAS Registry Numbers and chemical names of the inventory entries. To convert these identifiers to InChI strings (i.e., structural information), four freely available API services are used: PubChem (compound and substance domains), CAS Common Chemistry, CCTE CompTox, and the NCI/CADD Chemical Identifier Resolver. To identify the "most probable" chemical structure for every inventory entry (i.e., each combination of CAS Registry Number and chemical name), a weighted consensus ranking approach was developed that assigns each InChI string an identification score.
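
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a weighted consensus ranking of this kind could work. The source weights, the function name `score_candidates`, and the normalisation to a share of the total vote weight are illustrative assumptions, not the cleanventory implementation.

```python
from collections import defaultdict

# Hypothetical per-source weights (assumption): the actual cleanventory
# weights are not stated in the abstract.
SOURCE_WEIGHTS = {
    "pubchem_compound": 1.0,
    "pubchem_substance": 0.5,
    "cas_common_chemistry": 1.0,
    "comptox": 1.0,
    "nci_cadd_resolver": 0.75,
}


def score_candidates(hits):
    """Rank InChI candidates for a single inventory entry.

    hits: iterable of (source, inchi) pairs returned by the API services.
    Returns a list of (inchi, score) pairs sorted by descending score,
    where each score is that candidate's share of the total vote weight.
    """
    totals = defaultdict(float)
    seen = set()
    for source, inchi in hits:
        if (source, inchi) in seen:
            continue  # a source votes at most once per structure
        seen.add((source, inchi))
        totals[inchi] += SOURCE_WEIGHTS.get(source, 0.0)

    total_weight = sum(totals.values())
    if total_weight == 0:
        return []

    ranked = [(inchi, weight / total_weight) for inchi, weight in totals.items()]
    # Sort by score (highest first); break ties alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Made-up example hits for one entry: two sources agree on benzene,
# one source returns cyclohexane.
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
CYCLOHEXANE = "InChI=1S/C6H12/c1-2-4-6-5-3-1/h1-6H2"
example_hits = [
    ("pubchem_compound", BENZENE),
    ("cas_common_chemistry", BENZENE),
    ("nci_cadd_resolver", CYCLOHEXANE),
]
ranking = score_candidates(example_hits)
```

In this sketch, a structure returned by several highly weighted services outranks one returned by a single service, which is the basic idea behind a consensus score.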

Over 344,000 unique InChI strings were retrieved from the API services. After applying the weighted consensus ranking, over 126,000 unique InChI strings were identified as the "most probable" chemical structures for the given inventory entries. This high-quality database of chemical structures on global trade markets will support the EU Chemicals Strategy for Sustainability by providing the robust, curated, and transparent data and workflows needed to put "one substance – one assessment" into practice.
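
The reduction from all retrieved InChI strings to the smaller set of "most probable" structures can be pictured as keeping only the top-scoring candidate per inventory entry and then counting distinct structures. The sketch below assumes the scores already exist. The function names, the data layout, and the example entries are hypothetical.

```python
def most_probable_structures(scored_entries):
    """Keep the top-scoring InChI for each inventory entry.

    scored_entries: dict mapping (cas_rn, name) to a list of (inchi, score).
    Entries without any candidate are skipped. On a tie, the candidate
    listed first wins (an arbitrary choice made for this sketch).
    """
    best = {}
    for entry, candidates in scored_entries.items():
        if candidates:
            best[entry] = max(candidates, key=lambda c: c[1])[0]
    return best


def count_unique_structures(best):
    """Number of distinct InChI strings among the selected structures."""
    return len(set(best.values()))


# Made-up scores for three inventory entries, two of which share a structure.
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
CYCLOHEXANE = "InChI=1S/C6H12/c1-2-4-6-5-3-1/h1-6H2"
example_scores = {
    ("71-43-2", "benzene"): [(BENZENE, 0.9), (CYCLOHEXANE, 0.1)],
    ("71-43-2", "benzol"): [(CYCLOHEXANE, 0.2), (BENZENE, 0.8)],
    ("110-82-7", "cyclohexane"): [(CYCLOHEXANE, 1.0)],
    ("0-00-0", "unresolved entry"): [],
}
best = most_probable_structures(example_scores)
n_unique = count_unique_structures(best)
```

Because several inventory entries can resolve to the same structure, and low-scoring candidates are discarded, the number of selected structures is smaller than the number of retrieved ones.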

Files

wolf_setac24_v5.pdf (544.6 kB)
md5:f9d53b706b957613fe1d94f6f38a2179

Additional details

Funding

ZeroPM: Zero pollution of Persistent, Mobile substances (grant 101036756)
European Commission