Published August 6, 2025 | Version v1
Dataset Open

IUPAC names text-mined from patents by SureChEMBL

  • 1. European Molecular Biology Laboratory - European Bioinformatics Institute
  • 2. European Bioinformatics Institute

Description

What: This file contains IUPAC names text-mined from patents (US, WIPO, EPO, Chinese, Japanese). 

Who: This file is provided by the SureChEMBL project (https://www.surechembl.org/) under a CC0 license. We are part of the Chemical Biology Services team (https://www.ebi.ac.uk/about/teams/chemical-biology-services/) at EMBL-EBI. Please cite us appropriately if you use this dataset (thanks!).

Format: 

This is a gzipped TSV file with two columns, IUPAC Names and SMILES. The IUPAC Names column may itself contain multiple IUPAC names separated by an exclamation mark (!). Each of these names resolves to the same SMILES and they differ only in casing. They are sorted such that the name with fewer uppercase characters comes first.
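
As a rough illustration of the layout described above, the following Python sketch reads the file and splits the IUPAC Names column on the exclamation mark. It rests on several assumptions that are not stated here: the file is UTF-8 encoded, every data line has exactly one tab, and any header row (if present) is acceptable to receive as an ordinary record. The function names and the file name used in the note afterwards are hypothetical.

```python
import gzip
from typing import Iterator, List, Tuple


def parse_line(line: str) -> Tuple[List[str], str]:
    """Split one TSV line into (list of name variants, SMILES).

    The first column may hold several casing variants of the same
    name joined by '!'; they all resolve to the SMILES in column two.
    """
    names_field, smiles = line.rstrip("\r\n").split("\t")
    return names_field.split("!"), smiles


def read_names(path: str) -> Iterator[Tuple[List[str], str]]:
    """Stream (name variants, SMILES) records from the gzipped TSV.

    Assumption: UTF-8 text. Lines that do not contain exactly one
    tab are skipped rather than raising an error.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.count("\t") != 1:
                continue
            yield parse_line(line)
```

For example, iterating over `read_names("iupac_names.tsv.gz")` (a placeholder file name) yields one record per line. Given the sort order described above, the first element of each name list should be the variant with the fewest uppercase characters.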

Details:

As part of the SureChEMBL text-mining pipeline, we recognise and extract IUPAC names in patents. These are stored in a database at SureChEMBL HQ and converted to chemical structures, which are made available in our downloads. Here we are making available those text-mined IUPAC names or, to be exact, the names after minor corrections (such as removing spaces or fixing parentheses) that enable each name to be interpreted.

How:

We use LeadMine (https://nextmovesoftware.com/leadmine) from NextMove Software to text-mine systematic IUPAC names. This incorporates OPSIN (https://www.ebi.ac.uk/opsin/) by Daniel Lowe to resolve IUPAC names to SMILES.

Why:

These names are being made available to support Egon Willighagen's 'One Million IUPAC names' project (https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html).

Files

Files (132.0 MB)

md5:994a526cd60a32f0d76a01fd32c4dae2 (132.0 MB)
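
After downloading, the file can be checked against the MD5 checksum listed above. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the path passed in is whatever name the file was saved under.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "994a526cd60a32f0d76a01fd32c4dae2"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """True if the file's MD5 equals the checksum published on this page."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use small for a file of this size. A `False` result suggests an incomplete or corrupted download.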

Additional details

Funding

Wellcome Trust