OntoChem PFAS CORE and Patent Files for MetFrag
Creators
- 1. OntoChem
- 2. Collabra
- 3. Google
- 4. LCSB, Uni Luxembourg
Description
These are MetFrag-compatible versions of the PFAS lists that OntoChem produced by literature mining of the CORE database (27K entries) and Google Patent collections (1.7M entries). The MetFrag versions have been filtered to remove entries that prevent MetFrag from running (i.e. multiple-entry formulas and certain elements). Each file contains a tag indicating whether each PFAS fits each of three definitions, A, B and C. The CORE file also contains the number of references in which each PFAS entry was found. Each file is available as CSV or in compressed form.
- PFAS Definition A: Each compound that contains a CF2 group
- PFAS Definition B: Each compound that contains a (AH)(AH)(F)C-C(AH)F2 group, where AH groups could be hydrogen or any other atom and the bond between both aliphatic carbon atoms is a single bond
- PFAS Definition C: Each compound that contains a (R1)(R2)(F)C-C(R3)F2 group, where the R groups are any atom except hydrogen and the bond between both aliphatic carbon atoms is a single bond
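To make the difference between the three definitions concrete, here is a minimal Python sketch that applies one reading of them to a toy molecular graph. It is not the method used by OntoChem: the function names, the graph representation (a list of element symbols with explicit hydrogens, plus a list of bonds) and the interpretation of "aliphatic carbon" as a carbon with four single-bonded neighbors are all assumptions made for illustration. Definition A is read here as "any carbon carrying at least two fluorine atoms".

```python
from collections import defaultdict


def neighbor_map(bonds):
    """Map each atom index to a list of (neighbor index, bond order)."""
    nbrs = defaultdict(list)
    for i, j, order in bonds:
        nbrs[i].append((j, order))
        nbrs[j].append((i, order))
    return nbrs


def count_neighbors(atoms, nbrs, idx, element):
    """Number of neighbors of atom idx that are the given element."""
    return sum(1 for j, _ in nbrs[idx] if atoms[j] == element)


def is_saturated_carbon(atoms, nbrs, idx):
    """Carbon with exactly four single-bonded neighbors (hydrogens explicit)."""
    return (atoms[idx] == "C" and len(nbrs[idx]) == 4
            and all(order == 1 for _, order in nbrs[idx]))


def classify_pfas(atoms, bonds):
    """Return {"A": bool, "B": bool, "C": bool} under the assumed reading.

    atoms: list of element symbols, hydrogens listed explicitly.
    bonds: list of (atom index, atom index, bond order) tuples.
    """
    nbrs = neighbor_map(bonds)

    # Definition A (assumed reading): a carbon bearing at least two F atoms.
    fits_a = any(
        atoms[i] == "C" and count_neighbors(atoms, nbrs, i, "F") >= 2
        for i in range(len(atoms))
    )

    fits_b = fits_c = False
    for i, j, order in bonds:
        if order != 1:
            continue
        # Try both orientations: x is the C(F) carbon, y the CF2 carbon.
        for x, y in ((i, j), (j, i)):
            if not (is_saturated_carbon(atoms, nbrs, x)
                    and is_saturated_carbon(atoms, nbrs, y)):
                continue
            if (count_neighbors(atoms, nbrs, x, "F") >= 1
                    and count_neighbors(atoms, nbrs, y, "F") >= 2):
                # Definition B: remaining substituents may be anything.
                fits_b = True
                # Definition C: no substituent on either carbon may be H.
                if (count_neighbors(atoms, nbrs, x, "H") == 0
                        and count_neighbors(atoms, nbrs, y, "H") == 0):
                    fits_c = True
    return {"A": fits_a, "B": fits_b, "C": fits_c}


def substituted_ethane(left, right):
    """Build a two-carbon test molecule from two lists of three substituents."""
    atoms = ["C", "C"] + list(left) + list(right)
    bonds = [(0, 1, 1)]
    bonds += [(0, 2 + k, 1) for k in range(3)]
    bonds += [(1, 5 + k, 1) for k in range(3)]
    return atoms, bonds


# 1,1-difluoroethane fits A only; hexafluoroethane fits A, B and C.
difluoroethane = classify_pfas(*substituted_ethane("HHH", "HFF"))
hexafluoroethane = classify_pfas(*substituted_ethane("FFF", "FFF"))
```

Under this reading, every compound that fits C also fits B, and every compound that fits B also fits A. For real structures a cheminformatics toolkit with substructure matching would be the more practical choice; the toy graph is used only to keep the sketch self-contained.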
Please note that these files are very large (especially the patent file) and should not be uploaded to MetFragWeb directly; they will be available there from the dropdown menu. The files are provided for command line users and for any other workflows that can make use of them. The patent file contains 1.7 million entries and can cause some delay on the command line compared with smaller database files.
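For workflows outside MetFrag, one way to cope with the file size is to stream the CSV row by row rather than loading it into memory at once. The following is a minimal sketch using only the Python standard library. The function name `iter_matching` is made up for this example, and the column name and tag value in the usage comment are hypothetical placeholders: check the header line of the downloaded file for the actual names.

```python
import csv


def iter_matching(lines, column, wanted):
    """Yield rows (as dicts) whose `column` equals `wanted`.

    `lines` is any iterable of text lines, for example an open file.
    Rows are produced one at a time, so memory use stays small even
    for files with millions of entries.
    """
    reader = csv.DictReader(lines)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header")
    for row in reader:
        if row[column] == wanted:
            yield row


# Usage sketch. "DefinitionA" and "1" are HYPOTHETICAL placeholders,
# not the real column name or tag value used in the files:
#
# with open("OntoChem_PFAS_CORE_20220420.csv", newline="") as fh:
#     for row in iter_matching(fh, "DefinitionA", "1"):
#         ...  # process one matching entry at a time
```

Because `iter_matching` is a generator, the header check and the filtering only run once iteration starts. The same function works on a compressed file if it is first opened in text mode with a suitable standard library module for the compression format in use.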
Full details are available in the preprint by Barnabas et al. (2022), DOI: 10.26434/chemrxiv-2022-nmnnd-v2
Update 20/04/2022: uploaded files with updated CID mappings post-PubChem deposition.
Notes
Files
OntoChem_PFAS_CORE_20220420.csv
Additional details
Related works
- Is supplement to
- https://ontochem.com/ (URL)
- https://msbi.ipb-halle.de/MetFrag/ (URL)
- Preprint: 10.26434/chemrxiv-2022-nmnnd-v2 (DOI)