cdk/cdk: CDK 2.10
Authors/Creators
- John Mayfield (1)
- Egon Willighagen (2)
- Rajarshi Guha
- gilleain torrance
- Uli (3)
- Kazuya Ujihara
- Syed Asad Rahman (4)
- Jonathan Alvarsson
- Mark J. Williamson (5)
- Jonas Schaub (6)
- Saulius Gražulis
- Danny Katzel (7)
- Tomáš Pluskal (8)
- Xavier Linn
- Yap Chun Wei (9)
- Daniel Szisz
- Nikolay Kochev (10)
- Nina Jeliazkova (11)
- Eric Bach (12)
- Arvid Berg
- Alex Clark (13)
- fbaensch-beilstein (14)
- Ralf Stephan
- Jeffrey Plante (15)
- Klas Jönsson
- Krishna Dole
- Oliver Stueker (16)
- Valentyn Kolesnikov (17)
- Nicolas Alfonso De Pineda Gutierrez
- kaibioinfo (18)
- 1. NextMove Software ltd
- 2. @BiGCAT-UM
- 3. @pendingai
- 4. EMBL-EBI
- 5. @vernalis
- 6. Friedrich Schiller University Jena
- 7. @ncats
- 8. Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences
- 9. National University of Singapore
- 10. University of Plovdiv
- 11. Ideaconsult Ltd. @ideaconsult
- 12. Elisa Oyj
- 13. Molecular Materials Informatics, Inc.
- 14. @Beilstein-Institut
- 15. Lhasa Limited
- 16. ACENET / Memorial University / Digital Research Alliance of Canada
- 17. IT
- 18. Friedrich-Schiller University, Jena
Description
New Features/Key Changes
AtomContainer new implementation (IMPORTANT)
The new AtomContainer implementation is now the default after a gradual introduction. You can still use the old implementation, but you must explicitly create an AtomContainerLegacy. This should be a seamless change for most users, but please report any unexpected errors.
SMIRKS
SMIRKS support with the ability to approximate other implementations (including Daylight and RDKit Reaction SMARTS). It includes convenience APIs for applying a transform to all places at once (i.e. dt_xapply) and efficient support for hydrogen handling (explicit hydrogens are not required on the input). Overall the speed is good: a transform can be run over all of ChEMBL 35 in only ~30 seconds (see Appendix A1).
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(bldr);
SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Default);
String smminp = "c1cc(N(=O)=O)ccc1N(=O)=O";
IAtomContainer mol = smipar.parseSmiles(smminp);
Smirks.compile("[N:1](=[OD1+0])=[OD1+0]>>[N+:1](=O)[O-] polar-nitro")
.apply(mol); // exclusive apply mode
String smiout = smigen.create(mol); // C1=CC([N+](=O)[O-])=CC=C1[N+](=O)[O-]
More information can be found in the JavaDoc and functionality will be added to the CDK Depict Web Application.
Reaction InChI (RInChI) generation
A pure Java implementation of Reaction InChI has been added allowing generation of RInChI strings and keys:
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(bldr);
IReaction reaction = smipar.parseReactionSmiles("CCO.[CH3:1][C:2](=[O:3])[OH:4]>[H+]>CC[O:4][C:2](=[O:3])[CH3:1].O Ethyl esterification [1.7.3]\n");
RInChIGenerator rinchigen = new RInChIGenerator();
rinchigen.generate(reaction);
System.err.println(rinchigen.getRInChI()); // RInChI=1.00.1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)!C2H6O/c1-2-3/h3H,2H2,1H3<>C4H8O2/c1-3-6-4(2)5/h3H2,1-2H3!H2O/h1H2<>p+1/d+
System.err.println(rinchigen.getShortRInChIKey()); // Short-RInChIKey=SA-FUHFF-JJFIATRHOH-UDXZTNISGZ-GPRLSGONYQ-NUHFF-NUHFF-NUHFF-ZZZ
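To make the layout of the printed RInChI easier to read, here is a minimal sketch in plain Java (no CDK calls) that splits the string on the `<>` group delimiter and the `!` component delimiter. The class and method names are hypothetical, and only a trailing direction flag of the form shown above (`/d+`) is handled; it illustrates the string layout and is not part of the RInChIGenerator API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class RInChISplitSketch {

    // Returns the groups of an RInChI, each as a list of component layers.
    static List<List<String>> groups(String rinchi) {
        // drop the "RInChI=<version>/" prefix (everything up to the first '/')
        String body = rinchi.substring(rinchi.indexOf('/') + 1);
        // drop a trailing three-character direction flag such as "/d+"
        int dir = body.lastIndexOf("/d");
        if (dir >= 0 && dir == body.length() - 3)
            body = body.substring(0, dir);
        List<List<String>> result = new ArrayList<>();
        for (String group : body.split(Pattern.quote("<>")))
            result.add(Arrays.asList(group.split(Pattern.quote("!"))));
        return result;
    }

    public static void main(String[] args) {
        String rinchi = "RInChI=1.00.1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)!C2H6O/c1-2-3/h3H,2H2,1H3"
                      + "<>C4H8O2/c1-3-6-4(2)5/h3H2,1-2H3!H2O/h1H2<>p+1/d+";
        for (List<String> group : groups(rinchi))
            System.out.println(group.size() + " component(s): " + group);
    }
}
```

For the esterification example this gives three groups: two components (acetic acid and ethanol), two components (ethyl acetate and water), and a single component for the proton (`p+1`).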
RDfile reading support.
RDfiles belong to the CTfile family of formats and allow records with associated data. The format is commonly used for reaction data from ELNs (electronic lab notebooks) and other sources.
RdfileReader rdReader = new RdfileReader(new FileReader("/tmp/pistachio-rxns-2501091627.rd"),
SilentChemObjectBuilder.getInstance(),
true);
while (rdReader.hasNext()) {
RdfileRecord record = rdReader.next();
if (record.isRxnFile()) {
IReaction reaction = record.getReaction();
} else {
IAtomContainer container = record.getAtomContainer();
}
}
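For readers unfamiliar with the format, the following plain-Java sketch (no CDK calls) shows the record structure that the reader above hides: each record starts with a `$RFMT` (reaction) or `$MFMT` (molecule) line and may carry `$DTYPE`/`$DATUM` data pairs. The class name, the sample field names, and the sample values are all hypothetical, and multi-line data values are not handled; it is a rough illustration, not how RdfileReader is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scanner, for illustration only.
final class RdfileScanSketch {

    static final class Record {
        final boolean reaction; // true for $RFMT, false for $MFMT
        final Map<String, String> data = new LinkedHashMap<>();
        Record(boolean reaction) { this.reaction = reaction; }
    }

    static List<Record> scan(List<String> lines) {
        List<Record> records = new ArrayList<>();
        Record current = null;
        String dtype = null;
        for (String line : lines) {
            if (line.startsWith("$RFMT") || line.startsWith("$MFMT")) {
                // a new record begins
                current = new Record(line.startsWith("$RFMT"));
                records.add(current);
                dtype = null;
            } else if (current != null && line.startsWith("$DTYPE ")) {
                dtype = line.substring("$DTYPE ".length()).trim();
            } else if (current != null && dtype != null && line.startsWith("$DATUM ")) {
                // single-line values only in this sketch
                current.data.put(dtype, line.substring("$DATUM ".length()).trim());
                dtype = null;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "$RDFILE 1",
            "$RFMT",
            "$RXN",
            "(reaction block elided)",
            "$DTYPE yield",      // hypothetical field
            "$DATUM 85",
            "$MFMT",
            "(molfile elided)",
            "$DTYPE name",       // hypothetical field
            "$DATUM ethanol");
        for (Record r : scan(lines))
            System.out.println((r.reaction ? "reaction" : "molecule") + " " + r.data);
    }
}
```

The `reaction` flag plays the same role as `record.isRxnFile()` in the CDK snippet above.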
Faster ring and aromaticity perception
Faster ring membership and aromaticity assignment. The move to AtomContainer2 (see above) allows additional optimizations to these algorithms. The existing APIs will run faster; however, for aromaticity you must use Cycles.all() on its own. There is also a new static method for convenience and improved aromatic model encoding.
// new way, no checked exception
Cycles.markRingAtomsAndBonds(mol); // prerequisite
if (!Aromaticity.apply(Aromaticity.Model.Daylight, mol)) {
// returned false = too many cycles to check
}
// old way (will still be faster)
Aromaticity aromaticity = new Aromaticity(ElectronDonation.daylight(),
Cycles.all());
IAtomContainer container = ...;
try {
if (aromaticity.apply(container)) {
//
}
} catch (CDKException e) {
// cycle computation was intractable
}
Improved inorganic stereochemistry
It is now possible to represent degenerate inorganic stereochemistry where one or more neighbours are missing/implicit. For example, we can describe a square pyramidal structure as an octahedral centre with one ligand missing. Support for implicit/explicit hydrogens around these atoms has also been improved.
[NH3][Co@OH25](Cl)(Cl)(Cl)(Cl) sqpyr
[NH3][Co@OH4](Cl)(Cl)[NH3] seesaw
You can also use this in SMARTS to match atoms that are across from each other or equatorial, using the following patterns:
Cl[Co@OH1]Cl across
Cl[Co@OH3]Cl equatorial
Functional Group Finder
A functional group finder has been added based on Peter Ertl's algorithm.
References: Peter Ertl (2017); Fritsch et al. (2019).
The API allows you to generate the functional groups as fragments or, my favorite, to fill an array with identifier numbers, which is then very easy to depict.
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(bldr);
String smiles = "C2C(NC)=NC3=C(C(C1=CC=CC=C1)=N2=O)C=C(Cl)C=C3";
IAtomContainer mol = smipar.parseSmiles(smiles);
FunctionalGroupsFinder fgFinder = FunctionalGroupsFinder.withNoEnvironment();
Cycles.markRingAtomsAndBonds(mol);
Aromaticity.apply(Aromaticity.Model.Daylight, mol);
// extract the groups as new fragments
List<IAtomContainer> functionalGroupsList = fgFinder.extract(mol);
// fill an array with numbers that indicate which functional group something belongs to
int[] fgrps = new int[mol.getAtomCount()];
fgFinder.find(fgrps, mol);
// Set the group as the atom map/class in SMILES
for (IAtom atom : mol.atoms())
atom.setMapIdx(1+fgrps[atom.getIndex()]);
System.out.println(new SmilesGenerator(SmiFlavor.AtomAtomMap).create(mol));
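Once the identifier array has been filled, it is straightforward to post-process without further CDK calls. The plain-Java sketch below groups atom indices by their identifier. It assumes that atoms outside any functional group are marked with a negative value (consistent with the `1+fgrps[...]` offset above, but an assumption nonetheless); the class name and the sample array are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
final class GroupIndexSketch {

    // Maps each group identifier to the indices of the atoms carrying it.
    static Map<Integer, List<Integer>> atomsByGroup(int[] fgrps) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int atomIdx = 0; atomIdx < fgrps.length; atomIdx++) {
            int groupId = fgrps[atomIdx];
            if (groupId < 0)
                continue; // assumed to mean "not part of a functional group"
            groups.computeIfAbsent(groupId, k -> new ArrayList<>()).add(atomIdx);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] fgrps = {-1, 0, 0, -1, 1, -1}; // hypothetical identifiers for six atoms
        System.out.println(atomsByGroup(fgrps)); // {0=[1, 2], 1=[4]}
    }
}
```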
Sugar Moiety Removal
The Sugar Removal Utility (SRU) implements a generalized algorithm for automated detection of circular and linear sugars in molecular structures and their removal.
Convenience APIs
- Iterate over molecules of a reaction and sets
- Creating atoms/bonds in the context of molecules with: mol.newAtom() and mol.newBond() and others.
- Better IO error handling
Contributors
55 Egon Willighagen
8 Felix Bänsch
3 Jean Marois
245 John Mayfield
43 Jonas Schaub
2 Matthias Mailänder
5 Tyler Peryea
123 Uli Fechner
3 Valentyn Kolesnikov
3 Stefan Kuhn
Overview of Pull Requests
- SonarCloud is not reporting test coverage correctly because it was no… by @johnmay in https://github.com/cdk/cdk/pull/1000
- Improved the abbreviation handling over atom sets, this is useful for… by @johnmay in https://github.com/cdk/cdk/pull/996
- Fix - avoid placing a wedge on the right-angled bond when a centre is… by @johnmay in https://github.com/cdk/cdk/pull/998
- Quality of life API interfaces. The IAtomContainerSet and IReaction c… by @johnmay in https://github.com/cdk/cdk/pull/997
- Sonar settings for aggregated test coverage. by @johnmay in https://github.com/cdk/cdk/pull/1001
- CMLXOM 4.6 by @egonw in https://github.com/cdk/cdk/pull/1004
- Redo @parit's changes for net/undirected reaction depiction on the ne… by @johnmay in https://github.com/cdk/cdk/pull/1009
- Smiles 0 isotope by @johnmay in https://github.com/cdk/cdk/pull/1007
- When atoms/bonds are aware of the container they are in - it is usefu… by @johnmay in https://github.com/cdk/cdk/pull/1010
- Fix the CDK C.plus atom type, there was already comment in the test t… by @johnmay in https://github.com/cdk/cdk/pull/1011
- Query bond funcs by @johnmay in https://github.com/cdk/cdk/pull/938
- Read the atom-atom mapping info from a V3000 file. by @johnmay in https://github.com/cdk/cdk/pull/1012
- Fixes https://github.com/cdk/depict/issues/76. We do not like -C=CO a… by @johnmay in https://github.com/cdk/cdk/pull/1015
- Added an API for fatal IO errors by @egonw in https://github.com/cdk/cdk/pull/1019
- Java21 by @johnmay in https://github.com/cdk/cdk/pull/1014
- Fixes #1024 - we should perhaps rework the CDK radical representation… by @johnmay in https://github.com/cdk/cdk/pull/1025
- Updated dependencies by @egonw in https://github.com/cdk/cdk/pull/1026
- Fix for non-deterministic CIP designation bug by @tylerperyea in https://github.com/cdk/cdk/pull/1027
- Fix and issue with contraction on terminal attachment points. by @johnmay in https://github.com/cdk/cdk/pull/1028
- Fix a minor issue with an abbreviation like -NnButBu. Currently this … by @johnmay in https://github.com/cdk/cdk/pull/1030
- Make sure Sgroups attached to reactions get passed through and emitte… by @johnmay in https://github.com/cdk/cdk/pull/1031
- First pass at aligned depictions API. by @johnmay in https://github.com/cdk/cdk/pull/1032
- Symmetry calculation may fail. by @johnmay in https://github.com/cdk/cdk/pull/1033
- Code cleanup by @egonw in https://github.com/cdk/cdk/pull/1018
- Fix a minor issue from sonarcloud, we check the counts elsewhere so t… by @johnmay in https://github.com/cdk/cdk/pull/1034
- Additional tokens reagent label formatting. by @johnmay in https://github.com/cdk/cdk/pull/1036
- Depict align tweaks by @johnmay in https://github.com/cdk/cdk/pull/1037
- Added a missing test class for Elements by @egonw in https://github.com/cdk/cdk/pull/1042
- Added isMetalloid utility method to Elements class by @JonasSchaub in https://github.com/cdk/cdk/pull/1041
- Refine OSGi import rules by @Mailaender in https://github.com/cdk/cdk/pull/1043
- AtomContainer2 Phase 2 by @johnmay in https://github.com/cdk/cdk/pull/1047
- New convenience methods on the Atom API. by @johnmay in https://github.com/cdk/cdk/pull/1046
- AtomContainer2 phase 3 by @johnmay in https://github.com/cdk/cdk/pull/1048
- Binconnected - faster ring atom/bond marking by @johnmay in https://github.com/cdk/cdk/pull/1051
- Add transform/SMIRKS support to CDK. by @johnmay in https://github.com/cdk/cdk/pull/916
- The number of essential/relevant cycles can be exponential for some m… by @johnmay in https://github.com/cdk/cdk/pull/1052
- Relavent cycles limit test by @johnmay in https://github.com/cdk/cdk/pull/1053
- Updated JNA-InChI (JNA compatibility) by @egonw in https://github.com/cdk/cdk/pull/1054
- Create CITATION.cff by @egonw in https://github.com/cdk/cdk/pull/1055
- Fix a corner case when depicting cc(C)c by @johnmay in https://github.com/cdk/cdk/pull/1059
- small doc fix for cdkAllowingExocyclic() by @JonasSchaub in https://github.com/cdk/cdk/pull/1060
- Increase the max fragment count when generating abbreviations. by @johnmay in https://github.com/cdk/cdk/pull/1066
- Ensure the ESSSR parameter is reported in the FP version info. by @johnmay in https://github.com/cdk/cdk/pull/1065
- Updated ${version} in pom.xml by @javadev in https://github.com/cdk/cdk/pull/1070
- CMLXOM 4.9 and log4j 2.23.1 by @egonw in https://github.com/cdk/cdk/pull/1072
- Link to the ChemPyFormatics 'book' by @egonw in https://github.com/cdk/cdk/pull/1071
- Maven build system updates by @egonw in https://github.com/cdk/cdk/pull/1074
- Only run JaCoCo once by @egonw in https://github.com/cdk/cdk/pull/1076
- Depiction issues by @johnmay in https://github.com/cdk/cdk/pull/1080
- Moved a number of test classes to the same module as the tested classes by @egonw in https://github.com/cdk/cdk/pull/1081
- Only copy mapped bonds when deciding how to align the structure. by @johnmay in https://github.com/cdk/cdk/pull/1082
- Fix a bug in the MDLV2000Reader where the wrong "molecule" is used. by @johnmay in https://github.com/cdk/cdk/pull/1085
- Removed a module that has been empty for a few years by @egonw in https://github.com/cdk/cdk/pull/1087
- The path based fingerprint should be identical with/without explicit … by @johnmay in https://github.com/cdk/cdk/pull/1089
- Stabilise the CDK atom type based aromaticity model. This causes a sm… by @johnmay in https://github.com/cdk/cdk/pull/1091
- Use interfaces instead of instances and use silent by @egonw in https://github.com/cdk/cdk/pull/1094
- Recovers the "simple" patches from testing2 by @egonw in https://github.com/cdk/cdk/pull/1093
- Improving the testing coverage by @egonw in https://github.com/cdk/cdk/pull/1095
- Overhaul and optimise the aromaticity procedures in CDK. by @johnmay in https://github.com/cdk/cdk/pull/1092
- Invalid stereochemistry group causes infinite loop by @marois in https://github.com/cdk/cdk/pull/1098
- add ability to read MDL RXN V3000 files with zero reactants but a REACTANT block by @uli-f in https://github.com/cdk/cdk/pull/1100
- Fix CCD/WebMolKit Sgroups that are missing the SBL. by @johnmay in https://github.com/cdk/cdk/pull/1099
- Fixes code examples by @egonw in https://github.com/cdk/cdk/pull/1104
- support bond type gt4 in MDLV3000Reader by @uli-f in https://github.com/cdk/cdk/pull/1102
- Make sure Atom/Bond's ged deref'd when going into a QueryAtomContainer. by @johnmay in https://github.com/cdk/cdk/pull/1105
- Integration of functional groups identification functionality following the Ertl algorithm by @JonasSchaub in https://github.com/cdk/cdk/pull/1039
- Generally cheminf formats use ASCII and we should not be checking the… by @johnmay in https://github.com/cdk/cdk/pull/1107
- Rdfile reader by @uli-f in https://github.com/cdk/cdk/pull/942
- add javadocs to RdfileReader and RdfileRecord, make RdfileReader final by @uli-f in https://github.com/cdk/cdk/pull/1109
- Checked, updated, and formatted documentation of FunctionalGroupsFinder by @JonasSchaub in https://github.com/cdk/cdk/pull/1110
- added test case in CDKAtomTypeMatcherFilesTest that gives rise to an NPE by @uli-f in https://github.com/cdk/cdk/pull/919
- support bond type gt4 in MDLV3000Writer by @uli-f in https://github.com/cdk/cdk/pull/1106
- Integration of sugar moiety removal functionality by @JonasSchaub in https://github.com/cdk/cdk/pull/1040
- Resolved Sonar issue with addAll in unmodifiable set by @javadev in https://github.com/cdk/cdk/pull/1114
- add test dependencies assertj and mockito-junit-jupiter by @uli-f in https://github.com/cdk/cdk/pull/1115
- add DefaultChemObjectReaderErrorHandler by @uli-f in https://github.com/cdk/cdk/pull/1112
- Fix a bug with the default SMILES output. AtomStereo was not emitted … by @johnmay in https://github.com/cdk/cdk/pull/1116
- Update BEAM to v1.3.7 to fix a corner case with reading SMILES and ar… by @johnmay in https://github.com/cdk/cdk/pull/1117
- Advanced Inorganic Handling by @johnmay in https://github.com/cdk/cdk/pull/1118
- inorganic stereo 2 by @johnmay in https://github.com/cdk/cdk/pull/1120
- Depiction Improvements (Nov 2024) by @johnmay in https://github.com/cdk/cdk/pull/1122
- Fix a minor issue with a NPE on the AwtArea util and improve tests. by @johnmay in https://github.com/cdk/cdk/pull/1123
- Fix (in)saturation expression behaviour by @johnmay in https://github.com/cdk/cdk/pull/1124
- Update CMLCoreModule.java. So far, if there was no order defined for … by @stefhk3 in https://github.com/cdk/cdk/pull/1126
- Patch/stefhk3 patch 2 fix by @johnmay in https://github.com/cdk/cdk/pull/1127
- Create codeql.yml by @javadev in https://github.com/cdk/cdk/pull/1128
- fix crash in SmilesGenerator when calling with reaction not having one or more reaction components by @uli-f in https://github.com/cdk/cdk/pull/1129
- This fixes a problem with the ordering of &. by @stefhk3 in https://github.com/cdk/cdk/pull/1131
- remove deprecated calls from InChIGeneratorTest by @uli-f in https://github.com/cdk/cdk/pull/1134
- add getAgentCount method to IReaction interface by @uli-f in https://github.com/cdk/cdk/pull/1133
- Updated dependencies by @egonw in https://github.com/cdk/cdk/pull/1136
- improve and document agent handling in MDLRXNV2000Reader by @uli-f in https://github.com/cdk/cdk/pull/1138
- update junit-jupiter dependencies to version 5.11.4 by @uli-f in https://github.com/cdk/cdk/pull/1139
- RInChI implementation based on InChI native + Java logic by @uli-f in https://github.com/cdk/cdk/pull/1137
New Contributors
- @tylerperyea made their first contribution in https://github.com/cdk/cdk/pull/1027
- @JonasSchaub made their first contribution in https://github.com/cdk/cdk/pull/1041
- @stefhk3 made their first contribution in https://github.com/cdk/cdk/pull/1126
Full Changelog: https://github.com/cdk/cdk/compare/cdk-2.9...cdk-2.10
Appendix A1
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(bldr);
SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Default);
// -OH => -[O-]
SmirksTransform deprotonate = Smirks.compile("[c:1][OX2v2+0:2][H]>>[c:1][O-:2] de-protonate\n");
long tBegin = System.nanoTime();
long tSmirks = 0;
int count = 0;
try (BufferedReader brdr = new BufferedReader(new FileReader("/data/chembl_35.smi"));
BufferedWriter bwtr = new BufferedWriter(new FileWriter("/data/chembl_35.smi.norm"))) {
String line;
while ((line = brdr.readLine()) != null) {
IAtomContainer mol = smipar.parseSmiles(line);
long tSplit0 = System.nanoTime();
// The SMIRKS transform will do aromaticity automatically. If you
// have multiple patterns being applied it may be better
// to turn this off with deprotonate.setPrepare(false); and do it
// yourself
boolean changed = deprotonate.apply(mol);
long tSplit1 = System.nanoTime();
tSmirks += (tSplit1-tSplit0);
if (changed)
line = smigen.create(mol) + " " + mol.getTitle();
bwtr.write(line);
bwtr.newLine();
++count;
if (count % 1000 == 0)
System.err.printf("\r%d...", count);
}
} catch (IOException e) {
throw new RuntimeException(e);
}
long tEnd = System.nanoTime();
long tElapsed = TimeUnit.NANOSECONDS.toMillis(tEnd-tBegin);
System.err.printf("\rdone %d in %.3fs (%.0f mol/s)\n", count, tElapsed / 1e3,
count / (tElapsed/1e3));
System.err.printf("SMIRKS in %.3fs (%.0f mol/s)\n", tSmirks / 1e9,
count / (tSmirks/1e9));
M1 Pro 2021 results:
done 2474590 in 29.449s (84030 mol/s)
SMIRKS in 8.591s (288054 mol/s)
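As a quick check on these figures, the sketch below recomputes the overall throughput from the reported count and wall time, and estimates what share of the run was spent inside the SMIRKS transform (roughly 29%). The remainder is presumably SMILES parsing, generation, and file I/O, although the timings above do not break that down. The class name is hypothetical and the inputs are the rounded values printed above, so the SMIRKS-only rate will not reproduce to the last digit.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class ThroughputSketch {

    static double perSecond(long count, double seconds) {
        return count / seconds;
    }

    static double share(double partSeconds, double totalSeconds) {
        return partSeconds / totalSeconds;
    }

    public static void main(String[] args) {
        long count = 2_474_590;  // molecules processed
        double total = 29.449;   // seconds, whole run
        double smirks = 8.591;   // seconds, inside deprotonate.apply(mol)
        System.out.println(String.format(Locale.ROOT, "overall: %.0f mol/s",
                                         perSecond(count, total)));
        System.out.println(String.format(Locale.ROOT, "SMIRKS share of run time: %.1f%%",
                                         100 * share(smirks, total)));
    }
}
```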
Notes
Files
cdk/cdk-cdk-2.10.zip (26.3 MB)
md5:dbb8302de75e4e932d818a9d091fcc93
Additional details
Related works
- Is supplement to
- Software: https://github.com/cdk/cdk/tree/cdk-2.10 (URL)
Software
- Repository URL
- https://github.com/cdk/cdk