README.md (+7 -6)
@@ -121,30 +121,31 @@ This will result in a cdxml file that looks like this when viewed in ChemDraw:
## Project Status
-
The overall status of the project can be described best as **alpha**. It depends on the specific module used. Within
+
The overall status of the project can be described best as **beta**. It depends on the specific module used. Within
the limited scope of basic small molecules, the code will likely work. But it is certain there are some unknown bugs
and edge-cases not present in my set of test molecules.
-
Where you might run into issues is with reactions and for sure organometallics or anything that contains
-
non-chemical related drawings.
-
-
It's best to limit usage to "single-molecule" documents essentially treating the ChemDraw files like mol files. `cdxml`and `cdx` are more like a drawing file format with molecules as first class citizens and not a pure chemical format. Using any of these "drawing features" can lead to errors or worse silent issues. **You have been warned!**
-
## CDXMLConverter
`cdxml_converter`module allows you to convert between `cdxml`and `cdx` files. There is also support to convert [RDKit](https://github.com/rdkit/rdkit) molecules to `cdxml` or `cdx` files.
+
As of commit 9507e48 all files in the ChemDraw Samples directory can be visually correctly converted from/to cdxml. This includes correct conversion of biological shapes and sequences, images and shapes.
+
The conversions are based on PerkinElmers (formerly CambridgeSofts) official but very much outdated format
specification available [here](https://www.cambridgesoft.com/services/documentation/sdk/chemdraw/cdx/IntroCDX.htm).
Some features required "trial and error" to get working as they are either new or different from the specification. For
more details see the README.md in the modules' directory.
major known issue: very old `cdx` files do not adhere to the official format specification and hence very often fail to be read (old means around ChemDraw 7 time-frame and older).
+
When converting from RDKit molecules, you will likely run into issues with organometallics, polymers or other "complex" molecules. It's best to limit usage to "single-molecule" files essentially treating the ChemDraw files like mol files.
+
## CDXMLStyler
`cdxml_styler` module converts the style of the molecules contained in the `cdxml`document. The style options are limited to options that directly affect the display of the molecule like bond length, atom label size and so forth. The core usage scenario here is to convert a bunch of `cdxml`documents containing just molecule drawings to a standardized style.
+
The styling will often require translating the molecule. In case the document contains additional drawing elements that may or may not be related to the molecule like brackets these might not be translated at all or not entirely correctly. Also distances between molecules might change in relative and absolute terms.
+
If you have `cdx`files, convert them to `cdxml`with the `cdxml_converter`module, apply the style and convert back to `cdx`. That is in general the basic idea of this package. Do all manipulation in `cdxml`because due to it being `xml`it's relatively easy to do such manipulations in contrast to the binary `cdx`format.
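To illustrate why manipulating `cdxml` is comparatively easy, here is a minimal sketch that shifts all atoms of a tiny hand-written document using only the Python standard library. It does not use this package's API: the `translate_atoms` helper and the sample document are hypothetical, and the sketch assumes atom positions are stored as `"x y"` in the `p` attribute of `<n>` elements.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample document (hypothetical, for illustration only).
CDXML = """<CDXML BondLength="14.4">
  <page>
    <fragment>
      <n id="1" p="10 20"/>
      <n id="2" p="24.4 20"/>
      <b id="3" B="1" E="2"/>
    </fragment>
  </page>
</CDXML>"""


def translate_atoms(root, dx, dy):
    """Shift every atom node (<n>) that has a position by (dx, dy)."""
    for node in root.iter("n"):
        pos = node.get("p")
        if pos is None:
            continue  # node without explicit coordinates, leave untouched
        x, y = (float(v) for v in pos.split())
        node.set("p", f"{x + dx:g} {y + dy:g}")


root = ET.fromstring(CDXML)
translate_atoms(root, 5.0, -2.5)
positions = [n.get("p") for n in root.iter("n")]
print(positions)
```

Note that this sketch only touches atom nodes. Other drawing elements that carry their own coordinates would stay where they are unless they are handled explicitly.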
pycdxml/cdxml_converter/README.md (+9 -7)
@@ -2,20 +2,22 @@
## Scope
-
CDXML converter **converts between cdx** (binary) **and cdxml**(text/xml) files containing **small molecules**. **Conversion from [RDKit](https://github.com/rdkit/rdkit) molecules to cdx/cdxml is also partially implemented**. The goal of the project is to provide conversion for files containing small molecules and later possibly reactions, at least the reaction scheme. The idea is to be able to convert such files coming or going into a database automatically on any OS and hence treating them as molecule file format. ChemDraw itself lacks such usable automation features, especially cross-platform.
+
CDXML converter **converts between cdx** (binary) **and cdxml**(text/xml) files. As of commit 9507e48 all files in the ChemDraw Samples directory can be visually correctly converted from/to cdxml. This includes correct conversion of biological shapes and sequences, images and shapes.
-
Newer feature like bilogy drawing elements will possibly only get limited support because they are missing from the specification which requires a lot fo trial and error to figure things out. The core issue for full support is that ChemDraw is essentially a drawing canvas and hence cdx and cdxml are drawing formats and not really chemical structure exchange formats.
+
**Conversion from [RDKit](https://github.com/rdkit/rdkit) molecules to cdx/cdxml is also partially implemented**. The goal of the project is to provide conversion for files containing small molecules and later possibly reactions, at least the reaction scheme. The idea is to be able to convert such files coming or going into a database automatically on any OS and hence treating them as molecule file format. ChemDraw itself lacks such usable automation features, especially cross-platform.
+
+
Newer feature like additions to biology drawing elements or 3D chemical features will possibly only get limited support because they are missing from the specification which requires a lot of trial and error to figure things out. The core issue for full support is that ChemDraw is essentially a drawing canvas and hence cdx and cdxml are drawing formats and not really chemical structure exchange formats.
Note that there is no chemical knowledge in this tool! It really is just a format converter.
## Status
-
The **status of the project is at best "alpha"** simply due to the limited scope of tested molecules and limitations how to validate the result. It has to be assumed that anything that isn't a basic small molecules can either fail or even worse lead to an invalid output without error.
+
The **status of the project is "beta"**. It has to be assumed that anything that isn't a basic small molecules can either fail or even worse lead to an invalid output without error.
What its implemented:
-
- Conversion to/from cdx to/from cdxml for small molecules and simple reactions
-
- Conversion from simple RDKit Mols to cdx/cdxml (partial, for example enhanced stereo is not yet implemented)
+
- Conversion to/from cdx to/from cdxml working for all Sample files
+
- Conversion from simple RDKit Mols to cdx/cdxml including enhanced stereochemistry
## ChemDraw Format Specification
@@ -43,7 +45,7 @@ Let's just say it's a big mess also driven by the format issues and there is a l
The internal architecture consist of `ChemDrawDocument`class which wraps an `ElementTree`. This element tree is a `cdxml` document. Each element has attributes and a value where the values have types defined in the cdx specification. For reach type their is a class that knows how to represent itself either as `bytes`as in `cdx` or as string value in `cdxml`.
-
In `cdx`elements (tree nodes) are called objects and attributes are called properties.
+
In `cdx`elements (tree nodes) are called objects and attributes are called properties.
#### Reading / Writing
@@ -59,4 +61,4 @@ The issues with this basic read/write mechanism is, that the reading and writing
It also means that every type has an `CDXType` implementation even INT8, INT16 etc. which is a bit ugly really but it "unifies" the design. Again there is room for improvement, simplification and better performance.
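The "one class per type" idea described above can be sketched roughly as follows. This is not the module's actual implementation: the class name `Int16` and its method names are hypothetical, and the sketch assumes a 2-byte little-endian signed encoding on the `cdx` side.

```python
import struct


class Int16:
    """Hypothetical value type: a signed 16-bit integer property."""

    def __init__(self, value: int):
        if not -32768 <= value <= 32767:
            raise ValueError(f"{value} does not fit into a signed 16-bit integer")
        self.value = value

    @classmethod
    def from_bytes(cls, data: bytes) -> "Int16":
        # Binary (cdx) side: assumed to be 2 bytes, little-endian, signed.
        return cls(struct.unpack("<h", data)[0])

    @classmethod
    def from_string(cls, text: str) -> "Int16":
        # Text (cdxml) side: the attribute value is a decimal string.
        return cls(int(text))

    def to_bytes(self) -> bytes:
        return struct.pack("<h", self.value)

    def to_element_value(self) -> str:
        return str(self.value)


# Round trip: cdxml string -> value object -> cdx bytes -> value object -> string
raw = Int16.from_string("-2").to_bytes()
text = Int16.from_bytes(raw).to_element_value()
```

Giving even trivial integer types their own class costs some boilerplate, but it lets the reader and writer treat every property the same way.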
-
In terms of logging please be adivised that the debug level should only ever be used in case of troubleshooting a specific file. It is very verbose which makes things slow and generates huge log files.
+
In terms of logging please be advised that the debug level should only ever be used in case of troubleshooting a specific file. It is very verbose which makes things slow and generates huge log files.
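One way to follow this advice is to keep the logger at a quieter level by default and enable debug output only while a single problematic file is processed. The sketch below uses only the standard `logging` module. The logger name `"pycdxml"` is an assumption and may differ from what the package actually uses, and `debug_logging` is a hypothetical helper.

```python
import logging
from contextlib import contextmanager

LOGGER_NAME = "pycdxml"  # assumption: the actual logger name may differ


@contextmanager
def debug_logging(logger_name=LOGGER_NAME):
    """Temporarily switch one logger to DEBUG, then restore its previous level."""
    logger = logging.getLogger(logger_name)
    previous_level = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        yield logger
    finally:
        logger.setLevel(previous_level)


# Default: keep the converter quiet enough for batch runs.
logging.getLogger(LOGGER_NAME).setLevel(logging.INFO)

# Troubleshooting: verbose output only while the one failing file is handled.
with debug_logging() as log:
    log.debug("processing the problematic file here")
```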
pycdxml/cdxml_styler/README.md (+3 -5)
@@ -6,6 +6,8 @@ CDXMLStyler is a module to programmatically adjust the style of all the molecule
`cdxml` is the xml-based format of ChemDraw. This style-conversion can be achieved with ChemDraw itself but not in an automated fashion. Therefore the main purpose of the tool is batch conversion of multiple`cdxml` files to the same style either for normalization in a database or for usage in the same ChemDraw file (see also CDXML Slide Generator module).
+
The styling will often require translating the molecule. In case the document contains additional drawing elements that may or may not be related to the molecule like brackets these might not be translated at all or not entirely correctly. Also distances between molecules might change in relative and absolute terms.
+
## Usage
`CDXMLStyler`class can be instantiated either by an included named style ( currently limited to ACS 1996 or Wiley), a template `cdxml`file or from a `dict` containing the required style options.
@@ -46,8 +48,4 @@ The required style options for using a `dict`input are:
-`LabelFont`
-`HideImplicitHydrogens`
-
Note that `LabelFont`is an index (integer) to the according font in the font table. So the actual font used will depend on the input documents font table<sup>1</sup>. `HideImplicitHydrogens`is relevant because some styles have it set to `yes` meaning that say an alcohol is displayed as `O` and not `OH`. Therefore when this setting changes from old to new style, the atom label needs to be adjusted accordingly.
-
-
-
-
<sup>1</sup>Actually this should probably be improved so that a font name can be used and if not present it's added to the font table. Currently if style source is a `cdxml`file the `LabelFont` index is taken and applied to the source document. If the source has a different font at that index in the font table, then the output is wrong.
+
Note that `LabelFont` must be a valid font name available on the system. This is not verified! `HideImplicitHydrogens`is relevant because some styles have it set to `yes` meaning that say an alcohol is displayed as `O` and not `OH`. Therefore when this setting changes from old to new style, the atom label needs to be adjusted accordingly.