rfc: Implement generators for XML text and attribute #113
Tagging @pact-foundation/maintainers for review of this RFC to introduce generators for XML text and attributes. Thank you 🙌🏾
First and foremost, I think this is a great idea, and as an initial starting point, I think it is a great RFC 🚀

I do have some questions and comments, and these mostly revolve around the complexity of XML as a format. For reference, I'm referring to the XML 1.1 specification.

Explicit support for UTF-8

XML explicitly allows (nearly) all of Unicode. As an example of valid XML from Wikipedia:

<?xml version="1.0" encoding="UTF-8"?>
<俄语 լեզու="ռուսերեն">данные</俄语>

The RFC should be clear that we do support UTF-8, in tags, attributes, and bodies.

Support for XML Declaration

At the start of XML documents, an XML declaration is required (though often ignored). This declaration specifies the version of XML being used, as well as the encoding:

<?xml version="1.0" encoding="ASCII"?>
...

Since this is a required element of a valid XML document (even though in practice it is often ignored), we should make sure that we can generate it when required.

Empty tags

There are two ways an empty tag can be represented in XML: a start-tag immediately followed by an end-tag (<tag></tag>), or an empty-element tag (<tag/>).

The specification is clear that both forms are equivalent.

The specification also makes it clear that the preference is to use the empty-tag representation as opposed to a start-end tag. The RFC should make a mention that empty tags will be generated as the empty-element tag (<tag/>).

Support for white-space preserving

XML parsers typically don't care much about whitespace and allow for indentation. The exception to the rule is if the special xml:space="preserve" attribute is set. This is somewhat niche, but the RFC should be clear as to whether this is supported or not. In particular, nothing stops us from defining a generator which adds […]

Support for escaping data

Since the generators are specified in JSON, the RFC should make it completely clear how/when data is escaped. To avoid any confusion, I think the generator should automatically escape data, and the inputs should be read using standard JSON parsing rules. This means that the following is correct:

{
"name": "example",
"children": {
...,
"content": "<foo />"
}
}

and it would be incorrect to have:

{
"name": "example",
"children": {
...,
"content": "&lt;foo /&gt;"
}
}

unless one wanted to have the literal text &lt;foo /&gt; in the generated element.

There's also a question as to the way to escape the data, since XML has two options: entity references (for example &lt;) and CDATA sections.

I feel like the […]

Support for Comments (?)

Unlike JSON, XML supports comments, and therefore a natural question is: should we support them?

<data>
<!-- some explanation -->
<date>...</date>
</data>

Comments are explicitly not part of the actual data being transferred, so my gut instinct to the question is 'no' and this should be made clear in the RFC. Having said that, has anyone ever run into a scenario which (for whatever reason) required a comment to be present to pass validation?

Support for Arrays

XML does not have an array data type, and instead uses repeated tags:

<data>
<tag>1</tag>
<tag>2</tag>
...
</data>

I don't see anything in the way your RFC is structured which would conflict with this, but I do think the RFC should include an explicit example of generating arrays.

Support for Type Declarations

XML can be a self-describing format, with the definitions being transmitted alongside the data:

<!DOCTYPE data [
<!ELEMENT data (tag*)>
<!ELEMENT tag (#PCDATA)>
]>
<data>
<tag>1</tag>
<tag>2</tag>
...
</data>

I suspect adding support for this may be out of the scope of the RFC, but it is worth mentioning within the RFC that this is not supported.

Support for Attribute-list Declarations

XML also allows for the definition of attributes:

<!ATTLIST data
id ID #REQUIRED
name CDATA #IMPLIED
>
<data id="1" name="example">
...
</data>

Again, I suspect this is out of scope, but it is worth mentioning that this is not supported.

Support for Entities

XML allows for the definition of entities (whether internal or external):

<!DOCTYPE github [
<!ENTITY domain "github.com">
<!ENTITY ips SYSTEM "https://dns.google/resolve?name=github.com&type=A">
]>
<github>
<domain>&domain;</domain>
<ips>&ips;</ips>
</github>

It is straightforward to have a generator use entities (in fact, […])
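
Returning to the escaping question above, here is a minimal Python sketch of the two options (entity references versus a CDATA section). The function names `render_with_entities` and `render_with_cdata` are hypothetical and not part of the RFC or of any Pact library. The sketch assumes the generator receives the raw string after JSON parsing and does the escaping itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_with_entities(name: str, content: str) -> str:
    """Escape &, < and > in the content using predefined entity references."""
    return f"<{name}>{escape(content)}</{name}>"


def render_with_cdata(name: str, content: str) -> str:
    """Wrap the content in a CDATA section, splitting any embedded ']]>'."""
    safe = content.replace("]]>", "]]]]><![CDATA[>")
    return f"<{name}><![CDATA[{safe}]]></{name}>"


# The raw value as it would appear in the JSON input (hypothetical example).
entity_form = render_with_entities("example", "<foo />")
cdata_form = render_with_cdata("example", "<foo />")

print(entity_form)  # <example>&lt;foo /&gt;</example>
print(cdata_form)   # <example><![CDATA[<foo />]]></example>

# Both forms parse back to the same character data.
print(ET.fromstring(entity_form).text == ET.fromstring(cdata_form).text)  # True
```

Either form round-trips to the same text, so the choice mainly affects how readable the generated document is, not what a parser sees.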
Just a note, the reason that generators are not supported for XML is that it is more complex than with JSON. The current implementation is based on finding the element in the body that matches the path, and replacing it with a generated value. XML can have multiple matching nodes, and that would require being able to find all of them and replace all the items. This is where I gave up on it. I.e. with

<container>
<tag/><tag/><tag/>
</container>

and the path […] with

<container>
  <tag>
    <child>
      <subchild>
      </subchild>
      <subchild>
      </subchild>
    </child>
  </tag>
  <tag>
  </tag>
  <other>
    <child>
      <name>
      </name>
      <name>
      </name>
    </child>
  </other>
</container>

and the path […]
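
To make the multiple-match concern concrete, here is a small Python sketch that finds every node matching a path and replaces each one. It relies on ElementTree's limited XPath subset, not Pact's own path expressions. The helper name `apply_text_generator` and the counter-based generator are assumptions made for illustration only.

```python
import itertools
import xml.etree.ElementTree as ET


def apply_text_generator(xml_text, path, generate):
    """Replace the text of every element matching `path`.

    Returns the serialised document and the number of elements replaced.
    """
    root = ET.fromstring(xml_text)
    matches = root.findall(path)
    for element in matches:
        # Call the generator once per match so each node gets its own value.
        element.text = generate()
    return ET.tostring(root, encoding="unicode"), len(matches)


counter = itertools.count(1)
result, replaced = apply_text_generator(
    "<container><tag/><tag/><tag/></container>",
    "./tag",
    lambda: str(next(counter)),
)

print(replaced)  # 3
print(result)    # <container><tag>1</tag><tag>2</tag><tag>3</tag></container>
```

Whether each match should receive a fresh value (as here) or all matches should share one generated value is a design decision the RFC would need to state.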
These are not relevant for using generators with XML.
Care to go into any level of detail as to why?
Thanks for doing this Tien! A couple of additional clarifications:

1. Mixed Content Handling

<summary>Release date: <date>2024-03-12</date>. Copyright ACME inc.</summary>

If a generator is applied to the <summary> element, presumably it would leave the nested element intact.

Element has mixed children (text + elements)
Note:
If we introduce indexing for text nodes in the future, we could support defining different generators for each text node:

Generators: […]

Would produce […]

--

2. Namespaces and Prefixes

The RFC covers namespace handling for attributes and elements, but it's focussed on locating elements. It isn't clear whether prefixes or values can be dynamically generated or replaced. My assumption is that we would not support this.

--

3. Comments

The W3C Recommendation for XML, specifically XML 1.0 (Fifth Edition), mentions the following with regards to comments:
As such, I think we can ignore them entirely.
No, I haven't.

--

4. Processing Instructions

e.g.
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<root>...</root>

Processing instructions are application-specific. Changing them doesn't technically impact semantics, but they may be important application-level instructions (e.g. used in a data pipeline down the line).
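
For the mixed-content case in point 1, the sketch below shows how the two text nodes around the nested element can be addressed separately, and how replacing one leaves the nested element untouched. It uses Python's ElementTree purely as an illustration. The replacement value "Published: " is a made-up generator output, and none of this reflects how a Pact implementation would model text nodes.

```python
import xml.etree.ElementTree as ET

summary = ET.fromstring(
    "<summary>Release date: <date>2024-03-12</date>. Copyright ACME inc.</summary>"
)

# ElementTree models the two text nodes as `summary.text` (before <date>)
# and `date.tail` (after </date>).
date = summary.find("date")
text_nodes = [summary.text, date.tail]

# Hypothetical generator output replacing only the first text node;
# the nested <date> element and the trailing text are left as they were.
summary.text = "Published: "
result = ET.tostring(summary, encoding="unicode")

print(text_nodes)  # ['Release date: ', '. Copyright ACME inc.']
print(result)      # <summary>Published: <date>2024-03-12</date>. Copyright ACME inc.</summary>
```

An index over text nodes, as suggested in the note above, would map naturally onto this list of two entries.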