
Express language according to BCP 47 #89

Open
laurentromary opened this issue Feb 15, 2025 · 4 comments

laurentromary commented Feb 15, 2025

I see that the specification of the language property refers to ISO 639-3. ISO 639-3 has now been replaced (like the other parts of the former multipart ISO 639 standard) by a single standard, ISO 639:2023 Code for individual languages and language groups. This standard still defines sets (here, Set 3) reminiscent of the former parts, but it would be more appropriate to follow IETF BCP 47, which combines ISO 639 language codes with subtags from other standards (scripts, regions, etc.) and thus allows a finer-grained representation of language varieties. This is, for instance, the approach adopted for the generic xml:lang attribute in the XML recommendation, and it would ensure better interoperability across applications. A good introduction is available at: https://www.w3.org/International/articles/language-tags/

@MarekSuchanek
Collaborator

Thanks! So this is simply about the note "Language of the ... expressed using ISO 639-3" in the README.md, right?

If we state there that it follows ISO 639:2023 Set 3, then it is not a breaking change; however, if we just stated ISO 639:2023, would it be a breaking change?

This might also be related to #87, as I see a note on the language property in DCAT 3...

@laurentromary
Author

The question is whether you want to stick to three-letter codes (which, I gather, is what the reference to 639-3 meant) or adopt the broader framework of IETF BCP 47, which recommends two-letter codes where they exist (e.g. fr, en) and allows more elaborate identifiers (e.g. fr-CA for Canadian French). I think this would provide a better interoperability framework with other applications that use language information.
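To make the tag structure concrete, here is a minimal Python sketch that splits simple BCP 47 tags such as `fr-CA` into language, script and region subtags. It covers only a simplified subset of the RFC 5646 grammar (no extended language subtags, variants, extensions, private-use or grandfathered tags) and checks well-formedness only; whether a subtag is actually registered would still have to be checked against the IANA Language Subtag Registry. The function name `parse_simple_tag` is hypothetical.

```python
import re
from typing import Optional

# Simplified subset of the RFC 5646 (BCP 47) syntax:
#   language (2-3 letters) [-script (4 letters)] [-region (2 letters or 3 digits)]
# Extended language subtags, variants, extensions, private-use and
# grandfathered tags are deliberately not handled.
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_simple_tag(tag: str) -> Optional[dict]:
    """Split a simple BCP 47 tag into its subtags, or return None if it does not match."""
    m = _SIMPLE_TAG.fullmatch(tag)
    if m is None:
        return None
    language, script, region = m.group("language", "script", "region")
    # Tags compare case-insensitively; apply the conventional casing
    # (language lowercase, script title case, region uppercase).
    return {
        "language": language.lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }
```

For example, `parse_simple_tag("fr-CA")` yields language `fr` and region `CA`, `"zh-hant-tw"` is normalised to `zh` / `Hant` / `TW`, and `"french"` is rejected.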

@MarekSuchanek
Collaborator

Ah, I see... that would be better, but then we would introduce a breaking change in DCS (so potentially for v2.0)? Or does BCP 47 also allow three-letter codes on their own (without additional subtags)?

@laurentromary
Author

It does not, but hardly anyone uses three-letter codes alone. So, yes, you would need to deprecate the old format and organise the switch.
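For the deprecation path, converting existing values could look roughly like the Python sketch below (an assumption, not something DCS currently defines). Where a language has a two-letter ISO 639-1 code, BCP 47 uses that code instead of the three-letter one (so `fra` becomes `fr`); languages without a two-letter code, such as Hawaiian (`haw`), keep their three-letter code, which is already a valid BCP 47 primary language subtag. The five-entry table and the function name `normalize_language` are hypothetical; a real migration would derive the table from the IANA Language Subtag Registry.

```python
import warnings
from typing import Mapping

# Hypothetical excerpt of an ISO 639-3 -> ISO 639-1 table. A real migration
# should derive it from the IANA Language Subtag Registry, not hard-code it.
ISO639_3_TO_1: Mapping[str, str] = {
    "ces": "cs",
    "deu": "de",
    "eng": "en",
    "fra": "fr",
    "nld": "nl",
}


def normalize_language(value: str, table: Mapping[str, str] = ISO639_3_TO_1) -> str:
    """Return a BCP 47 primary language subtag for a legacy three-letter value.

    If the language has a two-letter code, the three-letter value is rewritten
    and a DeprecationWarning is emitted; codes with no two-letter equivalent
    (e.g. "haw") are already valid BCP 47 subtags and pass through unchanged.
    """
    code = value.strip().lower()
    if not (len(code) == 3 and code.isascii() and code.isalpha()):
        raise ValueError(f"expected a 3-letter ISO 639-3 code, got {value!r}")
    short = table.get(code)
    if short is None:
        return code
    warnings.warn(
        f"language {value!r} uses the deprecated ISO 639-3 form; use {short!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return short
```

Running existing metadata through such a function during a transition period would keep old values working while warning users to switch to the new form.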
