Update keyword list #5681

Merged
merged 1 commit into from
Apr 5, 2024

Contributor

@atsansone atsansone commented Apr 2, 2024

@dart-github-bot
Collaborator

Visit the preview URL for this PR (updated for commit 7ce95ab):

https://dart-dev--pr5681-fix-3322-h23qa9i6.web.app


The following table lists the words
that the Dart language reserves for its own use.
Don't use these terms as identifiers unless the term notes an exception.
Member

I think some form of the original guidance is appropriate here. It's almost never good style to use any of these as identifiers. So maybe say something like "These words cannot be used as identifiers unless otherwise noted. Even when allowed, using keywords as identifiers is confusing and should be avoided."?

Member

I don't see a reason to discourage the use of, e.g., show and hide as variable or method names. Sometimes they are precisely what you want to use, and they work.

The things users need to know are the actual restrictions: reserved words that cannot be used as a name, ever, and built-in identifiers that cannot be used as a type or prefix name.

The rest are not really important.
Nor are there that many: hide, show, async, sync and on. Now possibly also type.
These are identifiers that can be used as names, even for types or prefixes (but as lower case, they shouldn't be names of types or extensions), but which also occur in a few specific places not as a name to be resolved, but as a keyword of a syntactic construct.

If this is a list of every word that occurs as a "keyword" (meaning: not as a resolvable name) anywhere in the grammar, then those words should be included, and linked to the descriptions of the constructs where the word can occur.
That's a fine index: if someone sees a word they don't understand, they can look it up, and see what it means in the context where they found it.

That index can also double as the list of restrictions, saying which words are restricted words and built-in identifiers, which is useful information. It can even mention that type cannot be the name of an extension, as a unique restriction.
I don't think being on the list as a contextual keyword should imply that the word should not be used as a name.
(And await and yield should be listed as restricted words, with a comment that they're not reserved everywhere, because users should refrain from using them as names everywhere. Unlike show or sync, which are perfectly fine names.)
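A minimal Dart sketch of the point above, assuming a recent Dart SDK; the function and variable names are illustrative only. The word `show` appears once as a structural part of an import directive, and then `hide`, `show`, `on` and `sync` are used as ordinary names.

```dart
// `show` is a structural part of this import directive: only `max` is imported.
import 'dart:math' show max;

// The same kind of word used as an ordinary name: a function called `hide`
// with parameters called `show` and `on`.
int hide(int show, int on) => max(show, on);

void main() {
  var sync = hide(3, 7); // `sync` is a plain local variable.
  print(sync); // prints "7"
}
```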

@leafpetersen
Member

Trying to audit the changes that were made here (very hard to see in the diff), it looks like the changes are:

  1. Collapsed previous categories 1 (contextual) and 3 (async related) into a single category "depends on context" (seems reasonable to me).
  2. Made the following changes:
  • covariant from built-in to contextual (2 -> 1)
  • hide from contextual to reserved (1 -> nothing)
  • of added as contextual (1)
  • type moved from contextual to reserved (3 -> nothing)

Is that a correct summary of the changes? @lrhn @munificent does that seem right to you?

@lrhn
Member

lrhn commented Apr 2, 2024

Without having checked everything, that does look like the changes that stand out.

I don't think covariant, hide or type should have changed.
I don't remember adding any new reserved words.

The type word was marked {{lrw}}, for "lovely reserved word", what we used for await. That should just be a contextual keyword, like hide.
I don't remember how we defined covariant, but built-in-identifier is probably correct.

And of is definitely contextual.

Member

@eernstg eernstg left a comment

I added some comments. Note that several words are used in the grammar as a structural part (e.g., hide is a structural part of import 'x.dart' hide x;, it isn't a name which is looked up in the scope), and yet there are no restrictions on these words as identifiers at all. For example, hide can be the name of a variable, a parameter, a class, anything.

So they aren't contextual keywords (or contextual anything), they are just identifiers that happen to be recognized as structural elements in a few situations.

I suggested using the category nothing for those words because there's nothing special about them as identifiers.

I also suggested several other changes for words whose classification is different than indicated in keywords.yml.
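As a hedged illustration of the claim that `hide` can name anything, here is a small Dart sketch (the class and its field are made up for this example, and the lower-case class name is used only to make the point). It assumes a recent Dart SDK.

```dart
// `hide` as a structural part of an import directive.
import 'dart:math' hide Random;

// `hide` as a class name, with a field called `of`.
class hide {
  final int of;
  hide(this.of);
}

void main() {
  var on = hide(min(4, 9)); // `on` is a plain local variable.
  print(on.of); // prints "4"
}
```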

Comment on lines +35 to +37
{{bii}} These keywords can't be used as class names, type names,
or import prefixes. They can be used as identifiers in all other
circumstances.
Member

'Class names' is a subset of 'type names', and we may also want to mention a few other kinds of type names. Perhaps:

Suggested change
- {{bii}} These keywords can't be used as class names, type names,
- or import prefixes. They can be used as identifiers in all other
- circumstances.
+ {{bii}} This keyword can't be used as the name of a type (a class, a
+ mixin, an enum, an extension type, or a type alias), or the name of
+ an extension, or as an import prefix. It can be used as an identifier
+ in all other circumstances.

extension declarations are singled out because that kind of declaration doesn't introduce a type. You could also include that into '(a class, a mixin, an enum, an extension, an extension type, or a type alias)' if it's considered overly pedantic to mention it separately. ;-)

I'm using a singular form because it makes everything in this paragraph a bit more straightforward and unambiguous, and also because a reader would often arrive at this paragraph because they want to know more about one specific identifier that they were looking at.
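The restriction described in the suggested wording can be sketched in Dart as follows. This is an illustrative example, not text from the PR: the function name `total` is hypothetical, and the rejected declarations are left as comments so the file still compiles.

```dart
// Built-in identifiers are accepted as the names of local variables.
int total() {
  var abstract = 1;
  var factory = 2;
  var static = abstract + factory;
  return static;
}

void main() {
  print(total()); // prints "3"

  // The following declarations are rejected at compile time, because a
  // built-in identifier can't name a type or an import prefix:
  //   class abstract {}
  //   typedef factory = int;
  //   import 'dart:math' as static;
}
```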

Comment on lines +32 to +33
{{ckw}} These keywords can be used as an identifier
depending on **context**.
Member

Using a singular form for consistency with lines 35-37:

Suggested change
- {{ckw}} These keywords can be used as an identifier
- depending on **context**.
+ {{ckw}} This keyword can be used as an identifier
+ depending on **context**.

Also, 'these keywords' can be used as 'an identifier' isn't quite congruent.

  type: reserved
- term: 'async'
  link: /language/async
  type: context
Member

async is one of those identifiers that occur verbatim in a grammar rule, but there are no restrictions on the use of this identifier (it can be the name of a variable, a type, an import prefix, anything).

We need a separate category for these identifiers. I'll use nothing as a strawman here, because there's nothing special about these words. They will be recognized as being a structural part of certain syntactic forms, not just as the name of a declared entity (like other identifiers).

Suggested change
-  type: context
+  type: nothing

I'm using the same type in several cases below, for the same reason.
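A short Dart sketch of what "no restrictions" means for `async`, assuming a recent Dart SDK; the function `answer` is a made-up name for illustration. The word serves as an import prefix and, in the same declaration, as the function-body marker.

```dart
// `async` as an import prefix.
import 'dart:async' as async;

// `async` as a prefix in the return type, then as the function-body marker.
async.Future<int> answer() async => 42;

async.Future<void> main() async {
  var value = await answer();
  print(value); // prints "42"
}
```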

  type: context
- term: 'base'
  link: /language/class-modifiers#base
  type: bit
Member

Suggested change
-  type: bit
+  type: nothing

  type: reserved
- term: 'covariant'
  link: /guides/language/sound-problems#the-covariant-keyword
  type: context
Member

covariant is a built-in identifier

Suggested change
-  type: context
+  type: bit

  type: reserved
- term: 'sealed'
  link: /language/class-modifiers#sealed
  type: bit
Member

Suggested change
-  type: bit
+  type: nothing

  type: bit
- term: 'show'
  link: /language/libraries#importing-only-part-of-a-library
  type: context
Member

Suggested change
-  type: context
+  type: nothing

  type: reserved
- term: 'sync'
  link: /language/functions#generators
  type: context
Member

Suggested change
-  type: context
+  type: nothing

  type: reserved
- term: 'type'
  link: /language/extension-types
  type: reserved
Member

This one is really tricky. I couldn't convince the language team to make it a built-in identifier, so it isn't. But it is just like a built-in identifier in that it cannot be the name of "some" kinds of declarations, except that it is only forbidden for an extension declaration to have the name type. That's again because it is confusing if extension type ... can be an extension whose name is 'type' rather than an extension type. So let's call it a built-in identifier. It's almost true.

Suggested change
-  type: reserved
+  type: bit
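The special status of `type` might be illustrated like this. It is a sketch only, assuming a Dart SDK with extension types (3.3 or later); `Meters` and `describe` are hypothetical names, and the rejected declaration is left as a comment.

```dart
// An extension type: `extension`, then the word `type`, then the name.
extension type Meters(int value) {
  Meters operator +(Meters other) => Meters(value + other.value);
}

// `type` is not reserved in general: it can name a parameter or a variable.
String describe(Object type) => 'type: ${type.runtimeType}';

void main() {
  var type = Meters(2) + Meters(3);
  print(type.value); // prints "5"
  print(describe('x')); // prints "type: String"

  // Rejected: an extension declaration can't be named `type`, because
  // `extension type ...` has to introduce an extension type.
  //   extension type on int {}
}
```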

  type: reserved
- term: 'when'
  link: /language/branches#when
  type: reserved
Member

Suggested change
-  type: reserved
+  type: nothing

@lrhn
Member

lrhn commented Apr 3, 2024

> I added some comments. Note that several words are used in the grammar as a structural part (e.g., hide is a structural part of import 'x.dart' hide x;, it isn't a name which is looked up in the scope),
> ...
> So they aren't contextual keywords (or contextual anything), they are just identifiers that happen to be recognized as structural elements in a few situations.

That might be a terminology difference, because that's precisely how I would define a "contextual keyword": as a word that has a meaning entirely based on the syntactic context it occurs in, independently of how the same identifier may be used elsewhere.

The reserved word for is reserved because it can be used to initiate a for statement, in a position where there is no context which necessitates the for or specifically allows it.
Whereas hide has no such inherent meaning, it only has a contextual meaning when occurring in specific positions of an import or export declaration. And it's not reserved, or even partially reserved (as a built-in identifier).

The reserved word if introduces an if statement or element, but its use in a conditional import is really more of a contextual keyword use of the same identifier. (So, "it's complicated".)

@eernstg
Member

eernstg commented Apr 3, 2024

> That might be a terminology difference, because that's precisely how I would define a "contextual keyword": as a word that has a meaning entirely based on the syntactic context it occurs in, independently of how the same identifier may be used elsewhere.

The word "keyword" isn't defined in our specification documents, we have only used it informally. It is quite useful in some situations to have an informal term that we can use in order to avoid being overly specific about some detailed distinctions. So I would actually prefer to avoid defining "contextual keyword" in the first place, because "keyword" is informal.

But it does make sense to take context dependencies into account for specific words:

We have three words that the parser can accept as an <identifier> in some contexts and not in others. The well-known ones are await and yield. In a function body which is marked async, async*, or sync* (and not inside a nested function body that doesn't have one of those markers), these words are treated as reserved words. That is, they cannot be parsed as an <identifier> (so they can't be the name of a declaration in that scope, of any kind, and it is not possible to refer to a declaration with that name, even if such a declaration exists somewhere in an outer scope).

The third one is type, which is a reserved word in one very specific context: It cannot be the name of an extension declaration. In other words, it cannot be parsed as a <typeIdentifier> in that very specific context.

No other word has the ability to be an <identifier> or <typeIdentifier> in some syntactic contexts and not in other contexts.

This is useful to know because it explains why int yield; is perfectly fine in one function body, but it is a syntax error in another function body.
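The `int yield;` contrast can be sketched in Dart as follows. The function names are hypothetical, and the line that would be a syntax error is left as a comment so the example compiles.

```dart
// In a plain synchronous body, `yield` is an ordinary identifier.
int plain() {
  int yield = 1;
  return yield + 2;
}

// In a `sync*` body, `yield` is treated as a reserved word that emits a value.
Iterable<int> generated() sync* {
  yield 1;
  yield 2;
  // int yield = 3; // Syntax error here: `yield` can't be an identifier.
}

void main() {
  print(plain()); // prints "3"
  print(generated().toList()); // prints "[1, 2]"
}
```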

I suggested that these words should be classified as type: context, because they are reserved words in a specific context, and they are <identifiers> (as well as <typeIdentifiers>) in every other context.

Returning to hide, there's nothing special to note: It can be parsed successfully as an <identifier> and as a <typeIdentifier> in every context where any of those non-terminals is expected. Similarly for of, on, and several other words (for which I suggested that we should use type: nothing).

In particular, such a word is never a contextually reserved word because it's never a reserved word, period.

> The reserved word for is reserved because it can be used to initiate a for statement, in a position where there is no context which necessitates the for or specifically allows it.

The word for and a bunch of other words are reserved because we put them into the list of reserved words. This happens to be a crucial disambiguation mechanism, because it allows the parser to conclude that it is about to parse one of a very limited set of constructs (a for statement or a for element, possibly asynchronous), and it can use the parser state to choose the relevant one.

The choice of reserved words is somewhat arbitrary, we just need to have enough of them to ensure that the grammar is parseable (using the kind of parsing technology that we want to use). For example, for occurs first, but class is the first reserved word in abstract interface class C {}, and mixin M implements C {} doesn't contain any reserved words at all.
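To make the two declarations mentioned above concrete, here is a hedged Dart sketch, assuming Dart 3 class modifiers; the member `describe` and the class `D` are added purely for illustration. Because `mixin` and `implements` are not reserved words, they are also accepted as variable names.

```dart
// `class` is the first reserved word in this declaration.
abstract interface class C {
  String describe();
}

// This declaration contains no reserved words at all.
mixin M implements C {
  @override
  String describe() => 'from M';
}

class D with M {}

void main() {
  // Neither word is reserved, so both are accepted as variable names.
  var mixin = D();
  var implements = mixin.describe();
  print(implements); // prints "from M"
}
```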

So there's nothing deep or "meaningful" about which words are reserved words and which ones are not, it's all a matter of restricting which words can occur as an <identifier> or <typeIdentifier> in various parsing situations, because that gives rise to a wide open space of parsing ambiguity.

However, it is a hugely breaking change to add new words to the set of reserved words, so we don't do that. The set of reserved words is a historical fact that just happens to suffice (as long as we're really careful whenever new syntax is introduced).

The set of built-in identifiers is similar, though much less restrictive: They can be the name of everything which isn't a type or an import prefix. That's enough to remove a lot of ambiguity in some other parsing situations. We do add a new built-in identifier now and then.

In any case, being a reserved word or a built-in identifier is a property of a word, not a property of a position in a syntactic construct. It serves no other purpose than preventing that word from being parsed as an <identifier> or a <typeIdentifier>, which allows us to recognize the syntactic construct in the first place.

So it's a for statement because the token for in the current parsing situation cannot possibly initiate anything other than a for statement ... it isn't magically a for statement, which then bestows the honor of being a reserved word upon the word for. ;-)

The only modification of the status of a word that we have is the ability of those specific three words (await, yield, type) to be a reserved word in some syntactic contexts, and a <typeIdentifier> and an <identifier> everywhere else.

> (So, "it's complicated".)

I'll buy that one. ;-)

@leafpetersen
Member

@eernstg @lrhn I'm not sure this discussion is really useful for this venue. We have a specification where we can and should be extremely precise about this, but the goal here is just to tell users what words they can't/shouldn't use as identifiers, with a bit of extra color for the ones that they can if they really have to. Are we in agreement about the set of corrections necessary here or should we schedule something to discuss?

@atsansone
Contributor Author

@leafpetersen @lrhn @eernstg: I am merging with @eernstg's approval. We can always revisit.

@atsansone atsansone merged commit 3318153 into dart-lang:main Apr 5, 2024
9 checks passed
Successfully merging this pull request may close these issues.

Refactor 'Keywords' page to clarify the 4 types of keywords
Compare keyword tables
5 participants