Some working notes on how to scope the Alan syntax elements.
A valid identifier in Alan has the following pattern:
IDENTIFIER : LETTER ( LETTER | DIGIT | '_' )*
... where LETTER means:
- any char in the ASCII range a..z
- any char in the ASCII range A..Z
- any char in the Unicode range U+00E0..U+00F6 (à..ö)
- any char in the Unicode range U+00F8..U+00FE (ø..þ)
The two Unicode ranges represent characters from the Latin-1 Supplement (ISO-8859-1), which is the default encoding expected by Alan.
Here is the full ISO-8859-1 subset of valid characters in an identifier:
0 1 2 3 4 5 6 7 8 9
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
_
a b c d e f g h i j k l m n o p q r s t u v w x y z
à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ
To simplify, the Unicode range can be described as the set U+00E0..U+00FE minus U+00F7 (obelus symbol: ÷).
The valid letters can be represented via the following RegExs, in order of simplification:
[a-zA-Z\x{00E0}-\x{00F6}\x{00F8}-\x{00FE}]
[a-zA-Z\x{00E0}-\x{00FE}&&[^\x{00F7}]]
[a-zA-Zà-þ&&[^÷]]
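As a quick sanity check on the ranges above, here is a minimal Python sketch that builds the LETTER set both ways (two ranges vs. one range minus the obelus) and compares them with what the first RegEx accepts. Python's `re` module has no `&&` class intersection, so only the explicit-range form is used; the names `LETTER_RE`, `letters_two_ranges`, `letters_minus_obelus` and `letters_by_regex` are made up for this illustration and don't appear in the syntax source.

```python
import re

# First RegEx form from the notes, written with Python escapes.
# (The `&&` intersection forms are not valid in Python's `re`.)
LETTER_RE = re.compile("[a-zA-Z\u00e0-\u00f6\u00f8-\u00fe]")

ascii_letters = {chr(c) for c in range(ord("a"), ord("z") + 1)} | {
    chr(c) for c in range(ord("A"), ord("Z") + 1)
}

# Description 1: two separate Unicode ranges.
letters_two_ranges = (
    ascii_letters
    | {chr(c) for c in range(0x00E0, 0x00F6 + 1)}
    | {chr(c) for c in range(0x00F8, 0x00FE + 1)}
)

# Description 2: the single range U+00E0..U+00FE minus the obelus U+00F7.
letters_minus_obelus = ascii_letters | (
    {chr(c) for c in range(0x00E0, 0x00FE + 1)} - {"\u00f7"}
)

# Every Latin-1 code point accepted by the regex.
letters_by_regex = {chr(c) for c in range(0x100) if LETTER_RE.fullmatch(chr(c))}

print(letters_two_ranges == letters_minus_obelus == letters_by_regex)  # True
print(len(letters_two_ranges))  # 82 (52 ASCII letters + 30 Latin-1 letters)
```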
The full identifier pattern is: (\b[a-zA-Zà-þ&&[^÷]][0-9_a-zA-Zà-þ&&[^÷]]*\b), which in the syntax source is represented through variables:
variables:
LETTER: 'a-zA-Zà-þ&&[^÷]'
ID: '(\b[{{LETTER}}][0-9_{{LETTER}}]*\b)'
While the above RegEx does a fairly good job at matching valid identifiers and ignoring malformed ones, the presence of invalid symbol characters (including ÷) in the middle of a word will cause the syntax parser to split the word, because most symbols satisfy the boundary condition of \b. Therefore, an invalid identifier like abc÷123 will actually be parsed as:
- abc: identifier
- ÷: stray symbol
- 123: numeric constant
There isn't much I can do about it; besides, we don't expect users to employ symbols in identifiers. If they choose to do so, it's at their own peril!
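The word-splitting effect can be reproduced outside the editor with a rough Python sketch. It is only an approximation: Python's `re` is not the engine used by the syntax parser, the LETTER class is written in explicit-range form, and the `number` and `symbol` rules (as well as `TOKEN_RE` and `tokenize`) are hypothetical stand-ins for whatever other rules the real syntax applies.

```python
import re

# LETTER class in explicit-range form (Python's `re` lacks `&&` intersection).
LETTER = "a-zA-Z\u00e0-\u00f6\u00f8-\u00fe"
ID = rf"\b[{LETTER}][0-9_{LETTER}]*\b"

# `number` and `symbol` are assumed rules, added only to show
# what happens to the leftovers once the identifier rule gives up.
TOKEN_RE = re.compile(
    rf"(?P<identifier>{ID})|(?P<number>\b[0-9]+\b)|(?P<symbol>\S)"
)


def tokenize(text):
    """Return (rule name, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


print(tokenize("abc÷123"))
# [('identifier', 'abc'), ('symbol', '÷'), ('number', '123')]
print(tokenize("abc123"))
# [('identifier', 'abc123')]
```

The obelus is not a word character, so \b matches on both sides of it and the identifier rule stops at abc.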
The BNF definition of Identifier in the "alan.g" source (line 15):
fragment IDENTIFIER : LETTER ( LETTER | DIGIT | '_' )* ;
Also, in "alan.g" (line 9):
fragment LETTER : 'A' .. 'Z' | 'a' .. 'z' | '_' | '$' | '\u00e0'..'\u00f6' | '\u00f8'..'\u00fe' ;
NOTE: The latter BNF definition might lead one to assume that _ and $ are also valid characters in the LETTER range, but it is not so (these don't reflect the actual validations carried out by the Alan compiler in real use). The $ symbol is never a valid character in identifiers, and the _ can't be used as the first character of an identifier.
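To make the difference concrete, the sketch below compares a literal reading of the "alan.g" fragments with the behaviour described in the note. Both patterns are my own Python transcriptions (again using explicit ranges), and `GRAMMAR_ID`, `ACTUAL_ID` and `compare` are names invented for the example; it doesn't run the Alan compiler, it only restates the two rule sets.

```python
import re

LETTER = "a-zA-Z\u00e0-\u00f6\u00f8-\u00fe"

# Literal reading of the grammar: '_' and '$' are part of LETTER.
GRAMMAR_ID = re.compile(rf"[{LETTER}_$][{LETTER}_$0-9]*")

# Behaviour described in the note: no '$' anywhere, no leading '_'.
ACTUAL_ID = re.compile(rf"[{LETTER}][{LETTER}0-9_]*")


def compare(candidate):
    """Return (accepted by grammar reading, accepted by described behaviour)."""
    return (
        bool(GRAMMAR_ID.fullmatch(candidate)),
        bool(ACTUAL_ID.fullmatch(candidate)),
    )


for candidate in ["my_obj", "façade2", "_hidden", "cash$"]:
    print(candidate, compare(candidate))
# my_obj (True, True)
# façade2 (True, True)
# _hidden (True, False)
# cash$ (True, False)
```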
A few notes about Alan language syntax elements and their scoping.
Class scopes:
entity.name.class
entity.name.class.forward-decl
entity.other.inherited-class
Fields, properties, members and attributes of a class or other data structure should use:
variable.other.member
There are eight pre-defined (hard-coded) Alan classes:
entity
thing
object
actor *
location *
literal
string
integer
NOTE 1: actor and location are found in the Alan Keywords list, but not the other hard-coded classes.
NOTE 2: hero doesn't appear in the Alan Keywords list either, although it's a hard-coded instance.
THIS should be scoped as variable.language (for reserved language variables like this, super, self, etc.)
There are 123 keywords in the Alan 3 language, as listed in the Reference Manual (Appendix D.2 Keywords):
actor |
add |
after |
an |
and |
are |
article |
at |
attributes |
before |
between |
by |
can |
cancel |
character |
characters |
check |
container |
contains |
count |
current |
decrease |
definite |
depend |
depending |
describe |
description |
directly |
do |
does |
each |
else |
elsif |
empty |
end |
entered |
event |
every |
exclude |
exit |
extract |
first |
for |
form |
from |
has |
header |
here |
if |
import |
in |
include |
increase |
indefinite |
initialize |
into |
is |
isa |
it |
last |
limits |
list |
locate |
location |
look |
make |
max |
mentioned |
message |
min |
name |
near |
nearby |
negative |
no |
not |
of |
off |
on |
only |
opaque |
option |
options |
or |
play |
prompt |
pronoun |
quit |
random |
restart |
restore |
save |
say |
schedule |
score |
script |
set |
show |
start |
step |
stop |
strip |
style |
sum |
synonyms |
syntax |
system |
taking |
the |
then |
this |
to |
transcript |
until |
use |
verb |
visits |
wait |
when |
where |
with |
word |
words |
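Several keywords in the list are prefixes of others (character/characters, near/nearby, in/include/into, word/words), so a keyword rule needs word boundaries on both sides. Here is a minimal Python sketch of that idea, using only a handful of the keywords above. `KEYWORDS_SUBSET`, `KEYWORD_RE` and `find_keywords` are hypothetical names, and the case-insensitive flag is an assumption on my part about how keywords should be matched.

```python
import re

# A small subset of the list above, chosen because of the shared prefixes.
KEYWORDS_SUBSET = [
    "character", "characters",
    "in", "include", "into",
    "near", "nearby",
    "word", "words",
]

# \b on both sides keeps "word" from matching inside "words" and
# "in" from matching inside "inner". IGNORECASE is an assumption.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS_SUBSET)) + r")\b",
    re.IGNORECASE,
)


def find_keywords(text):
    """Return the keywords found in text, lowercased, in order."""
    return [m.group().lower() for m in KEYWORD_RE.finditer(text)]


print(find_keywords("Words nearby: include inner characters"))
# ['words', 'nearby', 'include', 'characters']
```

With the trailing \b in place the order of the alternatives doesn't matter: when a shorter alternative fails the boundary check, the engine backtracks and tries the longer one.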