Qi Meeting Mar 8 2024

The Elusive Source Syntax

DRAFT

Summary

We started to investigate ways to preserve source syntax through expansion in Syntax Spec, for the purposes of providing appropriate error messages.

Background

Qi syntax is first expanded and then compiled to produce Racket code that is then expanded and compiled into the bytecode that is ultimately evaluated in the Chez Scheme runtime. If an error is encountered during Qi compilation, we would like to implicate the source syntax, that is, the code originally entered by the user, rather than the target syntax produced by the compiler. But at this stage in the processing of the input syntax, that source syntax is no longer available. In the recent Qi 4 release that included the optimizing compiler, we had to implement a "de-expander" to reconstruct the source syntax from the target expression for use in error messages, but this is a fragile approach, and the source expression cannot always be reliably reconstructed.
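
One way reconstruction can fail is that expansion is not always invertible. The toy sketch below is an illustration only, not the actual de-expander: `toy-expand` is a hypothetical stand-in that works on plain S-expressions, and it borrows the `my-amp` macro from the examples later in these notes.

```racket
#lang racket/base
(require racket/match)

;; Toy expander over plain S-expressions (hypothetical, for illustration):
;; the user macro `my-amp` rewrites to the core form `><`, and any other
;; form is returned unchanged.
(define (toy-expand form)
  (match form
    [(list 'my-amp f) (list '>< f)]
    [_ form]))

;; Two different source expressions ...
(define from-macro (toy-expand '(my-amp sqr)))
(define from-core  (toy-expand '(>< sqr)))

;; ... produce the same expansion, so nothing in the result says which
;; one the user actually wrote.
(equal? from-macro from-core)
```

Given only `(>< sqr)`, a de-expander would have to guess between the two sources, which is why keeping a reference to the original syntax is attractive.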

Blame Game Redux

We've talked before about how, whenever an error occurs during compilation, there are several candidates for blame. For a hosted language like Qi, these include (1) the surface syntax, (2) the macro, (3) the core language syntax, and (4) the target language (Racket) syntax.

There is no general rule that can be applied, and which expression should be blamed is decided on a case-by-case basis.
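
Whichever party is chosen, in Racket the choice is expressed by the syntax object handed to `raise-syntax-error`. The sketch below calls it at runtime purely to show the mechanics. The helper `blamed-exprs` and the error messages are hypothetical and are not taken from the Qi codebase.

```racket
#lang racket/base

;; Run `thunk`; if it raises a syntax error, return the blamed expressions
;; as plain data. Return #f if no syntax error is raised.
(define (blamed-exprs thunk)
  (with-handlers ([exn:fail:syntax?
                   (lambda (e)
                     (map syntax->datum (exn:fail:syntax-exprs e)))])
    (thunk)
    #f))

;; Blame the surface syntax: the expression as the user wrote it.
(define surface-blame
  (blamed-exprs
   (lambda ()
     (raise-syntax-error 'my-amp "expected exactly one flow"
                         #'(my-amp add1 sqr)))))

;; Blame what a macro produced instead: an invalid use of a core form.
(define core-blame
  (blamed-exprs
   (lambda ()
     (raise-syntax-error '>< "expected exactly one flow"
                         #'(>< sqr g)))))
```

The error machinery is identical in both calls. Only the expression passed in changes, so the compiler can blame the source only if it still has the source syntax at hand.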

Here are some examples that illustrate each case.

Examples of Each Blame Case

Bad Surface Syntax

(define-qi-syntax-rule (my-amp f)
  (>< f))

(~> (3) (my-amp add1 sqr))

Here, we have a misuse of my-amp, and so, the surface syntax (my-amp add1 sqr) should be blamed.

Bad Macro

(define-qi-syntax-rule (my-amp f)
  (>< f g))

(~> (3) (my-amp sqr))

Here, the macro my-amp expands to an invalid use of the core >< form. The macro should be blamed.

Bad Core Syntax

Code generation of Racket from Core Qi is fulfilled by the qi0->racket macro, which, as part of its operation, delegates to helper functions that parse syntax for specific forms.

Here is an excerpt of the code generation parser for amp:

(define (amp-parser stx)
  (syntax-parse stx
    [(_ onex:clause)
     #'(curry map-values (qi0->racket onex))]))

map-values is a Racket function, so its argument is expected to be a Racket expression. If we had mistakenly written it this way:

(define (amp-parser stx)
  (syntax-parse stx
    [(_ onex:clause)
     #'(curry map-values onex)]))

… then the argument to map-values could be uncompiled Qi syntax that the Racket expander would not understand. In that case, for an expression like (>< (~> sqr add1)), the error might simply say "~>: undefined". The Racket expander would assume that onex is Racket syntax, and that ~> must therefore be a function or a macro defined in the macro-defining module. No such identifier is defined there, hence the error.

But this error would not be very helpful. It would be better to blame the qi0->racket macro (which delegates to this function as part of fulfilling expansion) for generating invalid syntax here. As this macro is part of the compiler for the core language, this is an instance of blaming the DSL core language.

Bad Target Syntax

It's conceivable that the Qi compiler could end up producing an expression like this:

(lambda (x) (mac x))

If mac is a macro that happens to expect two arguments, then this would be a misuse of mac and we should blame this entire expression, which is an expression of the target language, Racket.

Of course, this may have resulted from an incorrectly constructed Qi expression where the user explicitly passes a single argument to the invocation of mac, but there is no way for Qi to know the expected syntax of mac as it is a Racket macro. As far as it is concerned, Qi produced correct Racket syntax here.

Preserving the Source Syntax

So whom to blame is to be determined on a case-by-case basis. For the purposes of the Qi compiler, we at least need to have a reference to the source syntax so that in cases where we determine that it is the culpable party, we are in a position to blame it.

But this presented some difficulties.

TODO: First, preserving the full stack of pointers is too expensive memory-wise. It is unclear how to measure that cost, though (OS tools or Racket-native tools?), and unclear whether there is any reliable way to do so. In any case, we decided to try the more modest approach of preserving just the original source syntax rather than the full stack of transformations.
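
A minimal sketch of this more modest approach, assuming the source is carried in a syntax property. The key `'source-syntax` and the helpers `preserve-source` and `source-of` are hypothetical names, not ones used by Qi or Syntax Spec, and the two "expansion steps" are simulated by hand.

```racket
#lang racket/base

;; Hypothetical property key; not the name used by Qi or Syntax Spec.
(define source-key 'source-syntax)

;; Record the source of `new-stx`, which was produced from `src`.
;; If `src` already carries a recorded source (it was itself produced by
;; an earlier step), carry that forward; otherwise `src` is the original.
(define (preserve-source src new-stx)
  (syntax-property new-stx
                   source-key
                   (or (syntax-property src source-key) src)))

;; Look up the recorded source; #f if none was recorded.
(define (source-of stx)
  (syntax-property stx source-key))

;; Two simulated expansion steps starting from the user's expression.
(define surface #'(my-amp sqr))
(define step1 (preserve-source surface #'(>< sqr)))
(define step2 (preserve-source step1 #'(curry map-values sqr)))

(syntax->datum (source-of step2))
```

Only one reference is stored per expression, no matter how many steps occur, which is the memory trade-off relative to keeping the whole chain of transformations.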

But that brought up some new concerns:

In general, we want to propagate the property if it is defined, and otherwise create it.
But we don't want to propagate source syntax to certain positions (nested macros?).
Also, if multiple expressions are produced in the expansion, it would be expensive to traverse them and attach the property to each component.

One option: when trying to determine the blamable party, if the property is not defined on an expression, check its parent, and repeat until the property is found. If it is never found, that's an error. But we may not have a way to access the parent.
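
Syntax objects hold no pointer to their enclosing term, which is the obstacle just noted. One hypothetical workaround, assuming the compiler still holds the root of the expression, is to search downward from the root while remembering the nearest enclosing term that has the property. The function `blame-for` and the key `'source-syntax` are illustrative names only.

```racket
#lang racket/base

;; Hypothetical property key; not the name used by Qi or Syntax Spec.
(define source-key 'source-syntax)

;; Search `root` for the first subterm satisfying `target?` and return the
;; source recorded on it or on its nearest enclosing term.
;; Returns #f if no subterm matches; raises an error if a match is found
;; but no enclosing term has a recorded source.
(define (blame-for root target?)
  (let loop ([stx root] [nearest #f])
    ;; Prefer this term's own property, else inherit from the parent.
    (define here (or (syntax-property stx source-key) nearest))
    (cond
      [(target? stx)
       (or here
           (error 'blame-for "no source syntax recorded for ~s"
                  (syntax->datum stx)))]
      [else
       ;; `syntax->list` returns #f for non-list syntax such as identifiers.
       (for/or ([child (in-list (or (syntax->list stx) '()))])
         (loop child here))])))

;; Only the outer term carries the property; the inner one does not.
(define inner #'(curry map-values sqr))
(define outer
  (syntax-property (datum->syntax #f (list #'lambda #'(x) inner))
                   source-key
                   #'(my-amp sqr)))

(define (inner? stx)
  (equal? (syntax->datum stx) '(curry map-values sqr)))

(syntax->datum (blame-for outer inner?))
```

This avoids attaching the property to every component at expansion time, at the cost of a traversal when an error is actually reported.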

Next Steps

(Some of these are carried over from last time)

Merge "docs arrears" PR containing documentation related to Qi 4, including effect locality, etc.
Bring Qi meeting notes up to date
Review language composition proposal and implement a proof of concept.
Decide on appropriate reference implementations to use for comparison in new benchmarks report and add them.
Deforest other racket/list APIs via qi/list
Review whether we can continue optimizing a single syntax node at many levels
Preserve the source syntax through expansion in Syntax Spec.

Attendees

Dominik, Michael, Sid
