Added documentation for the upcoming indentation-sensitive syntax.

James T. Martin 2022-09-08 11:31:57 -07:00
parent bce39fdc22
commit c35023a686
Signed by: james
GPG Key ID: D6FB2F9892F9B225
2 changed files with 195 additions and 0 deletions

.gitignore (3 additions)

@@ -5,6 +5,9 @@
!/src/**/*.c
!/src/**/*.h
# documentation
!/docs/**/*.md
# top-level configuration
!/.editorconfig
!/.gitignore

docs/syntax.md (192 additions)

@@ -0,0 +1,192 @@
# Syntax Reference
## Grammar
The grammar is LL(1).
```ebnf
block = open-block, ("pass" | block-body), close-block ;
block-body = stmt, [{ terminator, stmt }] ;
stmt = assignment | expr ;
assignment = var, [":", expr], "=", expr ;
expr = "if", expr, block, [ "else", block ]
| "loop", [ label ], control-vars, block
| "next", [ label ]
| "exit", [ label ], expr
| "return", expr
(* these expressions can be used as the LHS of *)
(* a function application or binary operator. *)
| "(", expr, ")", expr-cont
| unop, expr, expr-cont
| string, expr-cont
| num, expr-cont
| var, expr-cont
;
(* an optional binary operator or function application *)
expr-cont = [ binop, expr | expr ] ;
control-vars = [ control-var, [{ ",", control-var }] ] ;
control-var = assignment | var ;
```
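Each choice in `block` and `block-body` is decided by the next token alone, so these two rules translate directly into recursive-descent functions. The Python sketch below is illustrative only: it assumes the lexer has already produced a flat token list in which the hypothetical strings `"OPEN"`, `"CLOSE"`, and `"TERM"` stand for `open-block`, `close-block`, and `terminator`, and it simplifies `stmt` to either `loop` followed by a block or a single atom token.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for a simplified subset of the block grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead is all these rules need.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def block(self):
        # block = open-block, ("pass" | block-body), close-block ;
        self.expect("OPEN")
        if self.peek() == "pass":
            self.pos += 1
            body = []
        else:
            body = self.block_body()
        self.expect("CLOSE")
        return body

    def block_body(self):
        # block-body = stmt, [{ terminator, stmt }] ;
        stmts = [self.stmt()]
        while self.peek() == "TERM":
            self.pos += 1
            stmts.append(self.stmt())
        return stmts

    def stmt(self):
        # Simplification: "loop" followed by a block, or any single atom token.
        tok = self.peek()
        if tok is None or tok in ("OPEN", "CLOSE", "TERM"):
            raise ParseError(f"unexpected {tok!r}")
        self.pos += 1
        if tok == "loop":
            return ("loop", self.block())
        return tok
```

For the brace form `{ x; loop { pass } }`, the token list `["OPEN", "x", "TERM", "loop", "OPEN", "pass", "CLOSE", "CLOSE"]` parses to `["x", ("loop", [])]`.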
## Lexemes
If you use `{`, `}`, and `;` for `open-block`, `close-block`, and `terminator`,
then the lexer is regular. If you use indentation-sensitive syntax, then lexing
is context-sensitive.
```ebnf
unop = "-" | "~" | "!" ;
(* arithmetic *)
binop = "+" | "-" | "*" | "/" | "%"
(* bitwise *)
| "&" | "|" | "^" | "<<" | ">>" | ">>>"
(* logical *)
| "=" | ">" | "<" | ">=" | "<=" | "!="
(* types *)
| ":" | "->"
;
num = ["-"], { decimal-digit | "," }, ["#", { digit | "," }] ;
string = '"', [{ -('"' | newline) }], '"' ;
label = "'", identifier ;
identifier = alpha, [{ alphanumeric }] ;
alpha = ? 'A'..'Z' | 'a'..'z' ? ;
decimal-digit = ? '0'..'9' ? ;
alphanumeric = decimal-digit | alpha ;
digit = alphanumeric ;
newline = "\r" | "\n" ;
```
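With `{`, `}`, and `;` as block delimiters, each lexeme above is a regular language, so one alternation of regular expressions can tokenize a whole source file. The following Python sketch makes assumptions the grammar does not state: whitespace between tokens is skipped, keywords such as `loop` are lexed as plain identifiers, a leading `-` is always lexed as an operator (leaving signed numbers to the parser), and a comma inside a number must be followed by a digit, so that `f(1, 2)` still lexes as two numbers. The token kind names are made up for illustration.

```python
import re

# Alternatives are tried left to right, so longer operators come before
# their prefixes (">>>" before ">>" before ">").
TOKEN_SPEC = [
    ("NUM",    r"[0-9](?:,?[0-9])*(?:#[0-9A-Za-z](?:,?[0-9A-Za-z])*)?"),
    ("STRING", r'"[^"\r\n]*"'),
    ("LABEL",  r"'[A-Za-z][A-Za-z0-9]*"),
    ("IDENT",  r"[A-Za-z][A-Za-z0-9]*"),
    ("OP",     r">>>|<<|>>|>=|<=|!=|->|[-+*/%&|^=<>:~!]"),
    ("PUNCT",  r"[(){};,]"),
    ("SKIP",   r"[ \t\r\n]+"),
    ("ERROR",  r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src):
    """Yield (kind, text) pairs for the brace-delimited variant of the syntax."""
    for m in TOKEN_RE.finditer(src):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        yield kind, m.group()
```

For example, `tokenize("x = 16#FF + 1,000")` yields the kinds `IDENT`, `OP`, `NUM`, `OP`, `NUM`. No such single pass works for the indentation-sensitive variant, whose block and terminator tokens depend on the indentation of earlier lines.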
A number is written in base 10 by default.
To use a different base, write `base#digits`, with the base itself in decimal,
e.g. `2#100101`, `16#DEADBEEF`. Commas may appear among the digits, e.g. `1,000,000`.
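Converting a `num` lexeme to its value only needs the base and a digit-by-digit accumulation. This Python sketch assumes that commas are purely visual separators, that bases range from 2 to 36 (the ten decimal digits plus 26 letters allowed by `digit`), and that letter digits are case-insensitive; the grammar does not spell out any of these, and the function names are hypothetical.

```python
def digits_value(digits, base):
    """Accumulate alphanumeric digits in the given base, rejecting out-of-range ones."""
    if not digits:
        raise ValueError("missing digits")
    value = 0
    for ch in digits:
        d = int(ch, 36)  # '0'..'9' -> 0..9, 'A'..'Z' and 'a'..'z' -> 10..35
        if d >= base:
            raise ValueError(f"digit {ch!r} is out of range for base {base}")
        value = value * base + d
    return value


def parse_num(text):
    """Convert a num lexeme such as "2#100101", "16#DEADBEEF" or "1,000" to an int."""
    text = text.replace(",", "")  # assumption: commas only separate digits
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if "#" in text:
        base_text, digits = text.split("#", 1)
        base = digits_value(base_text, 10)  # the base is written in decimal
        if not 2 <= base <= 36:
            raise ValueError(f"unsupported base {base}")
        value = digits_value(digits, base)
    else:
        value = digits_value(text, 10)
    return -value if negative else value
```

Under these assumptions, `parse_num("2#100101")` is 37 and `parse_num("16#DEADBEEF")` equals hexadecimal `DEADBEEF`.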
## Blocks & Terminators
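When indentation-sensitive syntax is used, the lexer emits `open-block`, `close-block`, and `terminator` tokens according to these rules: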
1. A terminator is emitted when a line has the same indentation level as
a preceding line in the same block:
Example:
```thadius
x = 10
print "hello, world!"
y = 3
```
2. A block is opened when `:` occurs at the end of a line.
The indentation level for the block will be the indentation level
of the following line.
Example:
```thadius
loop:
    pass
```
3. A block is closed when the indentation level of a line
is less than the indentation level of a block.
Example:
```thadius
loop:
    ...
...
```
4. If a new indentation level is introduced and it is *not* the start of a block,
then the line continues the expression on the previous line rather than opening a new block.
The indentation of such continuation lines, including any further nesting, is *ignored*: no block or terminator tokens are emitted for it.
Example:
```thadius
some_variable = some long expression
    + some other long expression
```
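The four rules above can be sketched as a line-by-line pass that keeps a stack of the enclosing blocks' indentation strings. This Python version is a simplified illustration rather than the project's lexer: the `layout` name and the `"OPEN"`, `"CLOSE"`, and `"TERM"` strings are hypothetical, each non-blank line is emitted as a single content string instead of being tokenized further, and the top level of the file is treated as an implicit block with no `OPEN`/`CLOSE` of its own.

```python
def layout(src):
    """Sketch of block/terminator emission for the indentation-sensitive syntax.

    Each non-blank line becomes one content string; "OPEN", "CLOSE" and "TERM"
    stand in for open-block, close-block and terminator.
    """
    tokens = []
    stack = [""]          # indentation strings of the enclosing blocks
    opens_block = False   # did the previous line end with ':'?
    for lineno, line in enumerate(src.splitlines(), 1):
        if not line.strip():
            continue                      # whitespace-only lines are ignored
        text = line.lstrip(" \t")
        ind = line[: len(line) - len(text)]
        deeper = ind.startswith(stack[-1]) and len(ind) > len(stack[-1])
        if opens_block:
            # Rule 2: the line after ':' fixes the new block's indentation level.
            if not deeper:
                raise IndentationError(f"line {lineno}: expected an indented block")
            stack.append(ind)
            tokens.append("OPEN")
        elif ind == stack[-1]:
            # Rule 1: same level as the current block, so a new statement.
            if tokens:
                tokens.append("TERM")
        elif not deeper:
            # Rule 3: close every block this line is outdented past.
            while len(stack) > 1 and stack[-1] != ind:
                stack.pop()
                tokens.append("CLOSE")
            if stack[-1] != ind:
                raise IndentationError(f"line {lineno}: no enclosing block has this indentation")
            tokens.append("TERM")
        # Otherwise (rule 4) the line is deeper but opens no block: it continues
        # the previous expression, so no block or terminator token is emitted.
        text = text.rstrip()
        opens_block = text.endswith(":")
        if opens_block:
            text = text[:-1].rstrip()
        tokens.append(text)
    if opens_block:
        raise IndentationError("expected an indented block at end of input")
    tokens.extend("CLOSE" for _ in stack[1:])
    return tokens
```

For instance, `layout("loop:\n    pass\nx")` returns `["loop", "OPEN", "pass", "CLOSE", "TERM", "x"]`. Comparing indentation as exact strings also makes a tab and an equal-looking run of spaces count as different levels.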
## Indentation Levels
The rules for indentation levels:
1. All whitespace on an empty line is ignored.
Example:
```thadius
...
<TAB> <TAB> // very bad indentation, but no error
...
```
2. Any amount of additional indentation on a line counts as exactly *one* new indentation level.
Example:
```thadius
if x:
        ... // *one* level deeper!
```
3. All tabs on a line must precede all spaces. (One level of mixed indentation is allowed.)
Good:
```thadius
<TAB><TAB>if x:
<TAB><TAB> ...
```
Good:
```thadius
loop:
<TAB> if x:
<TAB>     ...
```
Bad:
```thadius
<TAB> if x:
<TAB> <TAB>...
```
4. Indentation must exactly match that of a preceding line, tab for tab and space for space (except when introducing a new indentation level):
Good:
```thadius
<TAB>if x:
<TAB> ...
<TAB> ... // same as previous line
<TAB>... // matches indentation of `if x`
```
Bad:
```thadius
<TAB>... // this line used a tab...
    ... // but this line used spaces, even if it looks the same
```
Bad:
```thadius
if x:
        ...
    ... // this line doesn't match the level of the previous line *or* the `if`
```
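Rules 1 and 3 concern a single line, while rules 2 and 4 relate a line to the levels around it. A minimal Python sketch of both kinds of check follows; the helper names `indentation` and `compare`, and the tuples `compare` returns, are hypothetical, and indentation strings are compared character for character, which is how this sketch reads "must match".

```python
import re

# Rule 3: every tab precedes every space, so valid indentation is \t* then ' '*.
VALID_INDENT = re.compile(r"\t* *")


def indentation(line):
    """Return a line's indentation string, or None for a whitespace-only line (rule 1)."""
    text = line.lstrip(" \t")
    if not text.strip():
        return None
    indent = line[: len(line) - len(text)]
    if not VALID_INDENT.fullmatch(indent):
        raise IndentationError(f"tab after space in indentation {indent!r}")
    return indent


def compare(levels, indent):
    """Relate `indent` to the stack `levels` of enclosing indentation strings.

    Returns ("same", 0), ("deeper", 0), or ("dedent", n) when n levels are closed.
    """
    if indent == levels[-1]:
        return ("same", 0)
    if indent.startswith(levels[-1]):
        # Rule 2: any amount of extra indentation counts as a single new level.
        return ("deeper", 0)
    for closed, level in enumerate(reversed(levels[:-1]), 1):
        if level == indent:
            return ("dedent", closed)
    # Rule 4: the line matches neither the current level nor any enclosing one.
    raise IndentationError(f"{indent!r} matches no enclosing indentation level")
```

With this reading, `compare(["", "\t"], "    ")` raises even if four spaces render like one tab, as in the tab-versus-spaces Bad example above.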
## Operators
The full list of operators is specified in "Lexemes".
Operator precedence:
* Unary operators always have the highest precedence.
* Arithmetic operators: `*` = `/` = `%` > `+` = `-`
* Bitwise operators: `&` > (`|` ? `^`)
* Logical operators: (`=` ? `!=` ? `>` ? `<` ? `>=` ? `<=`) > all arithmetic or bitwise operators
* Type operators: `:` > all other binary operators
Operator associativity:
* left-associative: `*`, `/`, `+`, `-`, `&`, `|`, `^`, `:`
* right-associative: `->`
* non-associative: `=`, `!=`, `>`, `<`, `>=`, `<=`
If two operators are not related by precedence (marked `?` above, or not mentioned at all),
they cannot appear in the same expression without grouping parentheses.
There are no ternary, postfix, or mixfix operators.
User-defined operators are not allowed (at least not for now).
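Because some operator pairs are deliberately left unrelated, the precedence rules form a partial order rather than a single ladder of levels. One way to encode them, sketched below in Python with hypothetical names, is a table of "binds tighter than" pairs transcribed literally from the lists above, plus a `group` function that decides how `x a y b z` associates or rejects it. Operators whose associativity is not listed (such as `%` and the shifts) are conservatively rejected when chained at equal precedence; that choice is an assumption of this sketch.

```python
ARITH_HIGH = {"*", "/", "%"}
ARITH_LOW = {"+", "-"}
BITWISE = {"&", "|", "^", "<<", ">>", ">>>"}
COMPARISON = {"=", "!=", ">", "<", ">=", "<="}
ALL_BINOPS = ARITH_HIGH | ARITH_LOW | BITWISE | COMPARISON | {":", "->"}

# (a, b) means "a binds tighter than b", transcribed from the precedence list.
TIGHTER = (
    {(a, b) for a in ARITH_HIGH for b in ARITH_LOW}
    | {("&", "|"), ("&", "^")}
    | {(a, b) for a in COMPARISON for b in ARITH_HIGH | ARITH_LOW | BITWISE}
    | {(":", b) for b in ALL_BINOPS - {":"}}
)
# Groups of operators that share a precedence level.
SAME_LEVEL = [ARITH_HIGH, ARITH_LOW]

LEFT_ASSOC = {"*", "/", "+", "-", "&", "|", "^", ":"}
RIGHT_ASSOC = {"->"}


def relation(a, b):
    """">" if a binds tighter than b, "<" if looser, "=" if equal, None if unrelated."""
    if a == b or any(a in level and b in level for level in SAME_LEVEL):
        return "="
    if (a, b) in TIGHTER:
        return ">"
    if (b, a) in TIGHTER:
        return "<"
    return None


def group(a, b):
    """Decide how `x a y b z` groups without parentheses."""
    rel = relation(a, b)
    if rel == ">":
        return "left"   # (x a y) b z
    if rel == "<":
        return "right"  # x a (y b z)
    if rel == "=" and a in LEFT_ASSOC and b in LEFT_ASSOC:
        return "left"
    if rel == "=" and a in RIGHT_ASSOC and b in RIGHT_ASSOC:
        return "right"
    # Unrelated, non-associative (e.g. `=`), or associativity not listed.
    raise SyntaxError(f"{a!r} and {b!r} cannot be mixed without parentheses")
```

Under this encoding, `x - y - z` groups to the left, `A -> B -> C` to the right, and `x | y ^ z` or `x = y = z` must be parenthesized.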