Added documentation for the upcoming indentation-sensitive syntax.
parent
bce39fdc22
commit
8b251bd1d6
|
@ -5,6 +5,9 @@
|
||||||
!/src/**/*.c
|
!/src/**/*.c
|
||||||
!/src/**/*.h
|
!/src/**/*.h
|
||||||
|
|
||||||
|
# documentation
|
||||||
|
!/docs/**/*.md
|
||||||
|
|
||||||
# top-level configuration
|
# top-level configuration
|
||||||
!/.editorconfig
|
!/.editorconfig
|
||||||
!/.gitignore
|
!/.gitignore
|
||||||
|
|
|
@ -0,0 +1,193 @@
|
||||||
|
# Syntax Reference
|
||||||
|
## Grammar
|
||||||
|
The grammar is LL(1).
|
||||||
|
|
||||||
|
```ebnf
|
||||||
|
block = open-block, ("pass" | block-body), close-block ;
|
||||||
|
block-body = stmt, [{ terminator, stmt }] ;
|
||||||
|
stmt = assignment | expr ;
|
||||||
|
assignment = var, [":", expr], "=", expr ;
|
||||||
|
expr = "if", expr, block, [ "else", block ]
|
||||||
|
| "loop", [ label ], control-vars, block
|
||||||
|
| "next", [ label ]
|
||||||
|
| "exit", [ label ], expr
|
||||||
|
| "return", expr
|
||||||
|
(* these expressions can be used as the LHS of *)
|
||||||
|
(* a function application or binary operator. *)
|
||||||
|
| "(", expr, ")", expr-cont
|
||||||
|
| unop, expr, expr-cont
|
||||||
|
| string, expr-cont
|
||||||
|
| num, expr-cont
|
||||||
|
| var, expr-cont
|
||||||
|
;
|
||||||
|
(* an optional binary operator or function application *)
|
||||||
|
expr-cont = [ binop, expr | expr ] ;
|
||||||
|
control-vars = [ control-var, [{ ",", control-var }] ] ;
|
||||||
|
control-var = assignment | var ;
|
||||||
|
```
|
||||||
|
|
||||||
|
## Lexemes
|
||||||
|
If you use `{`, `}`, and `;` for `open-block`, `close-block`, and `terminator`,
|
||||||
|
then the lexical grammar is regular. If you use indentation-sensitive syntax, then lexing
|
||||||
|
is context-sensitive.
|
||||||
|
|
||||||
|
```ebnf
|
||||||
|
unop = "-" | "~" | "!" ;
|
||||||
|
(* arithmetic *)
|
||||||
|
binop = "+" | "-" | "*" | "/" | "%"
|
||||||
|
(* bitwise *)
|
||||||
|
| "&" | "|" | "^" | "<<" | ">>" | ">>>"
|
||||||
|
(* logical *)
|
||||||
|
| "=" | ">" | "<" | ">=" | "<=" | "!="
|
||||||
|
(* types *)
|
||||||
|
| ":" | "->"
|
||||||
|
;
|
||||||
|
num = ["-"], { decimal-digit | "," }, ["#", { digit | "," }] ;
|
||||||
|
string = '"', [{ -('"' | newline) }], '"' ;
|
||||||
|
label = "'", identifier ;
|
||||||
|
identifier = alpha, [{ alphanumeric }] ;
|
||||||
|
|
||||||
|
alpha = ? 'A'..'Z' | 'a'..'z' ? ;
|
||||||
|
decimal-digit = ? '0'..'9' ? ;
|
||||||
|
alphanumeric = decimal-digit | alpha ;
|
||||||
|
digit = alphanumeric ;
|
||||||
|
newline = "\r" | "\n" ;
|
||||||
|
```
|
||||||
|
|
||||||
|
A number is a series of base 10 digits by default.
|
||||||
|
You may specify a different base with the syntax `base#digits`,
|
||||||
|
e.g. `2#100101`, `16#DEADBEEF`.
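
A minimal sketch of how a `num` lexeme could be converted to an integer. The helper name `parse_number` is hypothetical, and two points are assumptions rather than something the reference states: commas are treated as digit-group separators and discarded, and bases are limited to 2 through 36 because the digits are alphanumeric.

```python
def parse_number(text: str) -> int:
    """Convert a `num` lexeme such as "42", "-2#101" or "16#DEADBEEF" to an int."""
    negative = text.startswith("-")
    body = text[1:] if negative else text

    if "#" in body:
        # `base#digits`: the base itself is always written in base 10.
        base_part, digits = body.split("#", 1)
        base = int(base_part.replace(",", ""), 10)
    else:
        base, digits = 10, body

    # Assumption: "," only groups digits and carries no value.
    digits = digits.replace(",", "")
    if not 2 <= base <= 36:
        raise ValueError(f"unsupported base: {base}")

    # int() rejects digits that are out of range for the base.
    # It is slightly more lenient than the grammar (e.g. it accepts "_").
    value = int(digits, base)
    return -value if negative else value
```

For example, `parse_number("2#100101")` evaluates to `37` and `parse_number("16#DEADBEEF")` to `3735928559`.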
|
||||||
|
|
||||||
|
## Blocks & Terminators
|
||||||
|
The rules for blocks and terminators:
|
||||||
|
|
||||||
|
1. A terminator is emitted when the indentation level is
|
||||||
|
the same as a previous indentation level in a block:
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
if x:
|
||||||
|
    x = 10
|
||||||
|
    print "hello, world!"
|
||||||
|
y = 3
|
||||||
|
```
|
||||||
|
|
||||||
|
2. A block is opened when `:` occurs at the end of a line.
|
||||||
|
|
||||||
|
The indentation level for the block will be the indentation level
|
||||||
|
of the following line.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
loop:
|
||||||
|
    pass
|
||||||
|
```
|
||||||
|
|
||||||
|
3. A block is closed when the indentation level of a line
|
||||||
|
is less than the indentation level of a block.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
loop:
|
||||||
|
    ...
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
4. If a new indentation level is introduced and it is *not* the start of a block,
|
||||||
|
then it is the continuation of an expression, not a new block.
|
||||||
|
|
||||||
|
All line breaks and nested indentation levels inside the continuation are *ignored*, and no block or terminator tokens are emitted.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
some_variable = some long expression
|
||||||
|
    + some other long expression
|
||||||
|
```
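
The four rules above can be sketched as a small layout pass that rewrites indentation into the explicit `{`, `}`, and `;` tokens mentioned under "Lexemes". This is only an illustration: the function name `layout` is hypothetical, each line is kept as a single unsplit token, indentation is assumed to use spaces only, and the top level is assumed to start at column 0. Rejecting a block opener that is not followed by a deeper line is also an assumption.

```python
def layout(source: str) -> list[str]:
    """Turn indentation-sensitive text into tokens with explicit { } ; markers."""
    tokens: list[str] = []
    levels = [0]       # indentation widths of the enclosing blocks
    opening = False    # True when the previous line ended with ":" (rule 2)

    for raw in source.splitlines():
        if not raw.strip():
            continue   # blank lines carry no layout information
        indent = len(raw) - len(raw.lstrip(" "))
        text = raw.strip()

        if opening:
            # Rule 2: the block's level is the indentation of this line.
            if indent <= levels[-1]:
                raise ValueError(f"expected an indented block before {text!r}")
            levels.append(indent)
            tokens.append("{")
        else:
            closed = False
            while indent < levels[-1]:
                # Rule 3: a shallower line closes the block.
                levels.pop()
                tokens.append("}")
                closed = True
            if indent == levels[-1]:
                if tokens:
                    tokens.append(";")   # Rule 1: same level, new statement
            elif closed:
                raise ValueError(f"indentation of {text!r} matches no open block")
            # Otherwise the line is deeper without a ":" before it.
            # Rule 4: it continues the expression, so nothing is emitted.

        opening = text.endswith(":")
        if opening:
            text = text[:-1].rstrip()
        tokens.append(text)

    tokens.extend("}" for _ in levels[1:])   # close blocks still open at EOF
    return tokens
```

Under these assumptions, an `if x:` block containing `x = 10` and `print "hello, world!"`, followed by a dedented `y = 3`, yields `if x`, `{`, `x = 10`, `;`, `print "hello, world!"`, `}`, `;`, `y = 3`.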
|
||||||
|
|
||||||
|
## Indentation Levels
|
||||||
|
The rules for indentation levels:
|
||||||
|
|
||||||
|
1. All whitespace on an empty line is ignored.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
...
|
||||||
|
<TAB> <TAB> // very bad indentation, but no error
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
2. All additional indentation on a line is combined into *one* indentation level.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```thadius
|
||||||
|
if x:
|
||||||
|
            ... // *one* level deeper!
|
||||||
|
```
|
||||||
|
|
||||||
|
3. All tabs on a line must precede all spaces. (One level of mixed indentation is allowed.)
|
||||||
|
|
||||||
|
Good:
|
||||||
|
```thadius
|
||||||
|
<TAB><TAB>if x:
|
||||||
|
<TAB><TAB> ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Good:
|
||||||
|
```thadius
|
||||||
|
loop:
|
||||||
|
<TAB> if x:
|
||||||
|
<TAB> ....
|
||||||
|
```
|
||||||
|
|
||||||
|
Bad:
|
||||||
|
```thadius
|
||||||
|
<TAB> if x:
|
||||||
|
<TAB> <TAB>...
|
||||||
|
```
|
||||||
|
|
||||||
|
4. All indentation must match preceding lines (except for new indentation levels):
|
||||||
|
|
||||||
|
Good:
|
||||||
|
```thadius
|
||||||
|
<TAB>if x:
|
||||||
|
<TAB> ...
|
||||||
|
<TAB> ... // same as previous line
|
||||||
|
<TAB>... // matches indentation of `if x`
|
||||||
|
```
|
||||||
|
|
||||||
|
Bad:
|
||||||
|
```thadius
|
||||||
|
<TAB>... // this line used a tab...
|
||||||
|
    ... // but this line used a space, even if it looks the same
|
||||||
|
```
|
||||||
|
|
||||||
|
Bad:
|
||||||
|
```thadius
|
||||||
|
if x:
|
||||||
|
    ...
|
||||||
|
  ... // this line doesn't match the level of the previous line *or* the `if`
|
||||||
|
```
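
One way to check these rules is to compare the literal leading whitespace of each line against a stack of known prefixes instead of comparing widths. The sketch below is an assumption about how such a check might look, not the project's implementation, and `indentation_depths` is a hypothetical name. It returns the nesting depth of every non-blank line and raises `ValueError` on a violation.

```python
def indentation_depths(source: str) -> list[int]:
    """Return the indentation depth of each non-blank line, validating rules 1-4."""
    stack = [""]               # whitespace prefix of every open indentation level
    depths: list[int] = []

    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue           # rule 1: whitespace on an empty line is ignored
        prefix = line[: len(line) - len(line.lstrip(" \t"))]

        # Rule 3: once the leading tabs are removed, no tab may remain.
        if "\t" in prefix.lstrip("\t"):
            raise ValueError(f"line {number}: tab after space in indentation")

        if prefix in stack:
            # Rule 4: the prefix matches an earlier level exactly; return to it.
            del stack[stack.index(prefix) + 1:]
        elif prefix.startswith(stack[-1]):
            # Rule 2: any amount of extra indentation is exactly one new level.
            stack.append(prefix)
        else:
            raise ValueError(f"line {number}: indentation matches no previous line")

        depths.append(len(stack) - 1)
    return depths
```

Comparing prefixes as strings is what makes a tab and a run of spaces distinct even when they render at the same width.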
|
||||||
|
|
||||||
|
## Operators
|
||||||
|
The full list of operators is specified in "Lexemes".
|
||||||
|
|
||||||
|
Operator precedence:
|
||||||
|
|
||||||
|
* Unary operators always have greatest precedence.
|
||||||
|
* Arithmetic operators: `*` = `/` = `%` > `+` = `-`
|
||||||
|
* Bitwise operators: `&` > (`|` ? `^`)
|
||||||
|
* Logical operators: (`=` ? `!=` ? `>` ? `<` ? `>=` ? `<=`) > all arithmetic or bitwise operators
|
||||||
|
* Type operators: `:` > all other binary operators
|
||||||
|
|
||||||
|
Operator associativity:
|
||||||
|
|
||||||
|
* left-associative: `*`, `/`, `+`, `-`, `&`, `|`, `^`, `:`
|
||||||
|
* right-associative: `->`
|
||||||
|
* non-associative: `=`, `!=`, `>`, `<`, `>=`, `<=`
|
||||||
|
|
||||||
|
If no precedence relation is defined between two operators (either marked `?` or not specified),
|
||||||
|
then they cannot be used in the same expression without grouping using parentheses.
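
Because precedence is only a partial order, a parser needs a lookup that can answer "unrelated" as well as "higher" or "lower". The sketch below encodes the binary-operator bullets above literally; all names (`HIGHER`, `relation`, `needs_parentheses`) are hypothetical. Grouping the shift operators with the bitwise operators follows the "Lexemes" listing and is an assumption, as is leaving `->` unrelated to everything except `:`. Associativity is not modelled.

```python
from itertools import product

MUL = {"*", "/", "%"}
ADD = {"+", "-"}
BITWISE = {"&", "|", "^", "<<", ">>", ">>>"}   # assumption: shifts count as bitwise
COMPARE = {"=", "!=", ">", "<", ">=", "<="}
ALL = MUL | ADD | BITWISE | COMPARE | {":", "->"}

# Groups whose members share one precedence level ("=" in the list above).
EQUAL_GROUPS = [MUL, ADD]

# Pairs (a, b) meaning "a has greater precedence than b".
HIGHER = set(product(MUL, ADD))
HIGHER |= {("&", "|"), ("&", "^")}
HIGHER |= set(product(COMPARE, MUL | ADD | BITWISE))
HIGHER |= {(":", op) for op in ALL - {":"}}


def relation(a: str, b: str):
    """Return ">", "<", "=" or None when no precedence relates a and b."""
    if a == b or any(a in group and b in group for group in EQUAL_GROUPS):
        return "="
    if (a, b) in HIGHER:
        return ">"
    if (b, a) in HIGHER:
        return "<"
    return None


def needs_parentheses(a: str, b: str) -> bool:
    """True when a and b may only be mixed with explicit grouping."""
    return relation(a, b) is None
```

With this table, `needs_parentheses("|", "^")` and `needs_parentheses("+", "&")` are both `True`, while `relation("*", "+")` is `">"`.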
|
||||||
|
|
||||||
|
There are no ternary, postfix, or mixfix operators.
|
||||||
|
|
||||||
|
User-defined operators are not allowed (at least not for now).
|