Added documentation for the upcoming indentation-sensitive syntax.
parent
bce39fdc22
commit
689ea16f1d
@@ -5,6 +5,9 @@
!/src/**/*.c
!/src/**/*.h

# documentation
!/docs/**/*.md

# top-level configuration
!/.editorconfig
!/.gitignore

@@ -0,0 +1,190 @@
# Syntax Reference
## Grammar
The grammar is LL(1).

```ebnf
block = open-block, ("pass" | block-body), close-block
block-body = stmt, [{ terminator, stmt }]
stmt = assignment | expr
assignment = var, [":", expr], "=", expr
expr = "if", expr, block, [ "else", block ]
     | "loop", [ label ], control-vars, block
     | "next", [ label ]
     | "exit", [ label ], expr
     | "return", expr
     ; these expressions can be used as the LHS of
     ; a function application or binary operator.
     | "(", expr, ")", expr-cont
     | unop, expr, expr-cont
     | string, expr-cont
     | num, expr-cont
     | var, expr-cont
; an optional binary operator or function application
expr-cont = [ binop, expr | expr ]
control-vars = [ control-var, [{ ",", control-var }] ]
control-var = assignment | var
```

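As an illustration of how `expr` and `expr-cont` can be parsed with a single token of lookahead, here is a minimal recursive-descent sketch. It is not the project's parser: the names `Parser` and `BINOPS` are hypothetical, tokens are assumed to be pre-split strings, and only the `var`, `num`, and parenthesised alternatives are covered (unary operators, strings, and keywords are left out).

```python
# Hypothetical subset of the binary operators listed under "Lexemes".
BINOPS = {"+", "-", "*", "/", "%", "&", "|", "^", "=", "!=", "<", ">", "<=", ">="}


def starts_expr(tok):
    """True if `tok` can begin one of the expression forms handled here."""
    return tok is not None and (tok == "(" or tok[0].isalnum())


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        # expr = "(", expr, ")", expr-cont | num, expr-cont | var, expr-cont
        tok = self.advance()
        if tok == "(":
            head = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok[0].isalnum():
            head = tok
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        return self.expr_cont(head)

    def expr_cont(self, head):
        # expr-cont = [ binop, expr | expr ]; one token decides the branch.
        tok = self.peek()
        if tok in BINOPS:
            self.advance()
            return (tok, head, self.expr())
        if starts_expr(tok):
            return ("apply", head, self.expr())
        return head  # empty continuation
```

Because `expr-cont` recurses into `expr`, the resulting tree is right-nested: `Parser(["f", "x", "+", "1"]).expr()` yields `("apply", "f", ("+", "x", "1"))`. Operator precedence and associativity are not applied at this stage in the sketch.
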
## Lexemes
If `{`, `}`, and `;` are used for `open-block`, `close-block`, and `terminator`,
then the lexical grammar is regular. With indentation-sensitive syntax, lexing
is context-sensitive.

```ebnf
unop = "-" | "~" | "!"
      ; arithmetic
binop = "+" | "-" | "*" | "/" | "%"
      ; bitwise
      | "&" | "|" | "^" | "<<" | ">>" | ">>>"
      ; logical
      | "=" | ">" | "<" | ">=" | "<=" | "!="
      ; types
      | ":" | "->"
num = ["-"], { decimal-digit | "," }, ["#", { digit | "," }]
string = '"', [{ !('"' | newline) }], '"'
label = "'", identifier
identifier = alpha, [{ alphanumeric }]

alpha = 'A'..'Z' | 'a'..'z'
decimal-digit = '0'..'9'
alphanumeric = decimal-digit | alpha
digit = alphanumeric
newline = "\r" | "\n"
```

A number is a series of base 10 digits by default.
To use a different base, write `base#digits`,
e.g. `2#100101`, `16#DEADBEEF`.

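The `base#digits` form can be sketched as a small conversion routine. The function name `parse_number` is hypothetical, and two points are assumptions rather than something this document states: `,` is treated as a digit separator with no numeric meaning, and only bases 2 through 36 are accepted (the range Python's `int` supports).

```python
def parse_number(lexeme: str) -> int:
    """Convert a `num` lexeme such as `42`, `-1,000`, or `16#DEADBEEF` to an int."""
    negative = lexeme.startswith("-")
    body = lexeme[1:] if negative else lexeme
    # Assumption: "," only groups digits and is dropped before conversion.
    body = body.replace(",", "")
    if "#" in body:
        base_text, digits = body.split("#", 1)
        base = int(base_text)  # the base itself is written in decimal
    else:
        base, digits = 10, body
    if not 2 <= base <= 36:
        # Assumption: larger bases are rejected in this sketch.
        raise ValueError(f"unsupported base: {base}")
    value = int(digits, base)  # raises ValueError on digits invalid for the base
    return -value if negative else value
```

For example, `parse_number("2#100101")` gives `37` and `parse_number("16#DEADBEEF")` gives `3735928559`.
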
## Blocks & Terminators
The rules for blocks and terminators:

1. A terminator is emitted when a line's indentation level is
   the same as a previous indentation level in a block:

   Example:
   ```thadius
   x = 10
   print "hello, world!"
   y = 3
   ```

2. A block is opened when `:` occurs at the end of a line.

   The indentation level for the block will be the indentation level
   of the following line.

   Example:
   ```thadius
   loop:
       pass
   ```

3. A block is closed when the indentation level of a line
   is less than the indentation level of a block.

   Example:
   ```thadius
   loop:
       ...
   ...
   ```

4. If a new indentation level is introduced and it is *not* the start of a block,
   then it is the continuation of an expression, not a new block.

   All line breaks and nested indentation levels inside the continuation are *ignored*,
   and no block or terminator tokens are emitted for them.

   Example:
   ```thadius
   some_variable = some long expression
       + some other long expression
   ```

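The four rules above can be sketched as a pass that turns indentation into the explicit `{`, `}`, and `;` tokens mentioned under "Lexemes". This is an illustration under several assumptions, not the project's lexer: the name `layout_tokens` is hypothetical, indentation is spaces only, each line's text is kept as one opaque token, the first non-blank line fixes the top-level indentation, and blocks still open at end of input are closed. How `else` interacts with the terminator rule is not covered.

```python
def layout_tokens(source: str) -> list[str]:
    tokens: list[str] = []
    levels: list[int] = []   # indentation width of each enclosing level, outermost first
    pending_open = False     # True when the previous line ended with ":"
    for line in source.splitlines():
        text = line.strip()
        if not text:
            continue         # blank lines carry no layout information
        indent = len(line) - len(line.lstrip(" "))
        if pending_open:
            # Rule 2: the block takes the indentation of the line after ":".
            if indent <= levels[-1]:
                raise SyntaxError("block body must be indented deeper than its header")
            tokens.append("{")
            levels.append(indent)
        elif not levels:
            levels.append(indent)            # first line sets the top level
        elif indent <= levels[-1]:
            # Rule 3: close every block deeper than this line.
            while levels and indent < levels[-1]:
                levels.pop()
                tokens.append("}")
            if not levels or indent != levels[-1]:
                raise SyntaxError("indentation does not match any enclosing level")
            tokens.append(";")               # Rule 1: same level, so a terminator
        # Rule 4: deeper indentation without ":" is a continuation; emit nothing.
        pending_open = text.endswith(":")
        tokens.append(text[:-1].rstrip() if pending_open else text)
    tokens.extend("}" for _ in levels[1:])   # assumption: close open blocks at EOF
    return tokens
```

With the input `"loop:\n    pass"` the sketch returns `["loop", "{", "pass", "}"]`.
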
## Indentation Levels
The rules for indentation levels:

1. All whitespace on an empty line is ignored.

   Example:
   ```thadius
   ...
   <TAB> <TAB> // very bad indentation, but no error
   ...
   ```

2. All additional indentation on a line is combined into *one* indentation level.

   Example:
   ```thadius
   if x:
               ... // *one* level deeper!
   ```

3. All tabs on a line must precede all spaces. (One level of mixed indentation is allowed.)

   Good:
   ```thadius
   <TAB><TAB>if x:
   <TAB><TAB>    ...
   ```

   Good:
   ```thadius
   loop:
   <TAB>  if x:
   <TAB>      ...
   ```

   Bad:
   ```thadius
   <TAB>  if x:
   <TAB>  <TAB>...
   ```

4. All indentation must match preceding lines (except for new indentation levels):

   Good:
   ```thadius
   <TAB>if x:
   <TAB>    ...
   <TAB>    ... // same as previous line
   <TAB>... // matches indentation of `if x`
   ```

   Bad:
   ```thadius
   <TAB>... // this line used a tab...
           ... // but this line used spaces, even if it looks the same
   ```

   Bad:
   ```thadius
   if x:
           ...
       ... // this line doesn't match the level of the previous line *or* the `if`
   ```

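The indentation rules above compare whitespace prefixes character by character rather than by visual width. A minimal sketch of such a check follows; the names `leading_whitespace` and `check_indentation` are hypothetical, and the sketch does not verify that a new level was actually introduced by `:` or by an expression continuation.

```python
def leading_whitespace(line: str) -> str:
    """Return the run of tabs and spaces at the start of `line`."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def check_indentation(lines: list[str]) -> list[int]:
    """Return the 1-based numbers of lines that break rules 3 or 4."""
    errors: list[int] = []
    levels: list[str] = []  # exact whitespace prefix of each enclosing level
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # rule 1: whitespace on an empty line is ignored
        prefix = leading_whitespace(line)
        if "\t" in prefix.lstrip("\t"):
            errors.append(number)  # rule 3: a tab appears after a space
            continue
        if prefix in levels:
            # Rule 4: matches an enclosing level exactly; drop deeper levels.
            del levels[levels.index(prefix) + 1:]
        elif not levels or prefix.startswith(levels[-1]):
            # Rule 2: any extra indentation forms exactly one new level.
            levels.append(prefix)
        else:
            errors.append(number)  # rule 4: matches no enclosing level
    return errors
```

For instance, `check_indentation(["\t...", "        ..."])` returns `[2]`: the second line may look aligned in an editor, but its prefix does not match the tab used on the first line.
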
## Operators
The full list of operators is specified in "Lexemes".

Operator precedence (`>` means higher precedence, `=` means equal precedence, and `?` means no relative precedence is defined):

* Unary operators always have greatest precedence.
* Arithmetic operators: `*` = `/` = `%` > `+` = `-`
* Bitwise operators: `&` > (`|` ? `^`)
* Logical operators: (`=` ? `!=` ? `>` ? `<` ? `>=` ? `<=`) > all arithmetic or bitwise operators
* Type operators: `:` > all other binary operators

Operator associativity:

* left-associative: `*`, `/`, `+`, `-`, `&`, `|`, `^`, `:`
* right-associative: `->`
* non-associative: `=`, `!=`, `>`, `<`, `>=`, `<=`

If two operators are not related by precedence (either marked `?` or not specified),
then they cannot be used in the same expression without grouping using parentheses.

There are no ternary, postfix, or mixfix operators.

User-defined operators are not allowed (at least not for now).
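
Because precedence here is a partial order rather than a single ranking, a checker has to ask whether two operators are related at all. The sketch below encodes only the relations stated above; the names `higher`, `relation`, and `can_mix` are hypothetical, and counting the shift operators as bitwise (following their grouping in "Lexemes") is an assumption.

```python
from typing import Optional

ARITH_HIGH = {"*", "/", "%"}
ARITH_LOW = {"+", "-"}
# Assumption: shifts count as bitwise, as grouped in "Lexemes".
BITWISE = {"&", "|", "^", "<<", ">>", ">>>"}
COMPARE = {"=", "!=", ">", "<", ">=", "<="}  # called "logical" in the text


def higher(a: str, b: str) -> bool:
    """True if the stated rules give `a` higher precedence than `b`."""
    if a == ":":
        return b != ":"
    if a in COMPARE:
        return b in ARITH_HIGH | ARITH_LOW | BITWISE
    if a in ARITH_HIGH:
        return b in ARITH_LOW
    if a == "&":
        return b in {"|", "^"}
    return False


def relation(a: str, b: str) -> Optional[str]:
    """Return ">", "<", "=", or None when the operators are unrelated."""
    if higher(a, b):
        return ">"
    if higher(b, a):
        return "<"
    if a == b or {a, b} <= ARITH_HIGH or {a, b} <= ARITH_LOW:
        return "="
    return None


def can_mix(a: str, b: str) -> bool:
    """True if `a` and `b` may share an expression without parentheses."""
    rel = relation(a, b)
    if rel is None:
        return False
    # Non-associative operators cannot be chained with themselves.
    return not (rel == "=" and a in COMPARE)
```

Under these assumptions, `can_mix("*", "+")` is true, while `can_mix("|", "^")`, `can_mix("+", "&")`, and `can_mix("=", "=")` are all false and would require parentheses.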