diff --git a/.gitignore b/.gitignore
index 15f04b0..c0762a0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,6 +5,9 @@
 !/src/**/*.c
 !/src/**/*.h
 
+# documentation
+!/docs/**/*.md
+
 # top-level configuration
 !/.editorconfig
 !/.gitignore
diff --git a/docs/syntax.md b/docs/syntax.md
new file mode 100644
index 0000000..fee342e
--- /dev/null
+++ b/docs/syntax.md
@@ -0,0 +1,190 @@
+# Syntax Reference
+## Grammar
+The grammar is LL(1).
+
+```ebnf
+block = open-block, ("pass" | block-body), close-block
+block-body = stmt, [{ terminator, stmt }]
+stmt = assignment | expr
+assignment = var, [":", expr], "=", expr
+expr = "if", expr, block, [ "else", block ]
+     | "loop", [ label ], control-vars, block
+     | "next", [ label ]
+     | "exit", [ label ], expr
+     | "return", expr
+     ; these expressions can be used as the LHS of
+     ; a function application or binary operator.
+     | "(", expr, ")", expr-cont
+     | unop, expr, expr-cont
+     | string, expr-cont
+     | num, expr-cont
+     | var, expr-cont
+; an optional binary operator or function application
+expr-cont = [ binop, expr | expr ]
+control-vars = [ control-var, [{ ",", control-var }] ]
+control-var = assignment | var
+```
+
+## Lexemes
+If you use `{`, `}`, and `;` for `open-block`, `close-block`, and `terminator`,
+then the lexer is regular. If you use indentation-sensitive syntax, then lexing
+is context-sensitive.
+
+```ebnf
+unop = "-" | "~" | "!"
+      ; arithmetic
+binop = "+" | "-" | "*" | "/" | "%"
+      ; bitwise
+      | "&" | "|" | "^" | "<<" | ">>" | ">>>"
+      ; logical
+      | "=" | ">" | "<" | ">=" | "<=" | "!="
+      ; types
+      | ":" | "->"
+num = ["-"], { decimal-digit | "," }, ["#", { digit | "," }]
+string = '"', [{ !('"' | newline) }], '"'
+label = "'", identifier
+identifier = alpha, [{ alphanumeric }]
+
+alpha = 'A'..'Z' | 'a'..'z'
+decimal-digit = '0'..'9'
+alphanumeric = decimal-digit | alpha
+digit = alphanumeric
+newline = "\r" | "\n"
+```
+
+By default, a number is a series of base-10 digits.
+To use a different base, write `base#digits`,
+e.g. `2#100101` or `16#DEADBEEF`.
+
+## Blocks & Terminators
+The rules for blocks and terminators:
+
+1. A terminator is emitted when a line starts at the same indentation level
+   as the enclosing block, separating it from the previous statement:
+
+   Example:
+   ```thadius
+   x = 10
+   print "hello, world!"
+   y = 3
+   ```
+
+2. A block is opened when `:` occurs at the end of a line.
+
+   The indentation level of the block is the indentation level
+   of the following line.
+
+   Example:
+   ```thadius
+   loop:
+       pass
+   ```
+
+3. A block is closed when the indentation level of a line
+   is less than the indentation level of the block.
+
+   Example:
+   ```thadius
+   loop:
+       ...
+   ...
+   ```
+
+4. If a new indentation level is introduced and it is *not* the start of a block,
+   then the line continues the previous expression instead of opening a new block.
+
+   The indentation of all continuation lines, including any further nested levels, is *ignored*: no terminator or block tokens are emitted for it.
+
+   Example:
+   ```thadius
+   some_variable = some long expression
+       + some other long expression
+   ```
+
+## Indentation Levels
+The rules for indentation levels:
+
+1. All whitespace on an empty line is ignored.
+
+   Example:
+   ```thadius
+   ...
+           // very bad indentation, but no error
+   ...
+   ```
+
+2. All additional indentation on a line is combined into *one* indentation level.
+
+   Example:
+   ```thadius
+   if x:
+           ... // *one* level deeper!
+   ```
+
+3. All tabs on a line must precede all spaces. (One level of mixed indentation is allowed.)
+
+   Good:
+   ```thadius
+   if x:
+   	 ...
+   ```
+
+   Good:
+   ```thadius
+   loop:
+   	if x:
+   	  ....
+   ```
+
+   Bad:
+   ```thadius
+   if x:
+    	...
+   ```
+
+4. All indentation must match preceding lines (except for new indentation levels):
+
+   Good:
+   ```thadius
+   if x:
+   	...
+   	... // same as previous line
+   ... // matches indentation of `if x`
+   ```
+
+   Bad:
+   ```thadius
+   	... // this line used a tab...
+       ... // but this line used a space, even if it looks the same
+   ```
+
+   Bad:
+   ```thadius
+   if x:
+   	...
+     ...
     // this line doesn't match the level of the previous line *or* the `if`
+   ```
+
+## Operators
+The full list of operators is given in "Lexemes".
+
+Operator precedence:
+
+* Unary operators always have the highest precedence.
+* Arithmetic operators: `*` = `/` = `%` > `+` = `-`
+* Bitwise operators: `&` > (`|` ? `^`)
+* Logical operators: (`=` ? `!=` ? `>` ? `<` ? `>=` ? `<=`) > all arithmetic or bitwise operators
+* Type operators: `:` > all other binary operators
+
+Operator associativity:
+
+* left-associative: `*`, `/`, `+`, `-`, `&`, `|`, `^`, `:`
+* right-associative: `->`
+* non-associative: `=`, `!=`, `>`, `<`, `>=`, `<=`
+
+If two operators have no precedence relation (marked `?` or not specified),
+then they cannot appear in the same expression unless grouped with parentheses.
+
+There are no ternary, postfix, or mixfix operators.
+
+User-defined operators are not allowed (at least for now).