Rule Predicates and Bindings
A rule predicate sits between the rule name and its definition, introduced by the '=' operator. A rule therefore reads as having two parts: a predicate and a series of parse expressions. For example, a simple variable predicate in a rule looks like this:
MyBoundRule = char[] someBinding ::= expression;
To use 'someBinding' within this rule, we use the ':' and ':~' operators to bind the variable name to the result of a particular expression.
# someBinding is assigned the value "helloworld", if this rule is matched.
HelloWorld = char[] someBinding ::= "hello":someBinding "world":~someBinding;
The ":" operator above assigns the result of the terminal "hello" to 'someBinding'. The ":~" operator then concatenates the value of "world" onto 'someBinding'.
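The effect of the two operators can be modeled outside Enki. The following Python sketch only illustrates the semantics described above; it is not Enki's generated code. The helper names `match_terminal` and `hello_world` are hypothetical, and the model assumes that no whitespace is skipped between terminals.

```python
def match_terminal(text, pos, literal):
    """Return (new_pos, matched_text) if literal occurs at pos, else None."""
    if text.startswith(literal, pos):
        return pos + len(literal), literal
    return None


def hello_world(text, pos=0):
    """Rough model of:
    HelloWorld = char[] someBinding ::= "hello":someBinding "world":~someBinding;
    """
    some_binding = ""

    result = match_terminal(text, pos, "hello")
    if result is None:
        return None                # rule fails, nothing is passed back
    pos, value = result
    some_binding = value           # ':' assigns the result to the binding

    result = match_terminal(text, pos, "world")
    if result is None:
        return None
    pos, value = result
    some_binding += value          # ':~' concatenates onto the binding

    return pos, some_binding       # the predicate's value on success
```

Calling `hello_world("helloworld")` walks both terminals in order, so the binding ends up holding the concatenation of the two matched strings.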
Keep in mind that while Enki does its best to determine what the type of a binding is, you may find that declaring your array types explicitly via the use of '![]' is a good idea. Also, at this point, Enki can only understand single-dimension arrays - sorry, no maps.
Now, bindings aren't particularly useful all by themselves. We'd really like to pass them back to a function or class instance. For that, we can use a different kind of rule predicate:
# make a new instance of Foobar
FooRule = new Foobar(char[] x) ::= HelloWorld:x;
FooRule is now defined as 'match HelloWorld'. The binding value passed back from HelloWorld is placed in 'x', and a new Foobar is created with that value. As with the HelloWorld rule above, FooRule passes back its predicate's value on success, which in this case is an instance of Foobar. We can also use a function in much the same way:
# call foobarFunction
FooRule2 = uint foobarFunction(char[] x) ::= HelloWorld:x;
Here, foobarFunction() does something with 'x' and returns a uint.
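As an illustration of what the two predicate forms amount to, here is a small Python model. It is a sketch under stated assumptions, not the D code Enki emits: `Foobar` and `foobar_function` are stand-ins with invented bodies, and each rule function returns `(new_position, value)` on success or `None` on failure.

```python
class Foobar:
    """Stand-in for the user's Foobar class (body is hypothetical)."""
    def __init__(self, x):
        self.x = x


def foobar_function(x):
    """Stand-in for the user's function; returning the length is an assumption."""
    return len(x)


def hello_world(text, pos=0):
    """Model of the HelloWorld rule: matches "hello" then "world"."""
    if text.startswith("helloworld", pos):
        return pos + len("helloworld"), "helloworld"
    return None


def foo_rule(text, pos=0):
    """Model of: FooRule = new Foobar(char[] x) ::= HelloWorld:x;"""
    result = hello_world(text, pos)
    if result is None:
        return None
    pos, x = result                  # HelloWorld:x binds the sub-rule's value
    return pos, Foobar(x)            # predicate: construct and pass back


def foo_rule2(text, pos=0):
    """Model of the function-call form of the predicate."""
    result = hello_world(text, pos)
    if result is None:
        return None
    pos, x = result
    return pos, foobar_function(x)   # predicate: call and pass back the result
```

In both cases the predicate only runs once the parse expression has succeeded, and its result becomes the value that the rule passes back to whatever rule bound it.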
Note that if a bound expression is not a defined rule (for example, a terminal or a group expression), its type is a string, and the value is converted to the binding's type, if possible, on assignment. If a binding is bound to a rule that has no predicate associated with it, the bound value is the text that the rule consumes when parsing.
Also, Enki is fairly intelligent at deducing binding types, so in many cases you can leave them out for the sake of brevity. The semantic pass on your grammar deduces binding types based on their use, so that a binding takes on whatever type is used consistently when assigning to it.
# HelloWorld is defined as returning 'char[]'
HelloWorld = char[] someBinding ::= "hello":someBinding "world":~someBinding;
# x is deduced to type 'char[]' as that's HelloWorld's type
FooRule = new Foobar(x) ::= HelloWorld:x;
FooRule2 = uint foobarFunction(x) ::= HelloWorld:x;
Advanced Binding Usage
Bindings can also be used as terminals. With the binding substitution operator ("."), the string value of a binding is compared against the input, just like a literal terminal. This allows for some very powerful idioms:
# delim takes on the value of the delimiter
# - it is then used as a repetition terminator!
String ::= ("'"|'"'):delim {any} .delim;
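To make the idiom concrete, the sketch below models the String rule in Python. It is an illustration only: the function name `match_string` is hypothetical, and the loop encodes the assumption (taken from the comment above) that `{any}` repeats until the substituted terminal `.delim` matches.

```python
def match_string(text, pos=0):
    """Rough model of: String ::= ("'"|'"'):delim {any} .delim;
    Returns the position just past the closing delimiter, or None on failure.
    """
    # ("'"|'"'):delim  -- match either quote character and bind it
    if pos >= len(text) or text[pos] not in ("'", '"'):
        return None
    delim = text[pos]
    pos += 1

    # {any} .delim  -- consume characters until the bound delimiter reappears
    while pos < len(text):
        if text[pos] == delim:
            return pos + 1
        pos += 1

    return None  # input ended before the closing delimiter was seen
```

Because the terminator is whatever was bound to `delim`, a string opened with a double quote may contain single quotes and vice versa, without the grammar needing a separate rule for each quote style.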
Where using parser feedback to bind data is not enough, Enki has a way to use D identifiers directly for binding assignment as well as for terminals. This works best with enums and with lexer designs that run more efficiently when tokens are reduced from their string representation to integer values.
The '@' operator is used to provide a way to funnel data into a binding, and does not equate to any parse expression whatsoever. Its counterpart, the '&' operator, turns an identifier into a terminal by way of the parser's 'terminal()' function - if the identifier's type is not a string, a suitable overload will have to be provided in the backend code. Neither expression has any impact on the failure or success of a given rule.
# given that the input is a stream of characters
LexToken
= bool newToken(uint tok)
::= "foo" @TOK_FOO:tok | "bar" @TOK_BAR:tok | "gorf" @TOK_GORF:tok;
# given that the input is now a stream of tokens rather than characters
ParseToken ::= &TOK_FOO | &TOK_BAR | &TOK_GORF;
As the above implies, one can tweak Enki to parse binary data, or to use a lexed array of structs or even objects. As a frontend technology with an open backend, it is agnostic to what data is passed to it, provided the logic of the various terminal and parse expressions behaves as expected.
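The two-stage arrangement can be sketched as follows. This Python model is illustrative only: the integer values of the `TOK_*` constants, the helper `lex_all`, and the use of a list append in place of `newToken()` are all assumptions, not part of Enki or its D backend.

```python
# Hypothetical token values; in D these would typically be enum members.
TOK_FOO, TOK_BAR, TOK_GORF = range(3)

KEYWORDS = (("foo", TOK_FOO), ("bar", TOK_BAR), ("gorf", TOK_GORF))


def lex_token(text, pos, tokens):
    """Model of the LexToken rule: match one keyword in a character stream.
    The '@TOK_x:tok' step is modeled by picking the constant for the matched
    alternative; newToken(tok) is modeled by appending to `tokens`.
    """
    for literal, tok in KEYWORDS:
        if text.startswith(literal, pos):
            tokens.append(tok)
            return pos + len(literal)
    return None


def lex_all(text):
    """Run lex_token repeatedly; return the token list, or None on bad input."""
    tokens, pos = [], 0
    while pos < len(text):
        pos = lex_token(text, pos, tokens)
        if pos is None:
            return None
    return tokens


def parse_token(tokens, pos):
    """Model of the ParseToken rule: '&TOK_x' compares an identifier's value
    against the current element of a token stream rather than against text.
    """
    if pos < len(tokens) and tokens[pos] in (TOK_FOO, TOK_BAR, TOK_GORF):
        return pos + 1
    return None
```

The second stage never touches the original characters; it only compares integers, which is where the efficiency gain mentioned above would come from.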
The '@' can also be used to call functions defined in D. This is really just an extension of the use without any arguments, so the same rules for calling and binding apply. This uses the '!' operator in much the same way as D's template syntax, which is appropriate seeing as how it has nearly the same meaning. Just place your arguments between '!(' and ')' like so:
Example ::= "x":x "y":y @myfunc!(x,y);
Rules can also be passed binding values when they are used. The bindings are passed with D's 'inout' semantics, so the called rule can modify the parameters and the caller will see the change.
# uses x, y and z as substituted terminals, and assigns 'FOOBAR' to z.
ParameterBasedRule(x,y,z) ::= .x .y .z @FOOBAR:z;
# call the funky rule above, after parsing a bit.
MyRule ::= "one":x "two":y "three":z ParameterBasedRule!(x,y,z);
The actual token used for the rule argument syntax is "!(", which is used to avoid any ambiguity with the negation operator ("!") and the beginning of a group ("(").
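The parameter-passing behavior can be modeled roughly as below. Python has no 'inout' parameters, so this sketch passes a shared dictionary of bindings instead; that substitution, the string value given to `FOOBAR`, and the function names are all assumptions made for illustration.

```python
FOOBAR = "FOOBAR"  # hypothetical value of the D identifier FOOBAR


def parameter_based_rule(text, pos, bindings):
    """Rough model of: ParameterBasedRule(x,y,z) ::= .x .y .z @FOOBAR:z;
    `bindings` is shared with the caller, standing in for 'inout' parameters.
    """
    for name in ("x", "y", "z"):
        literal = bindings[name]          # .x / .y / .z: binding used as terminal
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
    bindings["z"] = FOOBAR                # @FOOBAR:z, visible to the caller
    return pos


def my_rule(text, pos=0):
    """Rough model of:
    MyRule ::= "one":x "two":y "three":z ParameterBasedRule!(x,y,z);
    """
    bindings = {}
    for name, literal in (("x", "one"), ("y", "two"), ("z", "three")):
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
        bindings[name] = literal          # "one":x and so on
    pos = parameter_based_rule(text, pos, bindings)
    if pos is None:
        return None
    return pos, bindings
```

After a successful match, the caller's `z` no longer holds "three" but the value assigned inside the called rule, which is the observable consequence of passing bindings by reference.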
