Lexical

The lexical phase splits the input source text into a stream of tokens. This phase finds and rejects illegal characters and malformed tokens (such as a float literal of "4.5x").

MiniD source text consists of whitespace, line endings, comments, and tokens, followed by the end-of-file marker.

MiniD source text can be ASCII or any of the Unicode encodings: UTF-8, UTF-16, or UTF-32, with the latter two in either little- or big-endian byte order.
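As a rough illustration of how a host might handle these encodings, the following Python sketch picks a decoder from a leading byte order mark (BOM). The function name decode_source and the UTF-8 fallback for BOM-less input are assumptions made for this example; the spec does not say how the real implementation detects the encoding.

```python
import codecs

# UTF-32 is checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(data: bytes) -> str:
    """Decode raw source bytes, using a leading BOM to choose the encoding."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # Assumption: with no BOM, treat the input as UTF-8, which also covers ASCII.
    return data.decode("utf-8")
```

The order of the table matters: testing for UTF-16 LE before UTF-32 LE would misread a UTF-32 LE file.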

Shebang

MiniD source files may begin with a 'shebang': a pound sign immediately followed by an exclamation point (#!). This is commonly used on POSIX systems to associate a script file with the host program that runs it. You can use MDCL as the script host for MiniD scripts.

The shebang must be the first two characters of the file (after any BOM). All text up to and including the end of the shebang line is ignored by the compiler, as if it were a comment, but it still counts as a line.
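The shebang rule can be sketched in Python as follows. The helper name skip_shebang and its (index, line) return value are hypothetical, and the input is assumed to be already decoded with any BOM removed. Because the shebang counts as a line, lexing resumes on line 2.

```python
def skip_shebang(source: str) -> tuple[int, int]:
    """Return (index, line number) at which lexing should begin."""
    if not source.startswith("#!"):
        return 0, 1
    i = 2
    # Scan to the end of the shebang line; a null character also ends the input.
    while i < len(source) and source[i] not in "\r\n\0":
        i += 1
    if source.startswith("\r\n", i):
        return i + 2, 2
    if i < len(source) and source[i] in "\r\n":
        return i + 1, 2
    # The shebang line ran to the end of the file, so nothing is left to lex.
    return i, 1
```

For example, skip_shebang("#!mdcl\nlocal x = 1") returns (7, 2): the lexer starts at the "l" of "local" and reports it as being on line 2. The interpreter path in that string is made up for the example.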

Whitespace

WhiteSpace:
	Space {Space}

Space:
	' '
	'\t'
	'\v'
	'\u000C'
	EndOfLine
	Comment
	
EndOfLine:
	'\r'
	'\n'
	'\r\n'
	EndOfFile

Whitespace is generally ignored by MiniD.

End of File

EndOfFile:
	physical end of file
	'\0'

The MiniD lexer will stop lexing when it reaches the actual end of the file, or when it hits a null character.

Comments

Comment:
	'/*' {Character} '*/'
	'//' {Character} EndOfLine
	NestedComment
	
NestedComment:
	'/+' {Character | NestedComment} '+/'

There are three types of comments in MiniD: C-style block comments, C++-style line comments, and D-style nesting comments. All three function the same way as in D. Nesting comments are particularly useful for commenting out blocks of code, because comments already embedded in that code do not end the enclosing comment early. They can be nested to any depth.
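The difference between the three comment forms is easiest to see in code. This Python sketch skips one comment starting at a given index; the name skip_comment and the choice of exceptions are assumptions for the example. It leaves the line ending after a // comment unconsumed, so that a caller can count it as an ordinary end of line.

```python
def skip_comment(src: str, i: int) -> int:
    """Return the index just past the comment that starts at src[i]."""
    if src.startswith("//", i):
        # Line comment: stop before the end of line (or at end of input).
        while i < len(src) and src[i] not in "\r\n\0":
            i += 1
        return i
    if src.startswith("/*", i):
        # Block comment: ends at the first '*/', so it cannot nest.
        end = src.find("*/", i + 2)
        if end == -1:
            raise SyntaxError("unterminated /* */ comment")
        return end + 2
    if src.startswith("/+", i):
        # Nesting comment: track depth until the matching '+/' is found.
        depth = 0
        while i < len(src):
            if src.startswith("/+", i):
                depth += 1
                i += 2
            elif src.startswith("+/", i):
                depth -= 1
                i += 2
                if depth == 0:
                    return i
            else:
                i += 1
        raise SyntaxError("unterminated /+ +/ comment")
    raise ValueError("no comment starts at this position")
```

Given "/* a /* b */ c */", the block comment ends at the first "*/" and leaves " c */" behind as source text, whereas "/+ a /+ b +/ c +/" is consumed in full.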

Tokens

Token:
	Identifier
	Keyword
	CharLiteral
	StringLiteral
	IntLiteral
	FloatLiteral
	'+'
	'+='
	'++'
	'-'
	'-='
	'--'
	'~'
	'~='
	'*'
	'*='
	'/'
	'/='
	'%'
	'%='
	'<'
	'<='
	'<<'
	'<<='
	'>'
	'>='
	'>>'
	'>>='
	'>>>'
	'>>>='
	'&'
	'&='
	'&&'
	'|'
	'|='
	'||'
	'^'
	'^='
	'='
	'=='
	'?='
	'.'
	'..'
	'!'
	'!='
	'('
	')'
	'['
	']'
	'{'
	'}'
	':'
	','
	';'
	'#'
	EOF

Identifiers

Identifier:
	IdentifierStart {IdentifierChar}

IdentifierStart:
	'_'
	Letter

IdentifierChar:
	IdentifierStart
	DecimalDigit

MiniD uses the same identifier rules as D. As in D, identifiers starting with two underscores ("__") are reserved and cannot be used; the lexical pass fails if it encounters one.
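A minimal Python sketch of the identifier check, including the double-underscore rejection, might look like this. The name check_identifier is hypothetical, and Letter is approximated with str.isalpha(), which is an assumption: the exact set of letters follows D's rules and is not spelled out here.

```python
def check_identifier(name: str) -> str:
    """Validate name against the Identifier grammar and return it unchanged."""

    def is_start(ch: str) -> bool:
        # Assumption: 'Letter' is approximated by str.isalpha().
        return ch == "_" or ch.isalpha()

    if not name or not is_start(name[0]):
        raise SyntaxError(f"invalid identifier: {name!r}")
    for ch in name[1:]:
        # DecimalDigit covers only the ASCII digits '0' through '9'.
        if not (is_start(ch) or ch in "0123456789"):
            raise SyntaxError(f"invalid character {ch!r} in identifier {name!r}")
    if name.startswith("__"):
        # Reserved identifiers are rejected during lexing, not later.
        raise SyntaxError(f"identifier {name!r} is reserved (starts with '__')")
    return name
```

A single leading underscore, as in "_x1", is accepted; only the two-underscore prefix is reserved.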

Keywords

Keyword:
	'as'
	'break'
	'case'
	'class'
	'catch'
	'continue'
	'coroutine'
	'default'
	'do'
	'else'
	'false'
	'finally'
	'for'
	'foreach'
	'function'
	'global'
	'if'
	'import'
	'in'
	'is'
	'local'
	'module'
	'null'
	'return'
	'super'
	'switch'
	'this'
	'throw'
	'true'
	'try'
	'vararg'
	'while'
	'with'
	'yield'

Many of these keywords are familiar, but a few are not. 'as' performs a dynamic cast of a class instance type (like cast() in D). 'local' declares functions and variables local to the enclosing function's scope. 'vararg' is used for variadic functions and is explained in the Functions section. 'with' does not have quite the same purpose as in D; it is used in function calls. 'yield' and 'coroutine' are used with coroutines, which are also explained in the Functions section.

Character Literals

CharLiteral:
	"'" (Character | EscapeSequence) "'"

Character literals specify a single character rather than a whole string. Characters are a distinct type in MiniD.

String Literals

StringLiteral:
	RegularString
	WysiwygString
	AltWysiwygString

RegularString:
	'"' {Character | EscapeSequence | EndOfLine} '"'

EscapeSequence:
	'\''
	'\"'
	'\\'
	'\a'
	'\b'
	'\f'
	'\n'
	'\r'
	'\t'
	'\v'
	'\x' HexDigit HexDigit
	'\u' HexDigit HexDigit HexDigit HexDigit
	'\U' HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit
	'\' DecimalDigit [DecimalDigit [DecimalDigit]]

WysiwygString:
	'@"' {Character | EndOfLine} '"'

AltWysiwygString:
	'`' {Character | EndOfLine} '`'

String literals are very similar to D's, with two main differences. First, there are no octal escapes; there are decimal escapes instead. Second, WYSIWYG strings can be enclosed in '@" "' instead of 'r" "'. This makes them easier to parse, and other languages use the same format for WYSIWYG strings.
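To make the decimal escapes concrete, here is a Python sketch that decodes the escape sequences in the body of a regular string (the text between the quotes). It reads a decimal escape as a backslash followed by one to three decimal digits giving a character code in base 10. The name decode_escapes and the error handling are assumptions for the example.

```python
_SIMPLE = {
    "'": "'", '"': '"', "\\": "\\", "a": "\a", "b": "\b",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}
_HEX_LEN = {"x": 2, "u": 4, "U": 8}
_HEX_DIGITS = "0123456789abcdefABCDEF"
_DEC_DIGITS = "0123456789"


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the body of a RegularString."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        i += 1  # step over the backslash
        if i >= len(body):
            raise SyntaxError("unfinished escape sequence")
        kind = body[i]
        if kind in _SIMPLE:
            out.append(_SIMPLE[kind])
            i += 1
        elif kind in _HEX_LEN:
            n = _HEX_LEN[kind]
            digits = body[i + 1:i + 1 + n]
            if len(digits) != n or any(d not in _HEX_DIGITS for d in digits):
                raise SyntaxError(f"\\{kind} needs {n} hex digits")
            # chr() raises ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 1 + n
        elif kind in _DEC_DIGITS:
            # Decimal escape: take up to three digits, read in base 10.
            j = i
            while j < len(body) and j < i + 3 and body[j] in _DEC_DIGITS:
                j += 1
            out.append(chr(int(body[i:j])))
            i = j
        else:
            raise SyntaxError(f"unknown escape sequence \\{kind}")
    return "".join(out)
```

Under this reading, "\65" decodes to "A" (character 65), where a D-style octal escape "\65" would give character 53, "5".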

Integer Literals

IntLiteral:
	Decimal
	Binary
	Octal
	Hexadecimal

Decimal:
	DecimalDigit {DecimalDigit | '_'}

DecimalDigit:
	'0'
	'1'
	'2'
	'3'
	'4'
	'5'
	'6'
	'7'
	'8'
	'9'

Binary:
	'0' ('b' | 'B') (BinaryDigit | '_') {BinaryDigit | '_'}

BinaryDigit:
	'0'
	'1'

Octal:
	'0' ('c' | 'C') (OctalDigit | '_') {OctalDigit | '_'}

OctalDigit:
	'0'
	'1'
	'2'
	'3'
	'4'
	'5'
	'6'
	'7'

Hexadecimal:
	'0' ('x' | 'X') (HexDigit | '_') {HexDigit | '_'}

HexDigit:
	'0'
	'1'
	'2'
	'3'
	'4'
	'5'
	'6'
	'7'
	'8'
	'9'
	'A'
	'a'
	'B'
	'b'
	'C'
	'c'
	'D'
	'd'
	'E'
	'e'
	'F'
	'f'

Integer literals are similar to D's, including allowing underscores. The main difference is in octal literals: instead of starting with just 0, they start with 0c, to match 0x and 0b. Though to tell you the truth, I've never seen octal used.
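The integer grammar can be sketched in Python like this. The name parse_int_literal is hypothetical. One assumption goes beyond the grammar as written: a literal whose digits are all underscores (such as "0b_") is rejected here, even though the productions technically allow it.

```python
_PREFIXES = {
    "b": (2, "01"),
    "c": (8, "01234567"),
    "x": (16, "0123456789abcdef"),
}


def parse_int_literal(text: str) -> int:
    """Convert the text of an IntLiteral token to its integer value."""
    base, digits, body = 10, "0123456789", text
    if len(text) >= 2 and text[0] == "0" and text[1].lower() in _PREFIXES:
        base, digits = _PREFIXES[text[1].lower()]
        body = text[2:]
    elif not text or text[0] not in digits:
        # Decimal literals must begin with a digit, not an underscore.
        raise SyntaxError(f"malformed integer literal: {text!r}")
    cleaned = body.replace("_", "")
    # Assumption: at least one real digit is required after removing underscores.
    if not cleaned or any(ch not in digits for ch in cleaned.lower()):
        raise SyntaxError(f"malformed integer literal: {text!r}")
    return int(cleaned, base)
```

Note that a leading zero alone does not select octal: "017" is the decimal value 17, while "0c17" is 15.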

Floating-Point Literals

FloatLiteral:
	[DecimalDigit {DecimalDigit | '_'}] '.' (DecimalDigit | '_') {DecimalDigit | '_'} [Exponent]
	DecimalDigit {DecimalDigit | '_'} [Exponent]

Exponent:
	('e' | 'E') ['+' | '-'] (DecimalDigit | '_') {DecimalDigit | '_'}

Floating-point literals are just like D's, except that there are no hex float literals, which wouldn't be too useful in a scripting language. There are also no imaginary numbers.
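The following Python sketch recognizes float literals with a regular expression. It departs from the grammar in two labeled ways, both assumptions of this example: a literal with no decimal point must have an exponent (otherwise it would overlap with a decimal integer literal), and every digit group must contain at least one real digit rather than underscores alone. The name parse_float_literal is hypothetical.

```python
import re

_FLOAT = re.compile(
    r"""
    (?:[0-9][0-9_]*)? \. _*[0-9][0-9_]* (?:[eE][+-]?_*[0-9][0-9_]*)?
    |
    [0-9][0-9_]* [eE][+-]?_*[0-9][0-9_]*
    """,
    re.VERBOSE,
)


def parse_float_literal(text: str) -> float:
    """Convert the text of a FloatLiteral token to a float."""
    # First alternative: optional integer part, '.', fraction, optional exponent.
    # Second alternative: integer part with a required exponent (an assumption).
    if not _FLOAT.fullmatch(text):
        raise SyntaxError(f"malformed float literal: {text!r}")
    return float(text.replace("_", ""))
```

With this sketch, "4.5x" from the introduction is rejected as malformed, and "42" is rejected as well because it is left to the integer rules.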