Lexical

The lexical phase splits the input source text into a stream of tokens. This phase finds and rejects illegal characters and malformed tokens (for example, the malformed float literal "4.5x").

MiniD source text consists of white space, ends of lines, comments, and tokens, all followed by the end-of-file marker.

MiniD source text can be encoded as ASCII (plain 7-bit ASCII, not extended ASCII) or in any Unicode encoding: UTF-8, or UTF-16 or UTF-32 in either little- or big-endian byte order.
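As a rough illustration of handling these encodings, here is a minimal Python sketch that picks a decoder by looking for a byte-order mark. The function name decode_source is hypothetical, and two things are assumptions rather than documented MiniD behavior: that the encoding is detected from the BOM, and that text without a BOM is treated as UTF-8 (which also covers 7-bit ASCII).

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE mark starts with
# the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_source(data: bytes) -> str:
    """Decode raw source bytes, dropping a leading byte-order mark if present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # Assumption: no BOM means UTF-8, of which 7-bit ASCII is a subset.
    return data.decode("utf-8")
```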

Shebang

MiniD source files are allowed to begin their first line with what's called a 'shebang', which is a pound sign immediately followed by an exclamation point: #!. This is commonly used on POSIX systems to associate a script file with the host program that runs it. You can use MDCL as the script host for MiniD scripts.

The shebang must be at the very beginning of the file: its two characters must be the first and second characters of the file (after any BOM). All text up to and including the end of the shebang line is ignored. The shebang line still counts as a line, but the compiler ignores it as if it were a comment.
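The shebang rule can be sketched in Python as follows. This is only an illustration, not the actual MiniD lexer: the name strip_shebang is hypothetical, and it assumes the source has already been decoded to a string that may still start with a U+FEFF byte-order mark. The line terminator is deliberately kept so that later line numbers still count the shebang as line 1.

```python
def strip_shebang(source: str) -> str:
    """Remove the text of a leading shebang line, keeping its line terminator."""
    # Assumption: a BOM, if the decoder left one in, may precede the shebang.
    start = 1 if source.startswith("\ufeff") else 0
    if not source.startswith("#!", start):
        return source  # "#!" anywhere else is not a shebang
    end = start
    while end < len(source) and source[end] not in "\r\n":
        end += 1
    return source[:start] + source[end:]
```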

Whitespace

WhiteSpace:
	Space {Space}

Space:
	' '
	'\t'
	'\v'
	'\u000C'
	EndOfLine
	Comment
	
EndOfLine:
	'\r'
	'\n'
	'\r\n'
	EndOfFile

Whitespace is generally ignored by MiniD, with one exception: the EndOfLine element is one of the possible statement terminators (see the Statements section for information). However, an EndOfLine is not always interpreted as a terminator; it may be ignored, for example when it comes in the middle of an expression or statement.

End of File

EndOfFile:
	physical end of file
	'\0'

The MiniD lexer will stop lexing when it reaches the actual end of the file, or when it hits a null character.
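In other words, anything after the first null character is never seen by the lexer. A minimal Python sketch of that rule, using the hypothetical helper name effective_source:

```python
def effective_source(source: str) -> str:
    """Return the part of the source that the lexer would actually read."""
    nul = source.find("\0")
    # No null character: the physical end of the text is the end of file.
    return source if nul == -1 else source[:nul]
```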

Comments

Comment:
	'/*' {Character} '*/'
	'//' {Character} EndOfLine
	NestedComment
	
NestedComment:
	'/+' {Character | NestedComment} '+/'

There are three types of comments in MiniD: C-style block comments, C++-style line comments, and D-style nesting comments. All three function the same way as in D. Nesting comments are particularly useful for commenting out blocks of code that already contain comments, since the embedded comments don't end the outer comment early. They can be nested arbitrarily deeply.
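The difference between block and nesting comments is easiest to see in code. The following Python sketch is an illustration only (skip_comment is a hypothetical name, not part of MiniD). It returns the index just past a comment, tracking depth for '/+ +/' but not for '/* */'. As a simplification, a line comment stops before its end of line and leaves the terminator for the caller.

```python
def skip_comment(src: str, pos: int) -> int:
    """Return the index just past the comment at pos, or pos if there is none."""
    if src.startswith("//", pos):
        end = pos + 2
        # Stop before the end of line (or a null end-of-file marker).
        while end < len(src) and src[end] not in "\r\n\0":
            end += 1
        return end
    if src.startswith("/*", pos):
        # Block comments do not nest: the first "*/" closes the comment.
        close = src.find("*/", pos + 2)
        if close == -1:
            raise ValueError("unterminated /* comment")
        return close + 2
    if src.startswith("/+", pos):
        depth = 0
        i = pos
        while i < len(src):
            if src.startswith("/+", i):
                depth += 1
                i += 2
            elif src.startswith("+/", i):
                depth -= 1
                i += 2
                if depth == 0:
                    return i
            else:
                i += 1
        raise ValueError("unterminated /+ comment")
    return pos
```

Given "/* a /* b */ c */", the block comment ends at the first "*/", leaving " c */" behind as stray text. The same structure written with "/+" and "+/" is consumed as a single comment.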

Tokens

Token:
	Identifier
	Keyword
	CharLiteral
	StringLiteral
	IntLiteral
	FloatLiteral
	'+'
	'+='
	'++'
	'-'
	'-='
	'--'
	'~'
	'~='
	'*'
	'*='
	'/'
	'/='
	'%'
	'%='
	'<'
	'<='
	'<<'
	'<<='
	'>'
	'>='
	'>>'
	'>>='
	'>>>'
	'>>>='
	'&'
	'&='
	'&&'
	'|'
	'|='
	'||'
	'^'
	'^='
	'='
	'=='
	'?='
	'.'
	'..'
	'!'
	'!='
	'('
	')'
	'['
	']'
	'{'
	'}'
	':'
	','
	';'
	'#'
	'\\'
	'->'
	'$'
	EndOfFile
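Many of the operator tokens above are prefixes of longer ones ('>' of '>>', '>>' of '>>>=', and so on). The text here does not say how the lexer chooses between them, so the Python sketch below assumes the usual longest-match rule; treat that rule, and the name lex_operator, as assumptions.

```python
# Every operator and punctuation token from the Token list above.
OPERATORS = """
    + += ++ - -= -- ~ ~= * *= / /= % %= < <= << <<= > >= >> >>= >>> >>>=
    & &= && | |= || ^ ^= = == ?= . .. ! != ( ) [ ] { } : , ; # \\ -> $
""".split()

# Try longer spellings first so that ">>>=" wins over ">>>", ">>" and ">".
_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def lex_operator(src: str, pos: int):
    """Return (operator, end_index) for the longest operator starting at pos."""
    for op in _BY_LENGTH:
        if src.startswith(op, pos):
            return op, pos + len(op)
    raise ValueError(f"no operator at index {pos}")
```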

Identifiers

Identifier:
	IdentifierStart {IdentifierChar}

IdentifierStart:
	_
	Letter

IdentifierChar:
	IdentifierStart
	DecimalDigit

Identifiers starting with two underscores ("__") are reserved and cannot be used. In fact, the lexical pass will fail if it comes across an identifier that starts with two underscores.

Keywords

Keyword:
	'as'
	'assert'
	'break'
	'case'
	'catch'
	'class'
	'continue'
	'coroutine'
	'default'
	'do'
	'else'
	'false'
	'finally'
	'for'
	'foreach'
	'function'
	'global'
	'if'
	'import'
	'in'
	'is'
	'local'
	'module'
	'namespace'
	'null'
	'return'
	'scope'
	'super'
	'switch'
	'this'
	'throw'
	'true'
	'try'
	'vararg'
	'while'
	'with'
	'yield'
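Putting the identifier grammar, the reserved "__" prefix, and the keyword list together, a word can be classified roughly as in the Python sketch below. The name lex_word is hypothetical, and since Letter is not defined in this section, the sketch assumes it means ASCII letters only.

```python
import re

KEYWORDS = frozenset("""
    as assert break case catch class continue coroutine default do else
    false finally for foreach function global if import in is local module
    namespace null return scope super switch this throw true try vararg
    while with yield
""".split())

# Assumption: "Letter" is restricted to ASCII letters in this sketch.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def lex_word(src: str, pos: int):
    """Return (kind, text, end_index) for the identifier or keyword at pos."""
    m = _IDENT.match(src, pos)
    if m is None:
        raise ValueError(f"no identifier at index {pos}")
    word = m.group()
    if word.startswith("__"):
        # Reserved: the lexical pass fails on these.
        raise ValueError(f"identifier {word!r} starts with two underscores")
    kind = "Keyword" if word in KEYWORDS else "Identifier"
    return kind, word, m.end()
```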

Character Literals

CharLiteral:
	"'" (Character | EscapeSequence) "'"

Character literals let you specify a single character instead of a whole string. Characters are treated as their own distinct type in MiniD.

String Literals

StringLiteral:
	RegularString
	WysiwygString
	AltWysiwygString

RegularString:
	'"' {Character | EscapeSequence | EndOfLine} '"'

EscapeSequence:
	'\''
	'\"'
	'\\'
	'\a'
	'\b'
	'\f'
	'\n'
	'\r'
	'\t'
	'\v'
	'\x' HexDigit HexDigit
	'\u' HexDigit HexDigit HexDigit HexDigit
	'\U' HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit
	'\' DecimalDigit [DecimalDigit [DecimalDigit]]

WysiwygString:
	'@"' {Character | EndOfLine | '""'} '"'

AltWysiwygString:
	'`' {Character | EndOfLine | '``'} '`'

WYSIWYG string literals are allowed to contain doubled-up versions of their open and close quotes, in order to embed those characters within the string. For example:

@"He said, ""come here!""" // contains "He said, \"come here!\""
`This is what's known as ``something'.` // contains "This is what's known as `something\'."
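The doubled-quote rule can be sketched in Python as follows. This is an illustration rather than the real lexer, and read_wysiwyg is a hypothetical name. It handles both the @"..." and the backtick forms, and returns the string's contents together with the index just past the closing quote.

```python
def read_wysiwyg(src: str, pos: int):
    """Return (value, end_index) for the WYSIWYG string literal starting at pos."""
    if src.startswith('@"', pos):
        quote, i = '"', pos + 2
    elif src.startswith("`", pos):
        quote, i = "`", pos + 1
    else:
        raise ValueError(f"no WYSIWYG string at index {pos}")
    out = []
    while i < len(src):
        ch = src[i]
        if ch == quote:
            if src.startswith(quote, i + 1):
                # A doubled quote embeds one quote character.
                out.append(quote)
                i += 2
                continue
            return "".join(out), i + 1  # a lone quote closes the string
        out.append(ch)  # everything else, including ends of lines, is literal
        i += 1
    raise ValueError("unterminated WYSIWYG string")
```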

Integer Literals

IntLiteral:
	Decimal
	Binary
	Hexadecimal

Decimal:
	DecimalDigit {DecimalDigit | '_'}

DecimalDigit:
	'0' .. '9'

Binary:
	'0' ('b' | 'B') (BinaryDigit | '_') {BinaryDigit | '_'}

BinaryDigit:
	'0'
	'1'

Hexadecimal:
	'0' ('x' | 'X') (HexDigit | '_') {HexDigit | '_'}

HexDigit:
	'0' .. '9'
	'a' .. 'f'
	'A' .. 'F'

Integer literals are similar to those in most other C-style languages, except that there are no octal literals. Who uses octal, anyway?
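To make the grammar concrete, here is a small Python sketch that converts an integer literal to its value. The name parse_int_literal is hypothetical. Two points are assumptions: underscores are treated purely as separators, and a literal such as "0x_", which the grammar technically admits but which has no digits, is rejected.

```python
import re

_INT = re.compile(r"0[bB][01_]+|0[xX][0-9a-fA-F_]+|[0-9][0-9_]*")


def parse_int_literal(text: str) -> int:
    """Return the value of a decimal, binary, or hexadecimal integer literal."""
    if _INT.fullmatch(text) is None:
        raise ValueError(f"malformed integer literal {text!r}")
    digits = text.replace("_", "")  # assumption: underscores only separate digits
    prefix = digits[:2].lower()
    if prefix == "0b":
        base, digits = 2, digits[2:]
    elif prefix == "0x":
        base, digits = 16, digits[2:]
    else:
        base = 10
    if not digits:
        raise ValueError(f"integer literal {text!r} has no digits")
    return int(digits, base)
```

Because there are no octal literals, a leading zero has no special meaning in this sketch: "017" is simply seventeen.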

Floating-Point Literals

FloatLiteral:
	[DecimalDigit {DecimalDigit | '_'}] '.' (DecimalDigit | '_') {DecimalDigit | '_'} [Exponent]
	DecimalDigit {DecimalDigit | '_'} Exponent

Exponent:
	('e' | 'E')['+' | '-'] (DecimalDigit | '_') {DecimalDigit | '_'}

Floating-point literals are very similar to those in most other C-style languages.
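The following Python sketch recognizes floating-point literals according to the grammar above. The name parse_float_literal is hypothetical, and the sketch assumes that a literal with neither a '.' nor an exponent is lexed as an integer instead. It also rejects "4.5x", the malformed literal mentioned at the start of this section.

```python
import re

_EXP = r"[eE][+-]?[0-9_]+"
_FLOAT = re.compile(
    rf"(?:[0-9][0-9_]*)?\.[0-9_]+(?:{_EXP})?"  # optional digits, '.', fraction
    rf"|[0-9][0-9_]*{_EXP}"                    # digits followed by an exponent
)


def parse_float_literal(text: str) -> float:
    """Return the value of a floating-point literal, or raise ValueError."""
    if _FLOAT.fullmatch(text) is None:
        raise ValueError(f"malformed float literal {text!r}")
    # Underscores are dropped; float() then raises ValueError for degenerate
    # forms such as "._" that contain no digits at all.
    return float(text.replace("_", ""))
```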