root/trunk/docsrc/lex.dd

Revision 2150, 24.3 kB (checked in by walter, 2 years ago)

bugzilla 2734 Ambiguity in tokenizing: _._ as a float literal

1 Ddoc
2
3 $(SPEC_S Lexical,
4
5     In D, the lexical analysis is independent of the syntax parsing and the
6     semantic analysis. The lexical analyzer splits the source text up into
7     tokens. The lexical grammar describes what those tokens are. The D
8     lexical grammar is designed to be suitable for high speed scanning: it
9     has a minimum of special case rules, there is only one phase of
10     translation, and it is easy to write a correct scanner for it.
11     The tokens are readily recognizable by those familiar with C and
12     C++.
13
14 <h3>Phases of Compilation</h3>
15
16     The process of compiling is divided into multiple phases. Each phase
17     has no dependence on subsequent phases. For example, the scanner is
18     not perturbed by the semantic analyzer. This separation of the passes
19     makes language tools like syntax
20     directed editors relatively easy to produce.
21     It is also possible to compress D source by storing it in
22     $(SINGLEQUOTE tokenized) form.
23
24 $(OL
25     $(LI $(B source character set)$(BR)
26
27     The source file is checked to see what character set it is,
28     and the appropriate scanner is loaded. ASCII and UTF
29     formats are accepted.
30     )
31
32     $(LI $(B script line) $(BR)
33
34     If the first line starts with $(GREEN #!) then the first line
35     is ignored.
36     )
37
38     $(LI $(B lexical analysis)$(BR)
39
40     The source file is divided up into a sequence of tokens.
41     $(LINK2 #specialtokens, Special tokens) are replaced with other tokens.
42     $(LINK2 #specialtokenseq, Special token sequences)
43     are processed and removed.
44     )
45
46     $(LI $(B syntax analysis)$(BR)
47
48     The sequence of tokens is parsed to form syntax trees.
49     )
50
51     $(LI $(B semantic analysis)$(BR)
52
53     The syntax trees are traversed to declare variables, load symbol tables, assign
54     types, and in general determine the meaning of the program.
55     )
56
57     $(LI $(B optimization)$(BR)
58
59     Optimization is an optional pass that tries to rewrite the program
60     into a semantically equivalent, but faster executing, version.
61     )
62
63     $(LI $(B code generation)$(BR)
64
65     Instructions are selected from the target architecture to implement
66     the semantics of the program. The typical result will be
67     an object file, suitable for input to a linker.
68     )
69 )
70
71
72 <h3>Source Text</h3>
73
74     D source text can be in one of the following formats:
75
76     $(UL
77     $(LI ASCII)
78     $(LI UTF-8)
79     $(LI UTF-16BE)
80     $(LI UTF-16LE)
81     $(LI UTF-32BE)
82     $(LI UTF-32LE)
83     )
84
85     UTF-8 is a superset of traditional 7-bit ASCII.
86     One of the
87     following UTF BOMs (Byte Order Marks) can be present at the beginning
88     of the source text:
89     <p>
90
91     $(TABLE2 UTF Byte Order Marks,
92     $(TR
93     $(TH Format)
94     $(TH BOM)
95     )
96     $(TR
97     $(TD UTF-8)
98     $(TD EF BB BF)
99     )
100     $(TR
101     $(TD UTF-16BE)
102     $(TD FE FF)
103     )
104     $(TR
105     $(TD UTF-16LE)
106     $(TD FF FE)
107     )
108     $(TR
109     $(TD UTF-32BE)
110     $(TD 00 00 FE FF)
111     )
112     $(TR
113     $(TD UTF-32LE)
114     $(TD FF FE 00 00)
115     )
116     $(TR
117     $(TD ASCII)
118     $(TD no BOM)
119     )
120     )
121
122     $(P If the source file does not start with a BOM, then the first
123     character must be less than or equal to U0000007F.)
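
    As an illustration only, the BOM table above can be turned into a small
    detection routine. The names $(TT detectEncoding) and $(TT hasPrefix) are
    hypothetical and not part of any compiler or library, and the sketch
    assumes a D2 compiler. The UTF-32LE test has to come before the UTF-16LE
    test, because FF FE 00 00 begins with FF FE.

```d
// Hypothetical helper: true if s begins with the bytes in prefix.
bool hasPrefix(const(ubyte)[] s, const(ubyte)[] prefix)
{
    return s.length >= prefix.length && s[0 .. prefix.length] == prefix;
}

// Hypothetical sketch: name the format implied by a leading BOM, if any.
string detectEncoding(const(ubyte)[] src)
{
    // UTF-32LE is tested before UTF-16LE since its BOM starts with FF FE.
    if (hasPrefix(src, [0xFF, 0xFE, 0x00, 0x00])) return "UTF-32LE";
    if (hasPrefix(src, [0x00, 0x00, 0xFE, 0xFF])) return "UTF-32BE";
    if (hasPrefix(src, [0xEF, 0xBB, 0xBF]))       return "UTF-8";
    if (hasPrefix(src, [0xFE, 0xFF]))             return "UTF-16BE";
    if (hasPrefix(src, [0xFF, 0xFE]))             return "UTF-16LE";
    return "no BOM";
}

static assert(detectEncoding([0xEF, 0xBB, 0xBF, 0x61]) == "UTF-8");
static assert(detectEncoding([0xFF, 0xFE, 0x61, 0x00]) == "UTF-16LE");
static assert(detectEncoding([0xFF, 0xFE, 0x00, 0x00]) == "UTF-32LE");
static assert(detectEncoding([0x61, 0x62]) == "no BOM");
```

    A source reported as having no BOM is then subject to the rule above on
    its first character.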
124
125     $(P There are no digraphs or trigraphs in D.)
126
127     $(P The source text is decoded from its source representation
128     into Unicode $(I Character)s.
129     The $(I Character)s are further divided into:
130
131     $(LINK2 #whitespace, white space),
132     $(LINK2 #endofline, end of lines),
133     $(LINK2 #comment, comments),
134     $(LINK2 #specialtokenseq, special token sequences),
135     $(LINK2 #tokens, tokens),
136     all followed by $(LINK2 #eof, end of file).
137     )
138
139     $(P The source text is split into tokens using the maximal munch
140     technique, i.e., the
141     lexical analyzer tries to make the longest token it can. For example
142     <code>&gt;&gt;</code> is a right shift token,
143     not two greater than tokens. An exception to this rule is that a ..
144     embedded inside what looks like two floating point literals, as in
145     1..2, is interpreted as if the .. was separated by a space from the
146     first integer.
147     )
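
    A short sketch of how maximal munch shows up in ordinary code may help.
    The names $(TT shifted), $(TT middle) and $(TT postIncrementThenAdd) are
    made up for this example, and a D2 compiler is assumed.

```d
// ">>" is scanned as one right shift token, not as two ">" tokens.
enum shifted = 256 >> 4;
static assert(shifted == 16);

// "1..3" is scanned as 1 .. 3 (a slice), not as the float literals "1." and ".3".
int[] middle(int[] all)
{
    return all[1..3];
}
static assert(middle([10, 20, 30, 40]) == [20, 30]);

// "x+++y" is scanned as x ++ + y: the longest token "++" is taken first.
bool postIncrementThenAdd()
{
    int x = 1, y = 2;
    int z = x+++y;
    return z == 3 && x == 2;
}
static assert(postIncrementThenAdd());
```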
148
149 <h3>$(LNAME2 eof, End of File)</h3>
150
151 $(GRAMMAR
152 $(I EndOfFile):
153     $(I physical end of the file)
154     \u0000
155     \u001A
156 )
157
158     The source text is terminated by whichever comes first.
159
160 <h3>$(LNAME2 endofline, End of Line)</h3>
161
162 $(GRAMMAR
163 $(I EndOfLine):
164     \u000D
165     \u000A
166     \u000D \u000A
167     $(I EndOfFile)
168 )
169
170     There is no backslash line splicing, nor are there any limits
171     on the length of a line.
172
173 <h3>$(LNAME2 whitespace, White Space)</h3>
174
175 $(GRAMMAR
176 $(I WhiteSpace):
177     $(I Space)
178     $(I Space) $(I WhiteSpace)
179
180 $(I Space):
181     \u0020
182     \u0009
183     \u000B
184     \u000C
185 )
186
187
188 <h3>$(LNAME2 comment, Comments)</h3>
189
190 $(GRAMMAR
191 $(I Comment):
192     $(B /*) $(I Characters) $(B */)
193     $(B //) $(I Characters) $(I EndOfLine)
194     $(I NestingBlockComment)
195
196 $(I Characters):
197     $(I Character)
198     $(I Character) $(I Characters)
199
200 $(I NestingBlockComment):
201     $(B /+) $(I NestingBlockCommentCharacters) $(B +/)
202
203 $(I NestingBlockCommentCharacters):
204     $(I NestingBlockCommentCharacter)
205     $(I NestingBlockCommentCharacter) $(I NestingBlockCommentCharacters)
206
207 $(I NestingBlockCommentCharacter):
208     $(I Character)
209     $(I NestingBlockComment)
210 )
211
212     D has three kinds of comments:
213     $(OL
214     $(LI Block comments can span multiple lines, but do not nest.)
215     $(LI Line comments terminate at the end of the line.)
216     $(LI Nesting comments can span multiple lines and can nest.)
217     )
218
219     $(P
220     The contents of strings and comments are not tokenized.  Consequently,
221     comment openings occurring within a string do not begin a comment, and
222     string delimiters within a comment do not affect the recognition of
223     comment closings and nested "/+" comment openings.  With the exception
224     of "/+" occurring within a "/+" comment, comment openings within a
225     comment are ignored.
226     )
227
228 -------------
229 a = /+ // +/ 1;     // parses as if 'a = 1;'
230 a = /+ "+/" +/ 1";  // parses as if 'a = " +/ 1";'
231 a = /+ /* +/ */ 3;  // parses as if 'a = */ 3;'
232 -------------
233
234     Comments cannot be used as token concatenators, for example,
235     <code>abc/**/def</code> is two tokens, $(TT abc) and $(TT def),
236     not one $(TT abcdef) token.
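
    The comment rules can be checked with a few declarations. This is only a
    sketch; the names $(TT one) and $(TT two) are invented for the example,
    and a D2 compiler is assumed.

```d
/+ A nesting comment may contain /+ another nesting comment +/
   and ends only at the closer matching its own opener. +/

/* A block comment ignores further /* and /+ openings, so it ends here: */

// A line comment opening inside a nesting comment is ignored.
enum one = /+ // +/ 1;
static assert(one == 1);

// A comment separates tokens: this is "enum int two", not "enum inttwo".
enum int/**/two = 2;
static assert(two == 2);
```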
237
238 <h3>$(LNAME2 tokens, Tokens)</h3>
239
240 $(GRAMMAR
241 $(I Token):
242     $(LINK2 #identifier, $(I Identifier))
243     $(LINK2 #StringLiteral, $(I StringLiteral))
244     $(LINK2 #characterliteral, $(I CharacterLiteral))
245     $(LINK2 #integerliteral, $(I IntegerLiteral))
246     $(LINK2 #floatliteral, $(I FloatLiteral))
247     $(LINK2 #keyword, $(I Keyword))
248     $(B /)
249     $(B /=)
250     $(B .)
251     $(B ..)
252     $(B ...)
253     $(B &)
254     $(B &=)
255     $(B &&)
256     $(B |)
257     $(B |=)
258     $(B ||)
259     $(B -)
260     $(B -=)
261     $(B --)
262     $(B +)
263     $(B +=)
264     $(B ++)
265     $(B &lt;)
266     $(B &lt;=)
267     $(B &lt;&lt;)
268     $(B &lt;&lt;=)
269     $(B &lt;&gt;)
270     $(B &lt;&gt;=)
271     $(B &gt;)
272     $(B &gt;=)
273     $(B &gt;&gt;=)
274     $(B &gt;&gt;&gt;=)
275     $(B &gt;&gt;)
276     $(B &gt;&gt;&gt;)
277     $(B !)
278     $(B !=)
279     $(B !&lt;&gt;)
280     $(B !&lt;&gt;=)
281     $(B !&lt;)
282     $(B !&lt;=)
283     $(B !&gt;)
284     $(B !&gt;=)
285     $(B $(LPAREN))
286     $(B $(RPAREN))
287     $(B [)
288     $(B ])
289     $(B {)
290     $(B })
291     $(B ?)
292     $(B ,)
293     $(B ;)
294     $(B :)
295     $(B $)
296     $(B =)
297     $(B ==)
298     $(B *)
299     $(B *=)
300     $(B %)
301     $(B %=)
302     $(B ^)
303     $(B ^=)
304     $(B ~)
305     $(B ~=)
306     $(V2 $(B @))
307 )
308
309 <h3>$(LNAME2 identifier, Identifiers)</h3>
310
311 $(GRAMMAR
312 $(I Identifier):
313     $(I IdentifierStart)
314     $(I IdentifierStart) $(I IdentifierChars)
315
316 $(I IdentifierChars):
317     $(I IdentifierChar)
318     $(I IdentifierChar) $(I IdentifierChars)
319
320 $(I IdentifierStart):
321     $(B _)
322     $(I Letter)
323     $(I UniversalAlpha)
324
325 $(I IdentifierChar):
326     $(I IdentifierStart)
327     $(B 0)
328     $(I NonZeroDigit)
329 )
330
331
332     Identifiers start with a letter, $(B _), or universal alpha,
333     and are followed by any number
334     of letters, $(B _), digits, or universal alphas.
335     Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D.
336     (This is the C99 Standard.)
337     Identifiers can be arbitrarily long, and are case sensitive.
338     Identifiers starting with $(B __) (two underscores) are reserved.
339
340 <h3>$(LNAME2 StringLiteral, String Literals)</h3>
341
342 $(GRAMMAR
343 $(I StringLiteral):
344     $(I WysiwygString)
345     $(I AlternateWysiwygString)
346     $(I DoubleQuotedString)
347 $(V1
348     $(I EscapeSequence))
349     $(I HexString)
350 $(V2
351     $(I DelimitedString)
352     $(I TokenString))
353
354 $(I WysiwygString):
355     $(B r") $(I WysiwygCharacters) $(B ") $(I Postfix<sub>opt</sub>)
356
357 $(I AlternateWysiwygString):
358     $(B `) $(I WysiwygCharacters) $(B `) $(I Postfix<sub>opt</sub>)
359
360 $(I WysiwygCharacters):
361     $(I WysiwygCharacter)
362     $(I WysiwygCharacter) $(I WysiwygCharacters)
363
364 $(I WysiwygCharacter):
365     $(I Character)
366     $(I EndOfLine)
367
368 $(I DoubleQuotedString):
369     $(B ") $(I DoubleQuotedCharacters) $(B ") $(I Postfix<sub>opt</sub>)
370
371 $(I DoubleQuotedCharacters):
372     $(I DoubleQuotedCharacter)
373     $(I DoubleQuotedCharacter) $(I DoubleQuotedCharacters)
374
375 $(I DoubleQuotedCharacter):
376     $(I Character)
377     $(I EscapeSequence)
378     $(I EndOfLine)
379
380 $(LNAME2 EscapeSequence, $(I EscapeSequence)):
381     $(B \')
382     $(B \")
383     $(B \?)
384     $(B \\)
385     $(B \a)
386     $(B \b)
387     $(B \f)
388     $(B \n)
389     $(B \r)
390     $(B \t)
391     $(B \v)
392     $(B \) $(I EndOfFile)
393     $(B \x) $(I HexDigit) $(I HexDigit)
394     $(B \) $(I OctalDigit)
395     $(B \) $(I OctalDigit) $(I OctalDigit)
396     $(B \) $(I OctalDigit) $(I OctalDigit) $(I OctalDigit)
397     $(B \u) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit)
398     $(B \U) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit) $(I HexDigit)
399     $(B \&amp;) $(LINK2 entity.html, $(I NamedCharacterEntity)) $(B ;)
400
401 $(I HexString):
402     $(B x") $(I HexStringChars) $(B ") $(I Postfix<sub>opt</sub>)
403
404 $(I HexStringChars):
405     $(I HexStringChar)
406     $(I HexStringChar) $(I HexStringChars)
407
408 $(I HexStringChar):
409     $(I HexDigit)
410     $(I WhiteSpace)
411     $(I EndOfLine)
412
413 $(I Postfix):
414     $(B c)
415     $(B w)
416     $(B d)
417
418 $(V2
419 $(I DelimitedString):
420     $(B q") $(I Delimiter) $(I WysiwygCharacters) $(I MatchingDelimiter) $(B ")
421
422 $(I TokenString):
423     $(B q{) $(I Tokens) $(B })
424 )
425 )
426
427     $(P
428     A string literal is either a double quoted string, a wysiwyg quoted
429     string, $(V1 an escape sequence,)
430     $(V2 a delimited string, a token string,)
431     or a hex string.
432     )
433
434 <h4>Wysiwyg Strings</h4>
435
436     $(P
437     Wysiwyg quoted strings are enclosed by r" and ".
438     All characters between
439     the r" and " are part of the string except for $(I EndOfLine) which is
440     regarded as a single \n character.
441     There are no escape sequences inside r" ":
442     )
443
444 ---------------
445 r"hello"
446 r"c:\root\foo.exe"
447 r"ab\n"         // string is 4 characters, 'a', 'b', '\', 'n'
448 ---------------
449
450     $(P
451     An alternate form of wysiwyg string is enclosed by backquotes,
452     the ` character. The ` character is not available on some keyboards
453     and the font rendering of it is sometimes indistinguishable from
454     the regular ' character. Since, however, the ` is rarely used,
455     it is useful to delineate strings with " in them.
456     )
457
458 ---------------
459 `hello`
460 `c:\root\foo.exe`
461 `ab\n`          // string is 4 characters, 'a', 'b', '\', 'n'
462 ---------------
463
464 <h4>Double Quoted Strings</h4>
465
466     Double quoted strings are enclosed by "". Escape sequences can be
467     embedded into them with the typical \ notation.
468     $(I EndOfLine) is regarded as a single \n character.
469
470 ---------------
471 "hello"
472 "c:\\root\\foo.exe"
473 "ab\n"          // string is 3 characters, 'a', 'b', and a linefeed
474 "ab
475 "           // string is 3 characters, 'a', 'b', and a linefeed
476 ---------------
477
478 $(V1
479 <h4>Escape Strings</h4>
480
481     $(P Escape strings start with a \ and form an escape character sequence.
482     Adjacent escape strings are concatenated:
483     )
484
485 <pre>
486 \n          the linefeed character
487 \t          the tab character
488 \"          the double quote character
489 \012            octal
490 \x1A            hex
491 \u1234          wchar character
492 \U00101234      dchar character
493 \&amp;reg;          &reg; dchar character
494 \r\n            carriage return, line feed
495 </pre>
496
497     $(P Undefined escape sequences are errors.
498     Although string literals are defined to be composed of
499     UTF characters, the octal and hex escape sequences allow
500     the insertion of arbitrary binary data.
501     \u and \U escape sequences can only be used to insert
502     valid UTF characters.
503     )
504 )
505
506 <h4>Hex Strings</h4>
507
508     $(P Hex strings allow string literals to be created using hex data.
509     The hex data need not form valid UTF characters.
510     )
511
512 --------------
513 x"0A"           // same as "\x0A"
514 x"00 FBCD 32FD 0A"  // same as "\x00\xFB\xCD\x32\xFD\x0A"
515 --------------
516
517     Whitespace and newlines are ignored, so the hex data can be
518     easily formatted.
519     The number of hex characters must be a multiple of 2.
520     <p>
521
522     Adjacent strings are concatenated with the ~ operator, or by simple
523     juxtaposition:
524
525 --------------
526 "hello " ~ "world" ~ \n // forms the string 'h','e','l','l','o',' ',
527             // 'w','o','r','l','d',linefeed
528 --------------
529
530     The following are all equivalent:
531
532 -----------------
533 "ab" "c"
534 r"ab" r"c"
535 r"a" "bc"
536 "a" ~ "b" ~ "c"
537 \x61"bc"
538 -----------------
539
540     The optional $(I Postfix) character gives a specific type
541     to the string, rather than it being inferred from the context.
542     This is useful when the type cannot be unambiguously inferred,
543     such as when overloading based on string type. The types corresponding
544     to the postfix characters are:
545     <p>
546
547     $(TABLE2 String Literal Postfix Characters,
548     $(TR
549     $(TH Postfix)
550     $(TH Type)
551     )
552     $(TR
553     $(TD $(B c))
554     $(TD char[ ])
555     )
556     $(TR
557     $(TD $(B w))
558     $(TD wchar[ ])
559     )
560     $(TR
561     $(TD $(B d))
562     $(TD dchar[ ])
563     )
564     )
565
566 ---
567 "hello"c          // char[]
568 "hello"w          // wchar[]
569 "hello"d          // dchar[]
570 ---
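
    As a sketch of overloading based on string type, the postfix can be used
    to pick an overload explicitly. The function name $(TT kind) is invented
    for this example. It assumes a D2 compiler, where the parameters are
    written with $(TT const) and the literals convert to them implicitly.

```d
// Hypothetical overload set, one function per character width.
string kind(const(char)[] s)  { return "char"; }
string kind(const(wchar)[] s) { return "wchar"; }
string kind(const(dchar)[] s) { return "dchar"; }

// The postfix fixes the literal's type, which in turn selects the overload.
static assert(kind("hello"c) == "char");
static assert(kind("hello"w) == "wchar");
static assert(kind("hello"d) == "dchar");
```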
571
572     $(P String literals are read only. Writes to string literals
573     cannot always be detected, but cause undefined behavior.)
574
575 $(V2
576 <h4>Delimited Strings</h4>
577
578     $(P Delimited strings use various forms of delimiters.
579     The delimiter, whether a character or identifier,
580     must immediately follow the " without any intervening whitespace.
581     The terminating delimiter must immediately precede the closing "
582     without any intervening whitespace.
583     A $(I nesting delimiter) nests, and is one of the
584     following characters:
585     )
586
587     $(TABLE2 Nesting Delimiters,
588     $(TR
589     $(TH Delimiter)
590     $(TH Matching Delimiter)
591     )
592     $(TR
593     $(TD [)
594     $(TD ])
595     )
596     $(TR
597     $(TD $(LPAREN))
598     $(TD $(RPAREN))
599     )
600     $(TR
601     $(TD &lt;)
602     $(TD &gt;)
603     )
604     $(TR
605     $(TD {)
606     $(TD })
607     )
608     )
609
610 ---
611 q"(foo(xxx))"   // "foo(xxx)"
612 q"[foo{]"       // "foo{"
613 ---
614
615     $(P If the delimiter is an identifier, the identifier must
616     be immediately followed by a newline, and the matching
617     delimiter is the same identifier starting at the beginning
618     of the line:
619     )
620 ---
621 writefln(q"EOS
622 This
623 is a multi-line
624 heredoc string
625 EOS"
626 );
627 ---
628     $(P The newline following the opening identifier is not part
629     of the string, but the last newline before the closing
630     identifier is part of the string.
631     )
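
    The newline rule for identifier delimiters can be verified at compile
    time. This is a minimal sketch; the name $(TT heredoc) is chosen only for
    the example.

```d
enum heredoc = q"EOS
This
is a heredoc
EOS";

// The newline after the opening EOS is dropped; the one before the closing EOS is kept.
static assert(heredoc == "This\nis a heredoc\n");

// Nesting delimiters from the table above.
static assert(q"(foo(xxx))" == "foo(xxx)");
static assert(q"[foo{]" == "foo{");
```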
632
633     $(P Otherwise, the matching delimiter is the same as
634     the delimiter character:)
635
636 ---
637 q"/foo]/"       // "foo]"
638 q"/abc/def/"    // error
639 ---
640
641 <h4>Token Strings</h4>
642
643     $(P Token strings open with the characters $(B q{) and close with
644     the token $(B }). In between must be valid D tokens.
645     The $(B {) and $(B }) tokens nest.
646     The string is formed of all the characters between the opening
647     and closing of the token string, including comments.
648     )
649
650 ---
651 q{foo}               // "foo"
652 q{/*}*/ }            // "/*}*/ "
653 q{ foo(q{hello}); }  // " foo(q{hello}); "
654 q{ @ }               // error, @ is not a valid D token
655 q{ __TIME__ }        // " __TIME__ ", i.e. it is not replaced with the time
656 q{ __EOF__ }         // error, as __EOF__ is not a token, it's end of file
657 ---
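
    One plausible use of token strings, shown here only as a sketch, is to
    hold D code that is later compiled with a string $(TT mixin). The names
    $(TT decl) and $(TT answer) are invented for the example.

```d
// The token string is an ordinary string; the spaces inside the braces are kept.
enum decl = q{ int answer = 42; };
static assert(decl == " int answer = 42; ");

// Compiling the string declares the variable "answer" at module scope.
mixin(decl);
```

    Because the contents must be valid tokens, an editor can highlight the
    text as code even though it is a string.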
658
659 )
660
661 <h3>$(LNAME2 characterliteral, Character Literals)</h3>
662
663 $(GRAMMAR
664 $(I CharacterLiteral):
665     $(B ') $(I SingleQuotedCharacter) $(B ')
666
667 $(I SingleQuotedCharacter):
668     $(I Character)
669     $(I EscapeSequence)
670 )
671
672     Character literals are a single character or escape sequence
673     enclosed by single quotes, ' '.
674
675 <h3>$(LNAME2 integerliteral, Integer Literals)</h3>
676
677 $(GRAMMAR
678 $(GNAME IntegerLiteral):
679     $(I Integer)
680     $(I Integer) $(I IntegerSuffix)
681
682 $(I Integer):
683     $(I Decimal)
684     $(I Binary)
685     $(I Octal)
686     $(I Hexadecimal)
687
688 $(I IntegerSuffix):
689     $(B L)
690     $(B u)
691     $(B U)
692     $(B Lu)
693     $(B LU)
694     $(B uL)
695     $(B UL)
696
697 $(GNAME Decimal):
698     $(B 0)
699     $(I NonZeroDigit)
700     $(I NonZeroDigit) $(I DecimalDigits)
701
702 $(I Binary):
703     $(B 0b) $(I BinaryDigits)
704     $(B 0B) $(I BinaryDigits)
705
706 $(I Octal):
707     $(B 0) $(I OctalDigits)
708
709 $(I Hexadecimal):
710     $(B 0x) $(I HexDigits)
711     $(B 0X) $(I HexDigits)
712
713 $(I NonZeroDigit):
714     $(B 1)
715     $(B 2)
716     $(B 3)
717     $(B 4)
718     $(B 5)
719     $(B 6)
720     $(B 7)
721     $(B 8)
722     $(B 9)
723
724 $(GNAME DecimalDigits):
725     $(I DecimalDigit)
726     $(I DecimalDigit) $(I DecimalDigits)
727
728 $(GNAME DecimalDigit):
729     $(B 0)
730     $(I NonZeroDigit)
731     $(B _)
732
733 $(I BinaryDigits):
734     $(I BinaryDigit)
735     $(I BinaryDigit) $(I BinaryDigits)
736
737 $(I BinaryDigit):
738     $(B 0)
739     $(B 1)
740     $(B _)
741
742 $(I OctalDigits):
743     $(I OctalDigit)
744     $(I OctalDigit) $(I OctalDigits)
745
746 $(I OctalDigit):
747     $(B 0)
748     $(B 1)
749     $(B 2)
750     $(B 3)
751     $(B 4)
752     $(B 5)
753     $(B 6)
754     $(B 7)
755     $(B _)
756
757 $(I HexDigits):
758     $(I HexDigit)
759     $(I HexDigit) $(I HexDigits)
760
761 $(I HexDigit):
762     $(I DecimalDigit)
763     $(B a)
764     $(B b)
765     $(B c)
766     $(B d)
767     $(B e)
768     $(B f)
769     $(B A)
770     $(B B)
771     $(B C)
772     $(B D)
773     $(B E)
774     $(B F)
775     $(B _)
776 )
777
778     Integers can be specified in decimal, binary, octal, or hexadecimal.
779 <p>
780     Decimal integers are a sequence of decimal digits.
781 <p>
782     $(LNAME2 binary-literals, Binary integers) are a sequence of binary digits preceded
783     by a $(SINGLEQUOTE 0b).
784 <p>
785     Octal integers are a sequence of octal digits preceded by a $(SINGLEQUOTE 0).
786 <p>
787     Hexadecimal integers are a sequence of hexadecimal digits preceded
788     by a $(SINGLEQUOTE 0x).
789 <p>
790     Integers can have embedded $(SINGLEQUOTE _) characters, which are ignored.
791     The embedded $(SINGLEQUOTE _) are useful for formatting long literals, such
792     as using them as a thousands separator:
793
794 -------------
795 123_456     // 123456
796 1_2_3_4_5_6_    // 123456
797 -------------
798
799     Integers can be immediately followed by one $(SINGLEQUOTE L), by one
800     $(SINGLEQUOTE u) or $(SINGLEQUOTE U), or by both.
801 <p>
802     The type of the integer is resolved as follows:
803     <p>
804
805     $(TABLE2 Decimal Literal Types,
806     $(TR
807     $(TH Decimal Literal)
808     $(TH Type)
809     )
810     $(TR
811     $(TD 0 .. 2_147_483_647)
812     $(TD int)
813     )
814     $(TR
815     $(TD 2_147_483_648 .. 9_223_372_036_854_775_807)
816     $(TD long)
817     )
818     $(TR
819     $(TH Decimal Literal, L Suffix)
820     $(TH Type)
821     )
822     $(TR
823     $(TD 0L .. 9_223_372_036_854_775_807L)
824     $(TD long)
825     )
826     $(TR
827     $(TH Decimal Literal, U Suffix)
828     $(TH Type)
829     )
830     $(TR
831     $(TD 0U .. 4_294_967_295U)
832     $(TD uint)
833     )
834     $(TR
835     $(TD 4_294_967_296U .. 18_446_744_073_709_551_615U)
836     $(TD ulong)
837     )
838     $(TR
839     $(TH Decimal Literal, UL Suffix)
840     $(TH Type)
841     )
842     $(TR
843     $(TD 0UL .. 18_446_744_073_709_551_615UL)
844     $(TD ulong)
845     )
846
847     $(TR
848     $(TH Non-Decimal Literal)
849     $(TH Type)
850     )
851     $(TR
852     $(TD 0x0 .. 0x7FFF_FFFF)
853     $(TD int)
854     )
855     $(TR
856     $(TD 0x8000_0000 .. 0xFFFF_FFFF)
857     $(TD uint)
858     )
859     $(TR
860     $(TD 0x1_0000_0000 .. 0x7FFF_FFFF_FFFF_FFFF)
861     $(TD long)
862     )
863     $(TR
864     $(TD 0x8000_0000_0000_0000 .. 0xFFFF_FFFF_FFFF_FFFF)
865     $(TD ulong)
866     )
867     $(TR
868     $(TH Non-Decimal Literal, L Suffix)
869     $(TH Type)
870     )
871     $(TR
872     $(TD 0x0L .. 0x7FFF_FFFF_FFFF_FFFFL)
873     $(TD long)
874     )
875     $(TR
876     $(TD 0x8000_0000_0000_0000L .. 0xFFFF_FFFF_FFFF_FFFFL)
877     $(TD ulong)
878     )
879     $(TR
880     $(TH Non-Decimal Literal, U Suffix)
881     $(TH Type)
882     )
883     $(TR
884     $(TD 0x0U .. 0xFFFF_FFFFU)
885     $(TD uint)
886     )
887     $(TR
888     $(TD 0x1_0000_0000U .. 0xFFFF_FFFF_FFFF_FFFFU)
889     $(TD ulong)
890     )
891     $(TR
892     $(TH Non-Decimal Literal, UL Suffix)
893     $(TH Type)
894     )
895     $(TR
896     $(TD 0x0UL .. 0xFFFF_FFFF_FFFF_FFFFUL)
897     $(TD ulong)
898     )
899
900     )
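
    A few rows of the table can be spot-checked with $(TT typeof). The
    following sketch assumes a D2 compiler and only samples the boundaries;
    it is not an exhaustive test of the rules.

```d
// Decimal literals without a suffix are int, or long once they no longer fit.
static assert(is(typeof(2_147_483_647) == int));
static assert(is(typeof(2_147_483_648) == long));

// Non-decimal literals without a suffix may also become unsigned.
static assert(is(typeof(0x7FFF_FFFF) == int));
static assert(is(typeof(0x8000_0000) == uint));
static assert(is(typeof(0x1_0000_0000) == long));
static assert(is(typeof(0x8000_0000_0000_0000) == ulong));

// Suffixes force at least the named type.
static assert(is(typeof(10L) == long));
static assert(is(typeof(10U) == uint));
static assert(is(typeof(4_294_967_296U) == ulong));
static assert(is(typeof(10UL) == ulong));
```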
901
902
903 <h3>$(LNAME2 floatliteral, Floating Literals)</h3>
904
905 $(GRAMMAR
906 $(GNAME FloatLiteral):
907     $(I Float)
908     $(I Float) $(I Suffix)
909     $(I Integer) $(I ImaginarySuffix)
910     $(I Integer) $(I FloatSuffix) $(I ImaginarySuffix)
911     $(I Integer) $(I RealSuffix) $(I ImaginarySuffix)
912
913 $(I Float):
914     $(I DecimalFloat)
915     $(I HexFloat)
916
917 $(I DecimalFloat):
918     $(GLINK LeadingDecimal) $(B .)
919     $(GLINK LeadingDecimal) $(B .) $(I DecimalDigits)
920     $(I DecimalDigits) $(B .) $(I DecimalDigits) $(I DecimalExponent)
921     $(B .) $(I Decimal)
922     $(B .) $(I Decimal) $(I DecimalExponent)
923     $(GLINK LeadingDecimal) $(I DecimalExponent)
924
925 $(I DecimalExponent):
926     $(B e) $(I DecimalDigits)
927     $(B E) $(I DecimalDigits)
928     $(B e+) $(I DecimalDigits)
929     $(B E+) $(I DecimalDigits)
930     $(B e-) $(I DecimalDigits)
931     $(B E-) $(I DecimalDigits)
932
933 $(I HexFloat):
934     $(I HexPrefix) $(I HexDigits) $(B .) $(I HexDigits) $(I HexExponent)
935     $(I HexPrefix) $(B .) $(I HexDigits) $(I HexExponent)
936     $(I HexPrefix) $(I HexDigits) $(I HexExponent)
937
938 $(I HexPrefix):
939     $(B 0x)
940     $(B 0X)
941
942 $(I HexExponent):
943     $(B p) $(I DecimalDigits)
944     $(B P) $(I DecimalDigits)
945     $(B p+) $(I DecimalDigits)
946     $(B P+) $(I DecimalDigits)
947     $(B p-) $(I DecimalDigits)
948     $(B P-) $(I DecimalDigits)
949
950 $(I Suffix):
951     $(I FloatSuffix)
952     $(I RealSuffix)
953     $(I ImaginarySuffix)
954     $(I FloatSuffix) $(I ImaginarySuffix)
955     $(I RealSuffix) $(I ImaginarySuffix)
956
957 $(I FloatSuffix):
958     $(B f)
959     $(B F)
960
961 $(I RealSuffix):
962     $(B L)
963
964 $(I ImaginarySuffix):
965     $(B i)
966
967 $(GNAME LeadingDecimal):
968     $(GLINK Decimal)
969     $(B 0) $(GLINK DecimalDigits)
970 )
971
972     Floats can be in decimal or hexadecimal format,
973     as in standard C.
974     <p>
975
976     Hexadecimal floats are preceded with a $(B 0x) and the
977     exponent is a $(B p)
978     or $(B P) followed by a decimal number serving as the exponent
979     of 2.
980     <p>
981
982     Floating literals can have embedded $(SINGLEQUOTE _) characters, which are ignored.
983     The embedded $(SINGLEQUOTE _) are useful for formatting long literals to
984     make them more readable, such
985     as using them as a thousands separator:
986
987 ---------
988 123_456.567_8       // 123456.5678
989 1_2_3_4_5_6_._5_6_7_8   // 123456.5678
990 1_2_3_4_5_6_._5e-6_ // 123456.5e-6
991 ---------
992
993     Floating literals with no suffix are of type double.
994     Floats can be followed by one $(B f), $(B F),
995     or $(B L) suffix.
996     The $(B f) or $(B F) suffix means it is a
997     float, and $(B L) means it is a real.
998     <p>
999
1000     If a floating literal is followed by $(B i), then it is an
1001     imaginary type ($(I ifloat), $(I idouble), or $(I ireal)).
1002     <p>
1003
1004     Examples:
1005
1006 ---------
1007 0x1.FFFFFFFFFFFFFp1023      // double.max
1008 0x1p-52             // double.epsilon
1009 1.175494351e-38F        // float.min
1010 6.3i                // idouble 6.3
1011 6.3fi               // ifloat 6.3
1012 6.3Li               // ireal 6.3
1013 ---------
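
    The suffix rules and two of the hexadecimal examples above can be
    confirmed at compile time. This sketch assumes a D2 compiler and leaves
    out the imaginary suffix.

```d
// No suffix means double; f or F means float; L means real.
static assert(is(typeof(1.5) == double));
static assert(is(typeof(1.5f) == float));
static assert(is(typeof(1.5F) == float));
static assert(is(typeof(1.5L) == real));

// Hexadecimal floats: the exponent after p is a power of 2.
static assert(0x1p-52 == double.epsilon);
static assert(0x1.FFFFFFFFFFFFFp1023 == double.max);

// Embedded underscores are ignored.
static assert(123_456.567_8 == 123456.5678);
```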
1014
1015     It is an error if the literal exceeds the range of the type.
1016     It is not an error if the literal is rounded to fit into
1017     the significant digits of the type.
1018     <p>
1019
1020     Complex literals are not tokens, but are assembled from
1021     real and imaginary expressions in the semantic analysis:
1022
1023 ---------
1024 4.5 + 6.2i      // complex number
1025 ---------
1026
1027 <h3>$(LNAME2 keyword, Keywords)</h3>
1028
1029     Keywords are reserved identifiers.
1030
1031 $(GRAMMAR
1032 $(I Keyword):
1033     $(B abstract)
1034     $(B alias)
1035     $(B align)
1036     $(B asm)
1037     $(B assert)
1038     $(B auto)
1039
1040     $(B body)
1041     $(B bool)
1042     $(B break)
1043     $(B byte)
1044
1045     $(B case)
1046     $(B cast)
1047     $(B catch)
1048     $(B cdouble)
1049     $(B cent)
1050     $(B cfloat)
1051     $(B char)
1052     $(B class)
1053     $(B const)
1054     $(B continue)
1055     $(B creal)
1056
1057     $(B dchar)
1058     $(B debug)
1059     $(B default)
1060     $(B delegate)
1061     $(B delete)
1062     $(B deprecated)
1063     $(B do)
1064     $(B double)
1065
1066     $(B else)
1067     $(B enum)
1068     $(B export)
1069     $(B extern)
1070
1071     $(B false)
1072     $(B final)
1073     $(B finally)
1074     $(B float)
1075     $(B for)
1076     $(B foreach)
1077     $(B foreach_reverse)
1078     $(B function)
1079
1080     $(B goto)
1081
1082     $(B idouble)
1083     $(B if)
1084     $(B ifloat)
1085 $(V2
1086     $(B immutable)
1087 )   $(B import)
1088     $(B in)
1089     $(B inout)
1090     $(B int)
1091     $(B interface)
1092     $(B invariant)
1093     $(B ireal)
1094     $(B is)
1095
1096     $(B lazy)
1097     $(B long)
1098
1099     $(B macro)
1100     $(B mixin)
1101     $(B module)
1102
1103     $(B new)
1104 $(V2
1105     $(B nothrow)
1106 )   $(B null)
1107
1108     $(B out)
1109     $(B override)
1110
1111     $(B package)
1112     $(B pragma)
1113     $(B private)
1114     $(B protected)
1115     $(B public)
1116 $(V2
1117     $(B pure)
1118 )
1119     $(B real)
1120     $(B ref)
1121     $(B return)
1122
1123     $(B scope)
1124 $(V2
1125     $(B shared)
1126 )   $(B short)
1127     $(B static)
1128     $(B struct)
1129     $(B super)
1130     $(B switch)
1131     $(B synchronized)
1132
1133     $(B template)
1134     $(B this)
1135     $(B throw)
1136     $(B true)
1137     $(B try)
1138     $(B typedef)
1139     $(B typeid)
1140     $(B typeof)
1141
1142     $(B ubyte)
1143     $(B ucent)
1144     $(B uint)
1145     $(B ulong)
1146     $(B union)
1147     $(B unittest)
1148     $(B ushort)
1149
1150     $(B version)
1151     $(B void)
1152     $(B volatile)
1153
1154     $(B wchar)
1155     $(B while)
1156     $(B with)
1157 $(V2
1158     $(B __FILE__)
1159     $(B __LINE__)
1160     $(B __gshared)
1161     $(B __thread)
1162     $(B __traits))
1163 )
1164
1165 <h3>$(LNAME2 specialtokens, Special Tokens)</h3>
1166
1167     $(P
1168     These tokens are replaced with other tokens according to the following
1169     table:
1170     )
1171
1172     $(TABLE2 Special Tokens,
1173     $(TR
1174     $(TH Special Token)
1175     $(TH Replaced with...)
1176     )
1177 $(V1
1178     $(TR
1179     $(TD $(B __FILE__))
1180     $(TD string literal containing source file name)
1181     )
1182     $(TR
1183     $(TD $(B __LINE__))
1184     $(TD integer literal of the current source line number)
1185     )
1186 )
1187     $(TR
1188     $(TD $(B __DATE__))
1189     $(TD string literal of the date of compilation "$(I mmm dd yyyy)")
1190     )
1191 $(V2
1192     $(TR
1193     $(TD $(B __EOF__))
1194     $(TD sets the scanner to the end of the file)
1195     )
1196 )
1197     $(TR
1198     $(TD $(B __TIME__))
1199     $(TD string literal of the time of compilation "$(I hh:mm:ss)")
1200     )
1201     $(TR
1202     $(TD $(B __TIMESTAMP__))
1203     $(TD string literal of the date and time of compilation "$(I www mmm dd hh:mm:ss yyyy)")
1204     )
1205     $(TR
1206     $(TD $(B __VENDOR__))
1207     $(TD Compiler vendor string, such as "Digital Mars D")
1208     )
1209     $(TR
1210     $(TD $(B __VERSION__))
1211     $(TD Compiler version as an integer, such as 2001)
1212     )
1213     )
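
    Since the replacements are ordinary literals, their shapes can be checked
    like any other constant. The sketch below assumes a D2 compiler and relies
    only on the formats listed in the table.

```d
// The lengths follow from the formats in the table.
static assert(__DATE__.length == 11);        // "mmm dd yyyy"
static assert(__TIME__.length == 8);         // "hh:mm:ss"
static assert(__TIMESTAMP__.length == 24);   // "www mmm dd hh:mm:ss yyyy"

// The vendor is a string, the version an integer.
static assert(is(typeof(__VENDOR__) : const(char)[]));
static assert(is(typeof(__VERSION__) : long));
```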
1214
1215 <h3>$(LNAME2 specialtokenseq, Special Token Sequences)</h3>
1216
1217     Special token sequences are processed by the lexical analyzer, may
1218     appear between any other tokens, and do not affect the syntax
1219     parsing.
1220     <p>
1221
1222     There is currently only one special token sequence, $(TT #line).
1223
1224 $(GRAMMAR
1225 $(I SpecialTokenSequence):
1226     $(B # line) $(I Integer) $(I EndOfLine)
1227     $(B # line) $(I Integer) $(I Filespec) $(I EndOfLine)
1228
1229 $(I Filespec):
1230     $(B ") $(I Characters) $(B ")
1231 )
1232
1233     This sets the source line number to $(I Integer),
1234     and optionally the source file name to $(I Filespec),
1235     beginning with the next line of source text.
1236     The source file and line number are used for printing error messages
1237     and for mapping generated code back to the source for the symbolic
1238     debugging output.
1239     <p>
1240
1241     For example:
1242
1243 -----------------
1244 int #line 6 "foo\bar"
1245 x;          // this is now line 6 of file foo\bar
1246 -----------------
1247
1248     Note that the backslash character is not treated specially inside
1249     $(I Filespec) strings.
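
    The effect of $(TT #line) can be observed through $(TT __LINE__) and
    $(TT __FILE__). This is a sketch that assumes a D2 compiler; the file name
    $(TT renamed.d) is arbitrary.

```d
#line 100 "renamed.d"
static assert(__LINE__ == 100);           // the line after #line is line 100
static assert(__FILE__ == "renamed.d");   // the file name comes from the Filespec
```

    A code generator could emit such a line so that error messages point
    back at the original input rather than at the generated file.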
1250
1251 )
1252
1253 Macros:
1254     TITLE=Lexical
1255     WIKI=Lex