fu, created at Saturday, 2009-03-28, 21:10:18
fu, modified at Monday, 2009-06-08, 08:24:12
Dao Language Specifications 1.1 (Draft: Alpha 1) - Lexical Structures
Dao programs are written as text encoded in UTF-8, a multi-byte Unicode encoding that is compatible with ASCII (American Standard Code for Information Interchange). Some character conversions are performed on the source code during lexical translation.
Double Byte Characters (DBC) in the Unicode range 0xff00-0xff5f (exclusive) are converted to Single Byte Characters (SBC). This conversion occurs only outside of comments and string literals. It allows operators that are written with characters not encoded in ASCII to be interpreted properly.
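The conversion can be sketched in Python, used here only for illustration. The sketch assumes the usual fullwidth-to-ASCII mapping, in which every code point strictly between 0xff00 and 0xff5f (U+FF01 to U+FF5E) is shifted down by 0xFEE0 onto U+0021 to U+007E. The helper name `dbc_to_sbc` is hypothetical, and the caller is assumed to have already excluded comments and string literals.

```python
def dbc_to_sbc(text: str) -> str:
    """Map fullwidth (DBC) characters U+FF01..U+FF5E to ASCII (SBC).

    Hypothetical helper: assumes `text` lies outside comments and
    string literals, where the conversion applies.
    """
    out = []
    for ch in text:
        code = ord(ch)
        # 0xff00-0xff5f with both ends excluded, i.e. U+FF01..U+FF5E.
        if 0xFF00 < code < 0xFF5F:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


print(dbc_to_sbc("ａ＋ｂ＝１"))  # prints "a+b=1"
```

Characters outside the range, such as the ideographic space U+3000, pass through unchanged in this sketch.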
Dao mainly uses the hash mark (#) to start a comment. C++-style comments starting with a double slash (//) will also be supported.
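A rough Python sketch of comment removal for a single line follows. It assumes that both `#` and `//` comments extend to the end of the line (block comments are not covered), and it tracks only the ASCII quotation marks ' and " so that a `#` inside a string literal is not mistaken for a comment. The function name `strip_line_comment` is hypothetical.

```python
def strip_line_comment(line: str) -> str:
    """Remove a trailing '#' or '//' comment from one line of source.

    Sketch only: both comment styles are assumed to run to the end of
    the line, and only ' and " (with backslash escapes) are tracked.
    A line without a comment is returned unchanged.
    """
    quote = None  # the quotation mark currently open, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None  # the string literal ends here
        elif ch in ("'", '"'):
            quote = ch  # a string literal starts here
        elif ch == "#" or line.startswith("//", i):
            return line[:i].rstrip()  # drop the comment and trailing blanks
        i += 1
    return line
```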
The basic quotation marks used to enclose string literals are the single quotation mark (0x27) and the double quotation mark (0x22). Some other marks are also interpreted as quotation marks; they are listed in the following table:
When these marks enclose a string literal, each mark in the Left Mark column must be properly matched by the corresponding mark in the Right Mark column. (TODO: properly handle quotation marks inside comments and quotation marks prefixed with backslashes, and stop treating the grave accent 0x60 as a quotation mark.)
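The following Python sketch shows how a lexer might find the end of a string literal by pairing each left mark with its right mark. It makes three assumptions. The fullwidth marks are taken to be U+FF07 and U+FF02. The grave accent is left out, following the TODO. Backslash escapes follow the rule given for ValidCharSequence below, although the TODO suggests this was not fully implemented at the time of writing. `QUOTE_PAIRS` and `scan_string` are hypothetical names.

```python
# Left (opening) mark -> right (closing) mark.  The fullwidth pairs are
# an assumption; the grave accent 0x60 is deliberately not included.
QUOTE_PAIRS = {
    "'": "'",
    '"': '"',
    "\uff07": "\uff07",  # fullwidth apostrophe
    "\uff02": "\uff02",  # fullwidth quotation mark
    "\u2018": "\u2019",  # left / right single quotation mark
    "\u201c": "\u201d",  # left / right double quotation mark
}


def scan_string(src: str, start: int) -> int:
    """Return the index just past the string literal opening at src[start].

    A backslash escapes the next character, so an escaped closing mark
    does not end the literal.  Raises ValueError if src[start] is not a
    left mark or if the literal is never closed.
    """
    left = src[start]
    if left not in QUOTE_PAIRS:
        raise ValueError("not an opening quotation mark")
    right = QUOTE_PAIRS[left]
    i = start + 1
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the backslash and the escaped character
        elif src[i] == right:
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string literal")
```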
The following keywords are reserved for the language:
Keyword ::= Kwd1 | Kwd2 | Kwd3 | Kwd4 | Kwd5
| Kwd6 | Kwd7 | Kwd8 | Kwd9
Basic character classes:
DecDigit ::= '0' ... '9'
HexDigit ::= DecDigit | 'a' ... 'f' | 'A' ... 'F'
AsciiLetter ::= 'a' ... 'z' | 'A' ... 'Z'
WideChar ::= "UTF-8 encoded unit of one or more bytes"
WideAlpha ::= WideChar & iswalpha( WideChar ) != 0
WideAlnum ::= WideChar & iswalnum( WideChar ) != 0
Here iswalpha() and iswalnum() are the C99 functions that test whether a wide character belongs to a certain class. A WideChar can be more than one byte; in that case its UTF-8 bytes are converted into a Unicode code point before being passed to the C99 test functions.
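The decoding step described above can be sketched in Python as an illustration only. `decode_wide_char` is a hypothetical helper that reads one UTF-8 sequence, choosing its length from the lead byte. Python's `str.isalpha()` and `str.isalnum()` stand in for `iswalpha()` and `iswalnum()`. The C functions depend on the current locale, so their results may differ.

```python
def decode_wide_char(data: bytes, pos: int) -> tuple[str, int]:
    """Decode one UTF-8 encoded WideChar starting at data[pos].

    Returns the character and the number of bytes it occupied.
    """
    lead = data[pos]
    if lead < 0x80:
        size = 1  # plain ASCII byte
    elif lead >> 5 == 0b110:
        size = 2
    elif lead >> 4 == 0b1110:
        size = 3
    elif lead >> 3 == 0b11110:
        size = 4
    else:
        raise ValueError("invalid UTF-8 lead byte")
    # decode() raises UnicodeDecodeError (a ValueError) on malformed input.
    ch = data[pos:pos + size].decode("utf-8")
    return ch, size


def is_wide_alpha(ch: str) -> bool:
    # Stand-in for iswalpha(): uses Unicode categories, not the C locale.
    return ch.isalpha()


def is_wide_alnum(ch: str) -> bool:
    # Stand-in for iswalnum(); same caveat as is_wide_alpha().
    return ch.isalnum()
```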
AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
WideIdentifier ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*
Identifier ::= AsciiIdentifier | WideIdentifier
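A direct Python transcription of the identifier rules follows. Python's `str.isalpha()` and `str.isalnum()` are assumed as stand-ins for WideAlpha and WideAlnum, and the function names are hypothetical.

```python
def is_ascii_identifier(s: str) -> bool:
    # AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
    if not s or not s.isascii():
        return False
    return (s[0].isalpha() or s[0] == "_") and all(
        c.isalnum() or c == "_" for c in s[1:]
    )


def is_wide_identifier(s: str) -> bool:
    # WideIdentifier ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*
    if not s:
        return False
    return (s[0].isalpha() or s[0] == "_") and all(
        c.isalnum() or c == "_" for c in s[1:]
    )


def is_identifier(s: str) -> bool:
    # Identifier ::= AsciiIdentifier | WideIdentifier
    return is_ascii_identifier(s) or is_wide_identifier(s)
```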
DecInteger ::= DecDigit+
HexInteger ::= ( '0x' | '0X' ) HexDigit+
Integer ::= DecInteger | HexInteger
LongInteger ::= Integer 'L'
(The 'L' suffix on a HexInteger is not implemented.)
Floating point number literals:
DotDec ::= DecDigit* '.' DecDigit+
DecDot ::= DecDigit+ '.' DecDigit*
DecSinglePrecision ::= DotDec | DecDot
DecNumber ::= DecInteger | DecSinglePrecision
SciSinglePrecision ::= DecNumber 'e' [ '+' | '-' ] DecInteger
SciDoublePrecision ::= DecNumber 'E' [ '+' | '-' ] DecInteger
Float ::= DecSinglePrecision | SciSinglePrecision
Double ::= DecNumber 'D' | SciDoublePrecision
Complex number, imaginary part literal:
ComplexImaginary ::= '$'
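The integer and floating point rules above translate almost one-to-one into regular expressions. The following minimal Python sketch makes several assumptions. Tokens have already been separated. HexInteger is built from HexDigit. The suffix letters are case sensitive as written: 'e' gives single precision, while 'E' and 'D' give double precision. LongInteger accepts only decimal digits because the hexadecimal form is noted as not implemented. The imaginary-part marker '$' is not modeled. `classify_number` and the rule table are hypothetical names.

```python
import re

# Integer rules (HexInteger assumed to be built from HexDigit+).
DEC_INTEGER = r"[0-9]+"
HEX_INTEGER = r"0[xX][0-9a-fA-F]+"

# Floating point rules.
DOT_DEC = r"[0-9]*\.[0-9]+"
DEC_DOT = r"[0-9]+\.[0-9]*"
DEC_SINGLE = f"(?:{DOT_DEC}|{DEC_DOT})"
DEC_NUMBER = f"(?:{DEC_INTEGER}|{DEC_SINGLE})"
EXPONENT = r"[+-]?[0-9]+"

RULES = [
    ("DecInteger", re.compile(DEC_INTEGER)),
    ("HexInteger", re.compile(HEX_INTEGER)),
    # Decimal only: 'L' on a HexInteger is noted as not implemented.
    ("LongInteger", re.compile(f"{DEC_INTEGER}L")),
    ("Float", re.compile(f"{DEC_SINGLE}|{DEC_NUMBER}e{EXPONENT}")),
    ("Double", re.compile(f"{DEC_NUMBER}D|{DEC_NUMBER}E{EXPONENT}")),
]


def classify_number(token: str) -> str:
    """Return the name of the rule that matches the whole token."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    raise ValueError(f"not a numeric literal: {token!r}")
```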
Basic string literal:
SingleQuoteString ::= ' ' ' ValidCharSequence ' ' '
DoubleQuoteString ::= ' " ' ValidCharSequence ' " '
String literal with DBC quotation marks:
DBCSingleQuoteString ::= ' ＇ ' ValidCharSequence ' ＇ '
DBCDoubleQuoteString ::= ' ＂ ' ValidCharSequence ' ＂ '
String literal with Unicode single and double quotation marks:
USingleQuoteString ::= ' ‘ ' ValidCharSequence ' ’ '
UDoubleQuoteString ::= ' “ ' ValidCharSequence ' ” '
Here a ValidCharSequence is a sequence of characters in which the enclosing quotation marks may appear only as escaped characters. So the following are valid string literals:
' " '
' “ '    # the enclosing mark is ', so " and “ can appear without problem
" ' "
" ” "
“ ' ' ”  # the same holds for the other quotation marks
' \' '
" \" "
String literal:
MultiByteString ::= SingleQuoteString | DBCSingleQuoteString | USingleQuoteString
WideCharString ::= DoubleQuoteString | DBCDoubleQuoteString | UDoubleQuoteString
String ::= MultiByteString+ | WideCharString+
Here the repetition marks mean that two or more MultiByteString literals, or two or more WideCharString literals, can be placed one after another, and they will be joined into a single string literal during lexical translation.
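Joining adjacent literals can be sketched in Python as a pass over tokens that have already been lexed. The `(kind, text)` token shape and the 'mbs'/'wcs' labels are hypothetical. The rule allows only MultiByteString+ or WideCharString+, and the text does not say how the lexer treats a MultiByteString next to a WideCharString. This sketch leaves such a pair as two separate tokens rather than guessing.

```python
Token = tuple[str, str]  # (kind, text); the kind labels are hypothetical


def join_adjacent_strings(tokens: list[Token]) -> list[Token]:
    """Merge runs of adjacent string literals of the same kind.

    'mbs' marks a MultiByteString and 'wcs' a WideCharString; text is
    the literal's content without its quotation marks.
    """
    result: list[Token] = []
    for kind, text in tokens:
        if kind in ("mbs", "wcs") and result and result[-1][0] == kind:
            # Same kind as the previous literal: append to it.
            result[-1] = (kind, result[-1][1] + text)
        else:
            result.append((kind, text))
    return result
```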
Escape characters:
Operators:
UnaryOperator ::= LeftUnaryOperator | RightUnaryOperator
Operator ::= UnaryOperator | BinaryOperator | AssignmentOperator | OtherOperator
As in many other languages, a semicolon can be used to mark the end of a statement. However, semicolons are optional: the compiler can determine where a statement ends based on certain semantic rules.
Identifier:
MacroIdentifier ::= '$' ( 'VAR' | 'EXP' | 'ID' | 'OP' | 'BL' ) Identifier
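The following is a regular-expression sketch of MacroIdentifier in Python. The pattern `[^\W\d]\w*` uses Python's Unicode-aware `\w` to approximate `( WideAlpha | '_' ) ( WideAlnum | '_' )*`. The pattern and function names are hypothetical.

```python
import re

# '$', one of the five prefixes, then an Identifier.  [^\W\d] means
# "a word character that is not a digit" (letters and '_').
MACRO_IDENTIFIER = re.compile(r"\$(VAR|EXP|ID|OP|BL)([^\W\d]\w*)")


def parse_macro_identifier(token: str):
    """Return (prefix, identifier) for a MacroIdentifier, or None."""
    m = MACRO_IDENTIFIER.fullmatch(token)
    return (m.group(1), m.group(2)) if m else None
```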
Separators:
MacroSeparator ::= '\(' | '\)' | '\{' | '\}' | '\[' | '\]'
| '\|' | '\!' | '\*' | '\+'
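This Python sketch splits a macro pattern into MacroSeparator tokens and the text between them. It assumes that a separator is any of the ten backslash sequences listed above and does not model any other escaping rule of the macro system. The function name is hypothetical.

```python
MACRO_SEPARATORS = {
    "\\(", "\\)", "\\{", "\\}", "\\[", "\\]", "\\|", "\\!", "\\*", "\\+",
}


def split_macro_separators(pattern: str) -> list[str]:
    """Split a macro pattern into separators and the text between them."""
    parts: list[str] = []
    buf = ""
    i = 0
    while i < len(pattern):
        two = pattern[i:i + 2]
        if two in MACRO_SEPARATORS:
            if buf:
                parts.append(buf)  # flush the text before the separator
                buf = ""
            parts.append(two)
            i += 2
        else:
            buf += pattern[i]
            i += 1
    if buf:
        parts.append(buf)
    return parts
```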
Source URL:
http://www.daovm.net/space/dao/thread/96