Dao Language Specifications 1.1 (Draft: Alpha 1) - Lexical Structures

  1. Introduction
  2. Lexical Structures
  3. Variables, Values and Types
  4. Expressions
  5. Statements
  6. Functions
  7. Classes
  8. Modules and Name Spaces

1 Introduction

2 Lexical Structures

  1. Character Conversion from DBC to SBC
  2. Comments
  3. Quotation Marks
  4. Keywords
  5. Basic Character Class Definitions
  6. Identifiers
  7. Literals
  8. Operators
  9. Miscellaneous

Dao programs are written in text encoded in UTF-8, an ASCII (American Standard Code for Information Interchange) compatible multi-byte Unicode encoding. Some character conversions are performed on the program source code during lexical translation.

2.1 Character Conversion from DBC to SBC

Double Byte Characters (DBC) in the Unicode range 0xff00-0xff5f (exclusive) are converted to Single Byte Characters (SBC). This conversion occurs only outside of comments and string literals. It allows proper interpretation of operators that are not encoded in ASCII.
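
The conversion can be sketched in a few lines of Python. This is only an illustration, not the Dao implementation: the function name dbc_to_sbc is hypothetical, the offset 0xfee0 is taken from the DBC code points quoted elsewhere in this section (for example 0x23+0xfee0), and the caller is assumed to pass only text that lies outside comments and string literals.

```python
DBC_OFFSET = 0xFEE0  # distance between a DBC character and its ASCII counterpart


def dbc_to_sbc(fragment: str) -> str:
    """Convert characters in 0xff00-0xff5f (exclusive) to their ASCII form.

    Assumption: `fragment` contains no comments or string literals,
    since the conversion must not be applied inside them.
    """
    out = []
    for ch in fragment:
        code = ord(ch)
        if 0xFF00 < code < 0xFF5F:
            out.append(chr(code - DBC_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


# U+FF0B and U+FF1D are the DBC forms of '+' and '='.
print(dbc_to_sbc("a \uff0b\uff1d b"))  # prints: a += b
```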

2.2 Comments

Dao mainly uses the sharp mark (#) to mark comments. C++ style comments starting with a double slash will also be supported:
  • Single line comment: from # to the end of the line;
  • Single line comment: from // to the end of the line (not implemented);
  • Multiple line comment: enclosed between #{ and #};
Here # (0x23) and { (0x7b) may also appear in their DBC versions, namely 0x23+0xfee0 and 0x7b+0xfee0. Multiple line comments may be nested: they may contain other #{ and #}, provided these are properly paired.
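
The nesting rule can be illustrated with a depth counter. The Python sketch below is an assumption-laden simplification rather than the actual Dao lexer: strip_comments is a hypothetical name, string literals are not recognised (so a # inside a string would be treated as a comment), the DBC forms of the marks are ignored, and the unimplemented // form is left out.

```python
def strip_comments(source: str) -> str:
    """Remove '#' line comments and nested '#{ ... #}' block comments."""
    out = []
    i, depth = 0, 0
    n = len(source)
    while i < n:
        two = source[i:i + 2]
        if two == "#{":
            depth += 1          # open a (possibly nested) block comment
            i += 2
        elif two == "#}" and depth > 0:
            depth -= 1          # close the innermost block comment
            i += 2
        elif depth > 0:
            i += 1              # inside a block comment: drop the character
        elif source[i] == "#":
            # single line comment: skip up to, but not including, the newline
            while i < n and source[i] != "\n":
                i += 1
        else:
            out.append(source[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated #{ comment")
    return "".join(out)
```

For example, strip_comments("a #{ x #{ y #} z #} b") keeps only the text outside the outermost pair, because the inner #} lowers the depth to 1 rather than ending the comment.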

2.3 Quotation Marks

The basic quotation marks (q.m.) used to enclose string literals are the single quotation mark (s.q.m.) 0x27 and the double quotation mark (d.q.m.) 0x22. Some other marks are also interpreted as quotation marks; they are listed in the following table:
Left Mark                    | Right Mark          | Conversion
single q.m. 0x27             | s.q.m.              | none
double q.m. 0x22             | d.q.m.              | none
DBC single q.m. 0x27+0xfee0  | DBC s.q.m.          | s.q.m. 0x27
DBC double q.m. 0x22+0xfee0  | DBC d.q.m.          | d.q.m. 0x22
left single q.m. 0x2018      | right s.q.m. 0x2019 | s.q.m. 0x27
left double q.m. 0x201c      | right d.q.m. 0x201d | d.q.m. 0x22

When interpreted for string literals, each mark in the Left Mark column must be properly matched by the corresponding mark in the Right Mark column.
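
The table can be read as a lookup from each left mark to its required right mark and the ASCII mark it is converted to. The following Python sketch shows that reading; QUOTE_PAIRS and normalize_quotes are hypothetical names, and escaped or embedded quotation marks inside the literal are deliberately not handled.

```python
# left mark -> (required right mark, ASCII mark after conversion)
QUOTE_PAIRS = {
    chr(0x27): (chr(0x27), chr(0x27)),              # single q.m.
    chr(0x22): (chr(0x22), chr(0x22)),              # double q.m.
    chr(0x27 + 0xFEE0): (chr(0x27 + 0xFEE0), chr(0x27)),  # DBC single q.m.
    chr(0x22 + 0xFEE0): (chr(0x22 + 0xFEE0), chr(0x22)),  # DBC double q.m.
    chr(0x2018): (chr(0x2019), chr(0x27)),          # left/right single q.m.
    chr(0x201C): (chr(0x201D), chr(0x22)),          # left/right double q.m.
}


def normalize_quotes(literal: str) -> str:
    """Check that the enclosing marks match and convert them to ASCII.

    Simplification: the text between the marks is copied unchanged.
    """
    if len(literal) < 2 or literal[0] not in QUOTE_PAIRS:
        raise ValueError("not a quoted literal")
    right, ascii_mark = QUOTE_PAIRS[literal[0]]
    if literal[-1] != right:
        raise ValueError("left and right quotation marks do not match")
    return ascii_mark + literal[1:-1] + ascii_mark
```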

(TODO: properly handle quotations inside comments and quotations prefixed with backslashes, and disable the grave accent 0x60 as a quotation mark)

2.4 Keywords

The following keywords are reserved for the language:
  • for types or structures:
    Kwd1  ::=  any  |  enum  |  int  |  float  |  double  |  long  |  complex 
                  |  string  |  array  |  list  |  tuple  |  map  |  stream
                  |  class  |  routine  |  function
  • for variable scope:
    Kwd2  ::=  my  |  const  |  local  |  global
  • extra for class:
    Kwd3  ::=  final  |  private  |  protected  |  public  |  virtual  |  self
  • for branching and looping statements:
    Kwd4  ::=  if  |  else  |  elif  |  elseif  |  for  |  in  |  while  |  do  |  until
                  |  switch  |  case  |  default  |  break  |  skip
  • for exception handling statements:
    Kwd5  ::=  try  |  retry  |  rescue  |  raise
  • for other statements:
    Kwd6  ::=  use  |  load  |  import  |  require  |  by  |  return  |  yield
  • for standard or builtin library objects:
    Kwd7  ::=  stdio  |  stdlib  |  math  |  coroutine  |  reflect 
                  |  mpi  |  network  |  mtlib
  • for standard or builtin library types:
    Kwd8  ::=  pair  |  curry  |  buffer  |  thread  |  mutex 
                  |  condition  |  semaphore
  • others:
    Kwd9  ::=  typedef  |  syntax  |  as  |  and  |  or  |  not
                  |  async  |  hurry  |  join  |  null
    (TODO: switch the current use of nil to null )
Keyword  ::=  Kwd1  |  Kwd2  |  Kwd3  |  Kwd4  |  Kwd5
                    |  Kwd6  |  Kwd7  |  Kwd8  |  Kwd9

2.5 Basic Character Class Definitions

Basic character classes:
DecDigit  ::='0'  ...  '9'
HexDigit  ::=  DecDigit  |'a'  ...  'f'|'A'  ...  'F'
AsciiLetter  ::='a'  ...  'z'|'A'  ...  'Z'

WideChar  ::="UTF-8 encoded unit of one or more bytes"
WideAlpha  ::=  WideChar  &  iswalpha(  WideChar  )  !=  0
WideAlnum  ::=  WideChar  &  iswalnum(  WideChar  )  !=  0
Here iswalpha() and iswalnum() are the C99 functions that test whether a wide character belongs to a certain class. A WideChar may consist of more than one byte; in that case, its UTF-8 bytes are converted into a Unicode code point before being passed to the C99 test functions.

2.6 Identifiers

AsciiIdentifier  ::=(  AsciiLetter  |'_')(  AsciiLetter  |  DecDigit  |'_')*
WideIdentifier  ::=(  WideAlpha  |'_')(  WideAlnum  |'_')*

Identifier  ::=  AsciiIdentifier  |  WideIdentifier
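
As a rough illustration of these productions, the Python sketch below tests a candidate identifier. It rests on assumptions: the function names are hypothetical, Python's str.isalpha() and str.isalnum() stand in for the C99 iswalpha() and iswalnum() tests (their results can differ for some characters and locales), and reserved keywords are not excluded.

```python
import re

# AsciiIdentifier ::= ( AsciiLetter | '_' ) ( AsciiLetter | DecDigit | '_' )*
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(text: str) -> bool:
    return ASCII_IDENTIFIER.fullmatch(text) is not None


def is_wide_identifier(text: str) -> bool:
    """WideIdentifier ::= ( WideAlpha | '_' ) ( WideAlnum | '_' )*

    str.isalpha()/str.isalnum() are stand-ins for iswalpha()/iswalnum().
    """
    if not text:
        return False
    first, rest = text[0], text[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch == "_" for ch in rest)


def is_identifier(text: str) -> bool:
    # Identifier ::= AsciiIdentifier | WideIdentifier
    return is_ascii_identifier(text) or is_wide_identifier(text)
```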

2.7 Literals

2.7.1 Number Literals
Integer literals:
DecInteger  ::=  DecDigit+
HexInteger  ::=('0x'|'0X')  HexDigit+

Integer  ::=  DecInteger  |  HexInteger
LongInteger  ::=  Integer  'L'
(HexInteger 'L' is not implemented.)

Floating point number literals:
DotDec  ::=  DecDigit*'.'  DecDigit+
DecDot  ::=  DecDigit+'.'  DecDigit*
DecSinglePrecision  ::=  DotDec  |  DecDot
DecNumber  ::=  DecInteger  |  DecSinglePrecision
SciSinglePrecision  ::=  DecNumber  'e'['+'|'-']  DecInteger
SciDoublePrecision  ::=  DecNumber  'E'['+'|'-']  DecInteger

Float  ::=  DecSinglePrecision  |  SciSinglePrecision
Double  ::=  DecNumber  'D'|  SciDoublePrecision
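
To see how the decimal productions separate Float from Double, the following Python sketch transcribes them into regular expressions. It is a reading aid under stated assumptions: classify_decimal is a hypothetical name, only decimal forms are covered, and a string that fits none of the productions yields None.

```python
import re

DEC_SINGLE = r"(?:\d*\.\d+|\d+\.\d*)"        # DotDec | DecDot
DEC_NUMBER = rf"(?:{DEC_SINGLE}|\d+)"         # DecInteger | DecSinglePrecision

RULES = [
    ("integer", re.compile(r"\d+")),                                  # DecInteger
    ("double", re.compile(rf"{DEC_NUMBER}(?:D|E[+-]?\d+)")),          # 'D' suffix or 'E' exponent
    ("float", re.compile(rf"{DEC_SINGLE}|{DEC_NUMBER}e[+-]?\d+")),    # plain decimal or 'e' exponent
]


def classify_decimal(text: str):
    """Return 'integer', 'float' or 'double', or None if nothing matches."""
    for name, pattern in RULES:
        if pattern.fullmatch(text):
            return name
    return None
```

Under this reading, 1e5 is a Float while 1E5 is a Double, and 12D is a Double even though 12 alone is an Integer.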

Complex number, imaginary part literal:
ComplexImaginary  ::='$'

2.7.2 String Literal

Basic string literal:
SingleQuoteString  ::=  "'"  ValidCharSequence  "'"
DoubleQuoteString  ::=  '"'  ValidCharSequence  '"'

String literal with DBC quotation marks:
DBCSingleQuoteString  ::=  '＇'  ValidCharSequence  '＇'
DBCDoubleQuoteString  ::=  '＂'  ValidCharSequence  '＂'

String literal with Unicode single and double quotation marks:
USingleQuoteString  ::=' ‘ '  ValidCharSequence  ' ’ '
UDoubleQuoteString  ::=' “ '  ValidCharSequence  ' ” '

Here a ValidCharSequence is a sequence of characters in which the enclosing quotation marks may appear only as escaped characters. The following are therefore valid string literals:
' " '
' “ '    # the enclosing mark is ', so " “ can appear without problem
" ' "
" ” "
“  ' '  ”    # the same for other quotations
' \' '
" \" "

String literal:
MultiByteString
    ::=  SingleQuoteString  |  DBCSingleQuoteString  |  USingleQuoteString 

WideCharString
    ::=  DoubleQuoteString  |  DBCDoubleQuoteString  |  UDoubleQuoteString 

String  ::=  MultiByteString+|  WideCharString+
Here the repetition marks mean that two or more MultiByteString or WideCharString literals can be placed one after another; they will be joined into a single string literal during lexical translation.

2.7.3 Escape Sequences in String Literal

Escape characters:
  • \\ : backslash;
  • \t : horizontal tab;
  • \f : form feed; (not implemented)
  • \n : line feed;
  • \r : carriage return;
  • \' : single quotation mark;
  • \" : double quotation mark;
Escape digits (not implemented):
  • \ooo : character with octal value ooo ;
  • \xhh : character with hex value hh ;
  • \uxxxx : Unicode character with hex value xxxx ;
  • \uxxxxxxxx : Unicode character with hex value xxxxxxxx ;
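
The escape characters marked as implemented above can be decoded with a small table. The Python sketch below is illustrative only: decode_escapes is a hypothetical name, and leaving unrecognised sequences (such as \f or the escape digits) untouched is an assumption, since the behaviour for them is not specified here.

```python
# Escape characters listed as implemented: \\ \t \n \r \' \"
ESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r", "'": "'", '"': '"'}


def decode_escapes(body: str) -> str:
    """Decode escape characters in the text between the quotation marks."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including unimplemented escapes,
            # is copied through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```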

2.8 Operators

  • Left unary operators:
    LeftUnaryOperator  ::='++'|'--'|'!'|'~'|'$'|'not'
  • Right unary operators:
    RightUnaryOperator  ::='$'
  • Binary operators:
    BinaryArith  ::='+'|'-'|'*'|'/'|'%'|'**'
    BinaryComp  ::='=='|'!='|'<'|'>'|'<='|'>='
    BinaryBool  ::='&&'|'||'|'and'|'or'
    BinaryBit  ::='&'|'|'|'^'|'<<'|'>>'
    CompAssign  ::='+='|'-='|'*='|'/='|'&='|'|='

    BinaryOperator  ::=  BinaryArith  |  BinaryComp  |  BinaryBool  |  BinaryBit
  • Assignment:
    AssignmentOperator  ::='='|':='|'+='|'-='|'*='|'/='|'&='|'|='
  • Other operators:
    OtherOperator  ::=    '=>'|':'|'.'|'...'
UnaryOperator  ::=  LeftUnaryOperator  |  RightUnaryOperator

Operator  ::=  UnaryOperator  |  BinaryOperator 
                      |  AssignmentOperator  |  OtherOperator
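
Several of these operators are prefixes of others ('*' and '**', '.' and '...', ':' and ':='). The Python sketch below shows one common way to disambiguate them, longest match first. Note that this strategy is an assumption and is not stated by the productions above; split_operators is a hypothetical name, and the word operators 'not', 'and' and 'or' are left to keyword handling.

```python
# Symbolic operators taken from the productions above.
OPERATORS = [
    "++", "--", "!", "~", "$",
    "+", "-", "*", "/", "%", "**",
    "==", "!=", "<", ">", "<=", ">=",
    "&&", "||", "&", "|", "^", "<<", ">>",
    "=", ":=", "+=", "-=", "*=", "/=", "&=", "|=",
    "=>", ":", ".", "...",
]
# Try longer operators first so that '**' wins over '*', '...' over '.'.
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def split_operators(text: str) -> list:
    """Split a string made only of operators and whitespace into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for op in BY_LENGTH:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"no operator at position {i}")
    return tokens
```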

2.9 Miscellaneous

2.9.1 Semicolon

As in some other languages, a semicolon can be used to mark the end of a statement. However, the use of semicolons is optional: the compiler is able to determine the end of a statement based on some semantic rules.

2.9.2 Macro Identifier and Separators

Identifier:
MacroIdentifier  ::=    '$'('VAR'|'EXP'|'ID'|'OP'|'BL')  Identifier

Separators:
MacroSeparator  ::='\('|'\)'|'\{'|'\}'|'\['|'\]'
                                  |'\|'|'\!'|'\*'|'\+'

3 Variables, Values and Types

4 Expressions

5 Statements

6 Functions

7 Classes

8 Modules and Name Spaces

Copyright (C) 2009,2010, daovm.net.