The Dao Programming Language
for Scripting and Computing


[649] Functional revolution

Not sure now is the best time for proposing new language enhancements, but the idea I've come up with seems interesting enough to bring it up while it's still hot. (In short, it's just another fixed idea of mine :) )
So, I was thinking about Dao functional methods: whether it's possible to make their support nicer and more flexible. Currently, functional methods are hard-coded; that is, they are like fixed, standardized operators which can be used only "as is". More precisely, this means that:
  1. Each functional method formally represents a separate feature, bloating the language syntax a little;
  2. If we need a new functional method, it also must be hard-coded (which leads back to point 1 and restricts us from having many such methods);
  3. There is no way to create or adapt functional methods for direct use with a custom class/type (if you wonder why someone would need that, consider SQL, XPath and other data retrieval facilities).
Of course, this is not a terrible flaw, but let's think if we can make things better.
What if it were possible to provide a single feature which could be used to implement any possible functional method as if it were an ordinary routine? Then the core types (list, array, etc.) could implement various functional methods just like any other routine. For instance, like this:
static DaoFuncItem listMeths[] =
{
...
{ DaoLIST_Select, "select( self: list<@T> )| x: @T |=>list<@T>" },
{ DaoLIST_Apply,    "apply( self: list<@T> )| x: @T, i: int |" }
...
};
And the usage would simply be:
list1 = list2.select{x > 0};

list3.each{|elem| file.write(elem)};

list4.apply{i + 1};

list5 = (list6, list7).map{(x, y)};

str = string.unfold(5){ (string)i + " " };

sum = list8.fold{|x1, x2| x1 + x2};

#and so on
I suppose you've already guessed what that routine signature means and what the above Dao code is supposed to do, so I won't dwell on it. Now, here is how it might be provided for custom Dao-level classes:
class myList<@T> 
{
    ...
    routine select()|x: @T| 
    {
        res: myList<@T>;

        for (elem in self.internal_data)
        if (closure(elem))     # "closure()" initializes and executes the code passed to the routine

                res.append(elem);

        return res;
    }
}
As you can see, I tend to view this hypothetical feature as closure support. Essentially, closures are already supported in Dao: the body of any current functional method is a closure of some sort. The only question is whether they can be made freely available as a language entity.
As a conclusion, with this feature available, we could:
  1. Get rid of all hard-coded functional methods in exchange for lots of new higher-level ones, covering wider functionality without bloating the core syntax;
  2. Provide flexible functional method support for user-defined classes, which could be used to organize domain-specific data retrieval facilities;
  3. Obtain a more compact and pleasant functional method usage syntax;
  4. Have even more fun with Dao! :)

Comments
It does look nicer and more extensible; I have been giving some thought to possible ways of implementation.

Clearly, as you have mentioned, a closure is a reasonable option. However, in line with the current implementation, I am wondering if it is possible to implement it as a code section. A code section (I actually don't know the proper name for it) is basically a uniquely identifiable block of virtual machine instructions that is skipped in normal execution, but upon calling a functional method, it can be executed to produce results. Its scoping is essentially the same as that of normal code. Code sections are more efficient (and actually more convenient) than closures.

Another, trickier issue is static typing; there may be some pitfalls that need to be carefully avoided. Supporting this kind of method in C types is relatively easy, but supporting them in Dao classes as code sections seems a bit more challenging.

BTW, did you get the inspiration from Ruby? :)
The code section you described is exactly how I imagine such a thing: a block of code being plainly executed with certain parameters. Closure was just the first name I came up with, recalling more or less similar features in other programming languages. As for static typing, I also wasn't sure if it's feasible in the case of Dao; I just thought that it would probably be better with such a thing than without it.
I suspected you would be interested in where I pulled this idea from :) The thought of turning the functional methods into actual methods with the syntax obj.method{ code } appeared accidentally, without any evident source of inspiration. At the time I didn't think it over enough, and eventually pushed it away. Then I encountered a programming language new to me and yet unexplored, Fantom, and dug deep into it in search of something valuable (sounds like raiding :) ). There I encountered closure support in pretty much the same fashion as the idea I had thought about earlier (only the use of parameters was less flexible). I considered it anew and soon became confident that it would be real fun to have such a feature in Dao. Perhaps it was predetermined... :)
So it was a convergence of ideas, cool. When people see features superficially resembling features in other languages, they tend to think that's a result of those languages' influence. But that is not always the case.

Just a bit of clarification: a code section is not a closure. A closure is just a function with an environment bound to it. When you create a function in Dao using "f = routine(...){...}", you will get either a closure, if it accesses local variables of its caller, or an anonymous function (without any environment besides a namespace) otherwise. A code section is embedded in the function where it is written; it is more like a block of C code marked by a label and normally skipped by a goto, which can be executed by jumping to that label repeatedly.
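The distinction fu describes might be illustrated roughly like this (a sketch based on the "f = routine(...){...}" form mentioned above; the exact declaration syntax may differ):

```dao
n = 10;

# accesses the local variable "n" of its creator,
# so an environment is bound to it: a closure
f1 = routine( x ){ return x + n };

# accesses no locals of its creator:
# a plain anonymous function (no environment besides a namespace)
f2 = routine( x ){ return x * 2 };
```

A code section, by contrast, would not be a separate function value at all, only a labeled region of the enclosing routine's bytecode.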
Last night, when I was thinking about supporting this kind of method in Dao classes, I was actually considering reusing the yield keyword for executing the code section. And now, after googling how Ruby allows users to define such methods, I have just found out that Ruby actually uses yield for methods with a code block (that's what they call it, and it is implemented as a closure)!
Now no one will believe we haven't pulled this whole thing from Ruby :)
Utilizing yield seems good indeed. BTW, another idea: we could use typing to specify the expected return value type for the code section. Perhaps like this:
apply( self: list<@T> )|x: @T, i: int => @T|
It may seem a little too much though. However, I suppose it can be easily (i.e., dynamically) supported even without static typing being available for the parameters.
[655654] True
(fu, 4, 2011-09-14, 02:46:52)
It would be very hard to convince people this feature was not pulled from Ruby. But with static typing, we may have some chance :)

Now I have a rough idea of how to implement this feature. Still, there is a minor problem with parsing, namely how to distinguish a code block associated with such calls from a curry. Maybe we will have to require "|" to always be present after the "{", or we could use "@{}" to enclose such code blocks. "@{}" was used in earlier versions for code sections. Even now, "@(...){...}" is still supported to create an anonymous function or closure. So maybe "@{}" is also a reasonable choice.
Actually, I'd rather think up a new notation for curries than bloat the syntax of the code section. In the worst case, we would get string.unfold(9)@{|i| (string)i}, which looks monstrous. And I don't like the extensive use of magic characters, specifically where they could be omitted.
For instance, we could use func[arg1, arg2, ...] for curries.
[657656] ...
(fu, 5, 2011-09-15, 05:26:08)
Then it would become impossible to distinguish curries from sub-indexing.

Now I am considering using "[]", which will work better than "||" in type names for such functional methods. If we used "||", it would be confused with the variant type. Using "[]" may also slightly simplify parsing. As an example,
select( self: list<@T> )[ x: @T => int ]=>list<@T>
will have the type name
routine<self:list<@T>=>@T>[x:@T=>int]


When using it, one will have to explicitly specify the block parameters; otherwise the parser will not know which names are the block parameters.
ls2 = ls.select{[x] x > 0};
One cannot use,
o.meth{ codes }
unless the method does not require block parameters. But it is clear that functional methods that do not require block parameters are rare, so for such rare cases it should be acceptable to use
o.meth{[] codes }


Edit:
Maybe at the use site it is still better to use "||", because there is still some ambiguity in "x.y{[...] ...}", e.g. "x.y{ [a,b,c].sum() }", which would require looking ahead to determine whether "[]" is part of an expression or the block parameter specifier. With "||", there is no such ambiguity in this context.
Using "[]" for the parameters seems acceptable. Also, there is no need to specify parameter names all the time -- empty brackets should be enough for the parser:
list1.select{[] x > 0}
And it's better to use the same brackets for both the declaration and the use, as otherwise it will be confusing for the users.
However, I'm still thinking about how we could alter the curry syntax so that it wouldn't look almost the same as the code section (and we wouldn't need even the empty "[]" for the latter). That's because code sections benefit from compactness more than curries (particularly when the functional method also requires ordinary arguments) and are supposed to be used more often. So perhaps we might utilize one of these notations for currying:
func:(x, y)

func:{x, y}

func:[x, y]
I suppose it could easily be distinguished from a hash key:value pair with the help of the additional brackets.
[659658] ...
(fu, 6, 2011-09-16, 04:14:55)
The current curry syntax is not just for currying; it is also used for uniform initialization of tuples and class instances. It is such a basic and essential feature, and it has been there for quite some time, so it doesn't sound wise to change it. Now I am considering using "::{}" for code sections (a single colon would introduce ambiguity). It is less compact than a simple "{}", but it is more readable.
ls.apply::{it**2+3}.fold(10)::{[res,it] res*it}

OK, let's leave curries and other stuff alone. It seems you adapted my idea for the code section -- looks great! Now, I see you used "it" as a default parameter name. Does that mean you've already decided to forbid implicit naming? You see, lately I've also started to think that it may be reasonable :)
But only for a very special case: a variable that is not defined outside (before) the code section and is used as the first token in the code section will be considered as a parameter (and the only parameter) of the code section. So the following are equivalent:
sublist = lst.select::{x > 2}.map::{x**2}
sublist = lst.select::{y > 2}.map::{x**2}
sublist = lst.select::{item > 2}.map::{x**2}
This may make simple uses simpler (and more compact) :)
Why do we need an arbitrarily named single parameter when "it" alone is sufficient? And if we use a strict name-keyword, there will be far fewer problems in use. For instance:
var value = 0;
lst.apply::{value}; #not allowed?

lst.map::{func(it)}; #insufficient?

[663662] ...
(fu, 6, 2011-09-16, 08:53:52)
I used it just as an example, with no intention of making it an implicit parameter name. The support for an arbitrarily named single parameter was just an idea; we can well go with the option of having implicit parameter name(s). If so, I think we should limit the number of implicit parameter names to 2, which is the number of parameters needed for several commonly used methods such as sort() and fold(), and possibly others, if we pass both the item value and the item index of a list/array etc. As for the names, "x,y" seem more generic.
Well, I haven't come up with anything better, so using "x,y" as universal parameter names seems acceptable to me.
BTW, how should the code below work: generate a list of -1s or a list of arrays? :)
lst.map::{[x] - 1}

[665664] ...
(fu, 7, 2011-09-17, 00:10:30)
This will generate a list of -1s; as "[x]" is a valid parameter specifier, it will have higher priority in parsing. If "[x]" is intended as an array, one may simply do
lst.map::{([x] - 1)}

You've already made it! I've played with the preliminary implementation -- looks awesome! And you've managed to fully support typing, so the most challenging part seems to be over.
I shall create an issue on Google Code to further discuss what functional methods we may now need!
P.S. It's amazing how a purely accidental thought can have a substantial impact on something :)
P.P.S. Looking now at the number of this post and its title, strange thoughts are beginning to emerge... }:>
What is left are the methods to replace the old string(), list(), array() and repeat(). Not sure where to put them yet; before figuring this out, I will add support for user-defined functional methods first :)

I am starting to like this new form of functional methods much more than the previous one; glad that you brought up this accidental thought :)
The generating methods can be implemented as static ones, no? I already gave a few examples like:
string.unfold(5)::{[x] (string)x}
As for repeat, it doesn't seem as useful. However, we could place it into std, dao or the like.
Also, we don't yet have delete from issue 114 (I'd call it erase though). It could additionally return the number of elements removed.
Oh, and also there should be map for two lists ( tuple<list, list> ), as I already pointed out:
lst3 = (lst1, lst2).map::{[x, y] func(x, y)}
(Don't want to bring it to Googlecode now) BTW, I would change the first parameter name of array methods to "item", as "E, I, J, K, L, M" looks a bit confusing.
And what about the implicit parameter naming? Do you still intend to support it?
Another question is what code (i.e. what syntax constructions) is generally allowed in a code section? That should be made clear to avoid misuse.
Finally, I think we may support a new method has (or contains):
if (lst.has::{[x] x in lst2}) ...
P.S. I'm also glad how it turned out. It's twice as good when you can both simplify and enhance something :)
[669668] ...
(fu, 3, 2011-09-20, 04:18:08)
The generating methods can be implemented as static ones, no? I already gave a few examples like:
string.unfold(5)::{[x] (string)x}
I have considered this, but it would require adding methods to the type type and using a type object directly as a value. I would prefer a better alternative, but if none can be found, then I will support it this way (probably with a different method name).

Also, we don't yet have delete from issue 114 (I'd call it erase though). It could additionally return the number of elements removed.
We can support erase() as a functional method; this may be very handy to use.

Oh, and also there should be map for two lists ( tuple<list, list> ), as I already pointed out:
lst3 = (lst1, lst2).map::{[x, y] func(x, y)}
I was considering the following (for map and array as well):
lst1.map( lst2 )::{ func(X, Y) }

(Don't want to bring it to Googlecode now) BTW, I would change the first parameter name of array methods to "item", as "E, I, J, K, L, M" looks a bit confusing.
Here "E" stands for element ("elem" could also be considered), and the rest stand for indices. But maybe it doesn't matter; it is pretty easy to understand once explained.

And what about the implicit parameter naming? Do you still intend to support it?
Already supported :) I forgot to mention it; in the end I chose to use X,Y instead of x,y. I feel X,Y stand out better as implicit parameters.

Another question is what code (i.e. what syntax constructions) is generally allowed in a code section? That should be made clear to avoid misuse.
Almost any code that can be used outside a code section can be used inside it. If there is anything not allowed, it should be possible to detect it at compile time.

Finally, I think we may support a new method has (or contains):
if (lst.has::{[x] x in lst2}) ...
What is the semantics for this?
What is the semantics for this?
(Nested quoting seems not to be supported.) I thought has could be used to determine whether a list/array/map has at least one element satisfying the condition.
Almost any code that can be used outside a code section can be used inside it.
Well, I found that if () cannot.
Here "E" stands for element ("elem" could also be considered), and the rest stands for indices. But maybe it is doesn't matter, it is pretty easy to understand once explained.
I believe it's better when you can understand that just by looking at the method signature, without the need for an explanation :)
[671670] ...
(fu, 4, 2011-09-21, 18:59:01)
Then it may be better to have a method named "find()" which will return 1 plus the index of the first found item, and return 0 if not found.

Are you sure if() cannot be used inside code section? I tried, but see no problem.
Are you sure if() cannot be used inside code section? I tried, but see no problem.
Now it has indeed started to work :)
Then it may be better to have a method named "find()" which will return 1 plus the index of the first found item, and return 0 if not found.
Makes sense. It may also take optional arguments for initial index and search direction. Though returning 0 for no result is inconsistent with certain other functions, particularly string methods.
[673672] ...
(fu, 5, 2011-09-22, 00:41:21)
Makes sense. It may also take optional arguments for initial index and search direction. Though returning 0 for no result is inconsistent with certain other functions, particularly string methods.
Returning a one-based index for the found item may make the use of such a function in tests slightly simpler, like:
if( a.find( c ) ) ...
instead of
if( a.find( c ) >=0 ) ...
Not sure if such a small convenience could justify adopting this for both list and string.
Yeah, I thought about that. But it causes ambiguity. Suppose we changed string.find() to return index + 1. Now, what should string.match() return as start and end? And what about the functional method index()? To avoid the muddle, it's better to use the same convention everywhere, and using the exact index seems the most unambiguous to me.
I've run across an extravagant idea of how else we could benefit from the new functional methods.
The first thing I've come up with is using code sections for synchronization in multithreading (so that the code section would represent a critical section). A typical example:
mutex1.lock::{
    if (cache[key]??)
        return;

    cache[key] = value;
    process(value);
    ...
}
With the help of code sections, we don't need to call .unlock(), which is especially handy when our critical section has several return points. And, no doubt, the code becomes nicer! The same applies to semaphores.
As you may have guessed, the approach shown above might be used in different cases to simplify specific code. It seems a good fit for critical sections; perhaps there are other interesting ways to employ this idea.
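For comparison, here is a sketch of what the same logic would look like without the code section, assuming a conventional lock()/unlock() method pair on the mutex type (the method names are assumptions for illustration) -- every early return path must remember to unlock:

```dao
mutex1.lock();
if (cache[key]??){
    mutex1.unlock();   # easy to forget on an early return path
    return;
}
cache[key] = value;
process(value);
mutex1.unlock();
```

The code section form removes this entire class of unlock-omission bugs by releasing the lock automatically when the section is left.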
I must say your recent ideas/suggestions are very impressive:)

I am not sure about the method name "lock"; maybe it is better to call it "protect" (or "sync"), which seems suitable for both mutex and semaphore. Two (actually one, overloaded) interesting methods I can think of are "mtlib.parallel()::{}" (to run a code section in parallel with the caller) and "mtlib.parallel( count: int )::{}" (to run a parallel loop over the code section). I am starting to wonder if something similar to OpenMP could be implemented as such functional methods.
I can't say I hadn't thought of something like your "mtlib.parallel()"; I just don't know how to make such things reliable: as the code section may capture local symbols, any thread created from it should be automatically joined upon reaching the end of the scope.
But if that's not a problem, the idea of OpenMP-like functionality seems really astonishing :)
As for the mutex/semaphore method name, "guard" seems to be the most fitting one to me.
After some consideration of the possible forms of "mtlib.parallel()", I eventually settled on the following synchronous variant:
mtlib.parallel(3)::{ #3 threads

    [id]
    if (id == 0){
        #main thread

    }	
    else if (id == 1){
        #spawned thread 1

    }
    else{
        #spawned thread 2

    }
} #barrier synchronization
Just like in OpenMP and MPI, the section is executed by all threads, including the main one, and each thread is distinguished by its rank/id. Upon reaching the section's end, all threads are joined. Perhaps it's more or less the same as what you actually meant (I may have misunderstood your idea). One way or another, this seems to be the best design I've imagined so far.
As for the method for running a parallel loop, I think a different name should be used, e.g. "each" or "loop". As I see it, this method should divide the given range equally amongst the threads (like "omp parallel for"), with each thread passing its own current counter value to the code section:
each( count: int, threads: int )[counter: int]
mtlib.each(lst.size(), 2)::{[i] proc(lst[i])};
Having each() alone may not be enough; there could also be parallel reduce() and find():
reduce( count: int, init: @T, threads: int )[counter: int, value: @T => @T]=>list<@T>
find( count: int, threads: int )[counter: int => int]=>int
lstSum = mtlib.reduce(lst.size(), 0, 2)::{[i, acc] acc + lst[i]}.sum();
index = mtlib.find(lst.size(), 2)::{[i] lst[i] < 0};
Together, these three methods seem sufficient for any kind of parallel looping.
Well, if we want to support something really resembling OpenMP, we will have to separate it from the kernel and put it in a module. For the kernel, I am considering adding something like the following:
mtlib.repeat/do( times :int, threads :int )[index :int, threadid :int]

mtlib.map( alist :list<@T>, threads :int )[item :@T, index :int, threadid :int =>@T2] =>list<@T2>

mtlib.each( alist :list<@T>, threads :int )[item :@T, index :int, threadid :int]

mtlib.apply( alist :list<@T>, threads :int )[item :@T, index :int, threadid :int =>@T] =>list<@T>

mtlib.map( amap :map<@K,@V>, threads :int )[key :@K, value :@V, threadid :int =>tuple<@K2,@V2>] =>map<@K2,@V2>

mtlib.each( amap :map<@K,@V>, threads :int )[key :@K, value :@V, threadid :int]

mtlib.apply( amap :map<@K,@V>, threads :int )[key :@K, value :@V, threadid :int =>@V] =>map<@K,@V>

# following your post:
mtlib.find( alist :list<@T>, threads :int )[item :@T, index :int, threadid :int =>int] =>tuple<index:int,item:@T>|null

mtlib.find( amap :map<@K,@V>, threads :int )[key :@K, value :@V, threadid :int =>int] =>tuple<key:@K,value:@V>|null
Similar methods may also be supported for array. reduce() is sequential by definition, so it cannot be supported here.
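If the signatures above were adopted, usage might look roughly like this (a sketch only, following the proposed "::{}" notation and "[]" block parameters from this thread; process() is a hypothetical user routine):

```dao
lst = { 1, 2, 3, 4, 5, 6, 7, 8 };

# square each item using 2 threads; the block receives
# the item, its index and the worker thread id:
squares = mtlib.map( lst, 2 )::{ [item, index, tid] item * item };

# side-effect-only parallel iteration over the same list:
mtlib.each( lst, 2 )::{ [item, index, tid] process( item ) };
```

Note that, unlike the sequential list::map(), the order in which the block runs across items would not be guaranteed here; only the resulting list's ordering would be.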

A few words regarding the possible implementation. For simplicity, the calling thread will not be used to run the code section. A thread pool will be created to maintain a minimum number of threads, to avoid frequent thread creation and destruction. Regarding variable access, I am considering supporting a very simple scheme with the following rules:
  • all simple data of primitive types (integer, float, double, long, complex, string, enum) are private to each thread (tuple is the perfect data type for sharing data across threads);
  • all data created inside the code section is private (as long as it stays on the stack) to the thread that created it;
  • the binding of variables to non-primitive data is private to each thread. The following example demonstrates what I mean by this:
    a= { 1, 2, 3 };
    mtlib.repeat( 10, 2 )::{
        a = { 4, 5, 6 };
    }
    # here "a" will be still {1, 2, 3}
If the last rule is adopted, the implementation will be straightforward; otherwise, it will be a bit trickier to implement. I am not sure if users will take this rule well.
First, about the OpenMP functionality. I've tried to think up a possible design, but eventually came to the conclusion that such a thing (as far as it can be simulated using functional methods) wouldn't offer anything essentially different or unachievable with the proposed kernel methods. Instead, it would rather complicate and obfuscate the code. The few things which are a little harder to simulate with the ordinary means seem too specific to be really useful. Either way, I failed to devise something impressive enough.
Now, about reduce(). I see no obstacles to supporting it the way I proposed in the last post: each thread uses its own local accumulator, and the method returns the list of all thread-accumulated values (which can be further reduced in the usual way). It's the same as in OpenMP, where individual threads can accumulate private data which is usually reduced to a single result in the end.
As for the mass of parallel mtlib methods you proposed, I now think we may indeed need that many of them. For instance, matrix calculations require dedicated methods for array in order to be handled properly. And I suppose we still need a simple non-looping
execute/do/once(threads: int)[threadid: int]
Regarding the variable access rules: the last one doesn't conform with the usual rules for code sections. Shouldn't a parallel section access data the same way as a "normal" one? If not, then it's something else and should be distinguished accordingly. However, I think we could actually adopt this rule for all functional methods, as modifying outer variables is not good style for a code section (the wildest case is when the section directly changes the container it is being assigned to). Then it would really be a worthy improvement.
Finally, there is another thing which could also be supported -- setting the default number of threads to be used by all parallel methods when the threads parameter is not specified. This may come in handy for writing scalable, massively multithreaded code.
I agree with you that OpenMP support will be unnecessary after we add the proposed methods, which should be sufficient for common situations.

About reduce(), it really depends on which kind of operation is applied to reduce; for associative and commutative operations such as addition, multiplication etc., it is possible to parallelize in the way you said, but more complicated operations are not parallelizable in general.

For variable access in code sections, the outer variables should be accessible; this is the main reason to support code sections. Otherwise code sections could simply be implemented as anonymous functions, but then we would lose a lot of flexibility and efficiency. So I prefer not to change the variable access rules in normal code sections. But in the code sections of parallelized methods, variable access has to be different. That's because, with normal unrestricted variable access, when one thread modifies a local variable and another thread reads that variable, there are not only race condition issues; there will also be memory access errors, because modifying a variable may cause an object to be released while another thread is still accessing it.

After careful thinking, I believe my originally proposed rules are reasonable; though they violate the principle of "least surprise", they are the best we can do in terms of simplicity, efficiency and maintainability (no implementation caveats) etc. If we do not emphasize efficiency too much, we may add a layer of automatic data synchronization at the boundaries of code section execution. Namely, each working thread updates its own copies of the section's outer variables (no copying for non-primitive data, just a reference update) from the calling thread before the code section is executed, and updates the outer variables from its local copies afterwards. With such data synchronization, the final effect of parallelized functional methods on outer variables would be much closer to that of unparallelized ones. But I am not sure it is worth supporting, because it would bring significant overhead.
Then, without any kind of reduce(), outer variables simply must be writable, one way or another. How else can we support in-thread data accumulation? Thus we need either something like reduce(), as a "legal" way to return data from the spawned threads, or the synchronization you described. I tend towards the first option, as it is obviously much simpler.
However, data synchronization may also be helpful, but in an explicit form. Implicit synchronization would rather be a bear's service (couldn't resist using this phrase; to understand what it means, just imagine a bear helping you program :) ), as it won't happen immediately when you try to change the outer variables. This aspect can lead to certain very subtle bugs which I would never want to encounter. So perhaps some explicit, transparent way to "flush" the changes could be provided? In this form, it would not bring as much overhead either.
I also prefer the first option. Actually, I have realized that we can add a very simple restriction on outer variable access in code sections to keep it consistent between normal and parallelized functional methods: outer variables cannot be assigned new values inside code sections; they can be modified in any other way, but not by assignment. This will effectively prevent outer variables from being bound to new values inside code sections, and it will also avoid the "wildest case" you mentioned before. It should have minimal impact on the flexibility of functional methods.

With this restriction, I am also considering supporting all outer variables as data shared between threads, and all local variables (declared inside the code section) as data private to each thread. This is simpler and more intuitive than the rules I mentioned before.

With these changes, data accumulation will be allowed, so the following will be valid:
sum = 0;
alist.each::{ sum += X };
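My reading of this restriction, as a sketch (the comments mark what I assume the rule would allow or reject; this is an interpretation, not decided behavior):

```dao
sum = 0;
data = { };
alist = { 1, 2, 3 };

alist.each::{
    sum += X;          # allowed: modifies the existing value in place
    data.append( X );  # allowed: mutates the object "data" is bound to
    # data = { 4, 5 }; # would be an error: rebinds "data" to a new value
};
```

That is, the variable's binding is frozen inside the section, while the value or object it is bound to remains mutable.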

I am not sure I get you right: do you mean that only the assignment operator is to be restricted, or does assigning imply something more general?
Actually, I think the best strategy for using outer variables might be found by looking at how synchronous classes work. Synchronous classes represent a very reliable way to access a resource: each query, whether for reading or modifying, is queued, keeping the exact order of calls. For instance, your example with "sum += X" doesn't seem reliable, as running this code from several threads simultaneously might cause sum to be overwritten during the addition. But if sum were an object of a synchronous class, that could not happen by any means.
I don't mean we should use synchronous classes instead of ordinary data types -- that would be too cumbersome. Rather, I think this very principle might be adapted for our case, perhaps in some different form.
Actually, the example I gave wasn't meant for a parallelized functional method (as you can see, I used list::each(), not mtlib.each()); I made a mistake (now fixed in the post) in referring to "in-thread data accumulation" -- I actually meant "data accumulation" only. For the parallelized version, obviously one should do:
sums = { 0 : 0 : T }; # one accumulator per thread;
mtlib.each( abiglist, T )::{ sums[Y] += X }
sum = sums.sum();

Speaking of synchronous classes, maybe after supporting these parallelized functional methods, we can retire the synchronous class feature, because I am not sure how much effort it will take to get it right and complete, and how much effort it will take to maintain. Now I kind of prefer to keep the kernel small and easily maintainable. I think the parallelized functional methods we are planning now are powerful enough :)

Edit: not really decided yet; if, in the end, the synchronous class feature turns out to be as simple as these parallelized functional methods, I will gladly support it :)
Personally, I am strongly against removing synchronous classes; I'd rather agree to remove the ordinary thread type :) That is because synclasses are a great advantage of Dao over all the programming languages I've seen so far. A few of them have agent frameworks, but those support only a primitive "send/receive" interface. In Qt, it is possible to emulate something similar to synclasses using QThread with signals/slots, but it still requires much more effort and is lower-level compared to Dao.
Synchronous classes should not be anything sophisticated; for instance, I believe we don't need any restrictions for them concerning data passing/usage etc. Actually, when they were usable, synclasses were already nearly complete; from my point of view, they need only a few minor additional methods to become full-fledged.
The most important argument for synclasses is that parallelized functional methods (in their current hypothetical form) would never be able to fully substitute for the object-oriented approach, as they are simply not suited for asynchronous multithreading. For example, if I want a thread to be executed in the background, I can write:
t: thread;
...
routine func()
{
    ...
    t = mtlib.thread(...);
    ...
}
...
routine func2()
{
    ...
    t.join();
    ...
}
With synclasses, it would be even simpler. But I see no way to achieve the same with the parallelized functional methods, as they are not meant to represent storable entities like thread objects, jobs etc. That is why I currently view synclasses as the central feature for multithreading in Dao.
P.S. It will be very ironic if the type named future disappears from the language :)
After some thinking about all these access rules, I settled on the idea that the simpler, the better. The fewer feature-local rules we introduce, the easier the language will be to use overall. The rule "assignment doesn't work here" seems too odd to me, especially since it concerns a case which cannot be syntactically distinguished from other similar cases. One simply must know that "within this particular functional method I cannot assign to outer variables". I suppose rules like this one would only hinder the users rather than help them (I already mentioned Bear service).
The closer the code sections are to "normal" code (and to one another), the easier it will be to use them and to predict the code's behaviour. Thus I believe it would be better to avoid any unnatural, hidden rules whenever possible.
Probably I was a bit too obsessed with simplicity :). I agree that the synchronous class (not sure if this is the proper name) is a very interesting feature and should be kept (and kept simple).
I agree it's better to keep it closer to normal code. Again, I was just too obsessed with simplicity. To allow assignment to outer variables, it will be necessary to change the implementation of code sections, but for now it still seems simple. So we can assume this issue has been solved :)
I'm glad you spared the synchronous class :) I think the proper name for it is agent or actor : both terms seem to be used for concurrent communicating objects.
First, can you make it so that the code section of functional methods like each() is not treated as an expression? It is a bit bothersome to receive warnings about "assignment inside expression"; it is again a matter of how close the code sections are to usual code.
Now, have you decided what to do with the generating functional... functions? :) I think they might be added to the dao namespace as
array(i: int, j = 0, k = 0, l = 0, m = 0)[I:int,J:int,K:int,L:int,M:int => @T] => array<@T>
list(size: int)[index:int => @T] => list<@T>
list(size: int, init: @T)[index:int,previous:@T => @T] => list<@T>
map(size: int)[index:int => tuple<@K,@V>] => map<@K,@V>
With array() , it would be possible to easily generate multidimensional matrices:
unityMatrix = dao.array(10, 10)::{[i, j] (i == j)? 1 : 0}
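To make the semantics of the hypothetical dao.array() generator concrete, here is a Python sketch (the helper name gen_array is made up for illustration) that invokes a "code section" once per index pair, shown here building a small identity matrix:

```python
def gen_array(rows, cols, section):
    # invoke the "code section" once per (i, j) index pair
    return [[section(i, j) for j in range(cols)] for i in range(rows)]

unity = gen_array(3, 3, lambda i, j: 1 if i == j else 0)
print(unity)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```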
The second list() (and perhaps array() as well) could be employed to handle recurrent dependencies ( previous corresponds to the previous list element or, if there is none, to the init value):
factorialList = dao.list(6, 1)::{(X + 1)*Y}
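The recurrence-driven generation can be sketched in Python (gen_list is a hypothetical stand-in for the proposed list() overload): the section receives the index and the previous element, with init standing in on the first call. With the factorial section (X + 1)*Y this yields 1, 2, 6, 24, 120, 720.

```python
def gen_list(size, init, section):
    # "section" receives the index and the previous element
    # (init stands in for the previous element on the first call)
    result, prev = [], init
    for i in range(size):
        prev = section(i, prev)
        result.append(prev)
    return result

factorials = gen_list(6, 1, lambda x, y: (x + 1) * y)
print(factorials)  # [1, 2, 6, 24, 120, 720]
```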
A map-generating routine could also be of use:
query = QSqlQuery(...)
row = dao.map(query.record.count())::{[i] query.record().fieldName(i), query.value(i).toString()}
Also, I can think of one overloaded method for stream:
writelines(self: stream, lines: int)[line:int => string]
writelines(file: string, lines: int)[line:int => string]

I like the proposed changes (but not the syntax :/), as I too found the old functionals clumsy to use.
To your examples:
lstSum = mtlib.reduce(lst.size(), 0, 2)::{[i, acc] acc + lst[i]}.sum();
index = mtlib.find(lst.size(), 2)::{[i] lst[i] < 0};
Noticed the pattern? Why not just:
lstSum = lst.reduce(0, threads => 2)::{[elem, acc] acc + elem}.sum();
index = lst.find(threads => 2)::{elem < 0};
To me, this would seem much more consistent and natural. If you don't like the named parameters (I am a fan of them, but don't like their => syntax), another option would be:
lstSum = lst.mtreduce(0, 2)::{[elem, acc] acc + elem}.sum();
index = lst.mtfind(2)::{elem < 0};


If you make an erase method which gets a list of indices, you would not need the erase functional, because of
lst.erase(lst.mtfind(2)::{elem < 0});


Finally, I'm not sure yet (probably I'm too Python- and C++-biased), but I would prefer a syntax that is closer to "lambdas". Let's say, for example, that the syntax ::{BLABLA} stands for a "code block", and a code block is a first-class object like primitives, functions and objects. Then it seems most natural to me that you pass this code block as a parameter to find and friends:
idxs = lst.find(::{elem < 0});


Now getting even further: from a user's perspective, why do we even need a difference between code blocks and functions?
routine negative(x)
{
  return x < 0
}

idxs = lst.find(negative);
# or as a one-liner:
idxs = lst.find(routine (x) {return x < 0});
# and with some allowed syntactical shortcuts or defaults:
idxs = lst.find(routine {x < 0});

This would even allow to combine it with currying:
routine smaller(x, threshold)
{
  return x < threshold
}

# Pardon, I don't remember the exact syntax of currying and didn't find it by a very quick look in the manual.
negative = smaller{threshold => 0}

idxs = lst.find(negative);
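The currying idea above maps directly onto partial application. A Python analogue (not Dao syntax) using functools.partial fixes threshold while leaving x free:

```python
from functools import partial

def smaller(x, threshold):
    return x < threshold

# fix threshold to 0, producing a one-argument predicate
negative = partial(smaller, threshold=0)

lst = [3, -1, 4, -5]
idxs = [i for i, x in enumerate(lst) if negative(x)]
print(idxs)  # [1, 3]
```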

In my opinion, even if it is a lot in the implementation, for the language user it seems all to be "just functions", and all those features seem to be designed to play well together. I know I'm joining late, but what do you guys think of it?
A reply.
lstSum = lst.reduce(0, threads => 2)::{[elem, acc] acc + elem}.sum();
 index = lst.find(threads => 2)::{elem < 0};
I think the parallelized methods are better kept apart, in mtlib , so that all multithreading facilities would be localized in a single module (like now).
If you make an erase method which gets a list of indices, you would not need the erase functional
... but the latter is still noticeably simpler and more efficient :)
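The simplicity/efficiency point can be illustrated with a Python sketch: erasing via a separately computed index list takes an extra pass plus index bookkeeping, while a predicate-driven erase is a single pass.

```python
lst = [3, -1, 4, -5, 9]

# two-step: find the matching indices, then erase them
# (an extra pass plus index bookkeeping)
idxs = {i for i, x in enumerate(lst) if x < 0}
via_indices = [x for i, x in enumerate(lst) if i not in idxs]

# one-step "erase functional": a single predicate-driven pass
via_predicate = [x for x in lst if not x < 0]

print(via_indices)                    # [3, 4, 9]
print(via_indices == via_predicate)   # True
```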
idxs = lst.find(::{elem < 0});
The code section must not be a storable object, since it may capture local symbols. What should happen if such a section is executed when those symbols are already gone? Also, the syntax you proposed looks too clumsy to me.
Now getting even further: from a user's perspective, why do we even need a difference between code blocks and functions?
I suppose there is a significant difference which should not be ignored: the code sections are not called like routines and thus may execute much faster. Besides, I really see no point in using routines as arguments to other routines -- again, it's too clumsy (furthermore, that is exactly the reason why functional methods are provided to begin with).
... for the language user it seems all to be "just functions..."
Actually, the code sections are designed to be treated similar to ordinary blocks of code (not even functions) -- then it becomes easier to achieve a lot of different things with them.
P.S. Sorry for being so critical. I'm just jealously guarding what has been built so far, and you are like a sudden tornado trying to mix all up :) Still, I'm glad you've finally joined the discussion :)
I think we may use Asynchronous class or agent for this. Actor may remind people of the Actor Model, which has a number of general characteristics and principles. I don't want people to get the wrong idea, since this feature will not necessarily realize those characteristics. Actually, there was an actor-model-based message passing interface in Dao before, but the goal was too big to be practical in a lightweight language.
I was aware of the warning. It was raised because it is necessary to first try to compile the code section as an expression or expression list. What was wrong is that warnings were immediately printed out, so there was no way to suppress them; now I have changed the way warnings are handled to fix this.

Well, I won't say decided, but what I have considered is more or less like what you proposed :). But I will probably put them in std , and add something for string as well. What I didn't consider, but like in your proposal, is this list-generating method:
list(size: int, init: @T)[index:int,previous:@T => @T] => list<@T>
well thought:)

For io.writelines() , I will consider adding an additional parameter to the second one:
writelines(file: string, lines: int, mode='w')[line:int => string]
I think we should switch to using enums for file modes.
Your examples of mtlib.reduce() and mtlib.find() are a bit outdated; the more up-to-date proposal for parallelized methods is here :)

About whether to keep the parallelized methods apart from the single-threaded ones or to put them together, and whether to use code sections or anonymous functions and closures (which can capture local variables) to implement these functional methods, I am afraid I have to agree with Nightwalker, for the same reasons he explained. I personally also feel that putting an anonymous function or closure in the function parameter list looks clumsier than putting it behind the call. Compare:
########
alist.each::{
    do_something( X );
    io.writeln( Y, X );
}
########
alist.each( ::{
    do_something( X );
    io.writeln( Y, X );
} )
########
alist.each( routine(X,Y){
    do_something( X );
    io.writeln( Y, X );
} )
I think it is clear which one looks cleaner, but maybe it's just subjective :) Another advantage (or maybe disadvantage) is that calling a functional method in the currently proposed way can look more like a user-defined syntax construct:
routine Loop( n : int )[ i : int ]
{
    for( k = 0 : n-1 ) yield( k );
}

Loop( 10 )::{ [i]
    do_something( i );
}
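The yield-driven Loop above behaves much like a generator feeding each index to the attached block. A Python analogue (hypothetical names; the loop body stands in for do_something):

```python
def loop(n):
    # analogue of the Dao routine: yield once per index,
    # handing control to the attached code block each time
    for k in range(n):
        yield k

collected = []
for i in loop(10):
    collected.append(i * i)   # stands in for do_something(i)

print(collected)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```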


By the way, I am considering using mt to replace mtlib .
I guess the code section can be considered a unique design of Dao. Ruby has code blocks, which look like the code sections in Dao, but they are implemented as closures, which are more computationally expensive.
Indeed, I think we should use enums for the file mode arguments in stream methods: one probably won't need more than two symbols for that, so it should look nice.
BTW, why have you added the dao namespace? What should it contain compared to std ?
I also had thought about renaming mtlib to mt -- the latter would probably be better considering most other module names.
Hmm, maybe I am misunderstanding the code sections then. I understood that a code section can be executed, can reference "outside" variables, and can have parameters and return values. If that is correct, then what exactly is the difference between code sections and closures/lambdas?

I see your point on the code cleanness, even though I think it makes the language a bit more difficult to learn (more magic symbols to keep in mind).

I agree with replacing mtlib by mt.

Is it possible to add more functions to a class or module after it has been declared? If so, I think it would be nice to make the mt module add those parallel functionals to the other classes (like list, map, ...) when imported.

PS: NightWalker, I understand you being protective and how I come in like a storm... I just like to throw ideas around; maybe they get agreement or maybe not ;)
... then what exactly is the difference between code sections and closures/lambdas?
Well, I suppose the code section may be imagined as a usual block of code pointed to by a label (i.e., as for jmp/goto ). Executing the code section then looks like jumping to that label (with the section parameters initialized), executing the code it contains, and returning back to the caller. The actual implementation may differ, but this seems to be a fitting way to view it.
In contrast, closures/lambdas are more like anonymous functions bound to current context.
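The practical difference shows up in lifetimes: a closure must keep its captured locals alive even after the enclosing call returns, which is exactly what a jump-like code section cannot guarantee (hence Nightwalker's objection to storing sections). A Python sketch of a closure outliving its frame:

```python
def make_counter():
    count = 0                # a local of make_counter
    def inc():
        nonlocal count       # the closure captures count...
        count += 1
        return count
    return inc               # ...and keeps it alive after this frame is gone

counter = make_counter()
print(counter(), counter())  # 1 2
```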
I see your point on the code cleanness, even though I think it makes the language a bit more difficult to learn (more magic symbols to keep in mind).
Actually, the language should now be easier to learn, as we got rid of nearly a dozen hard-coded functional methods which used custom syntax. Now the functional methods are all at the applied level and thus have a unified syntax.
Is it possible to add more functions to a class or module after it has been declared? If so, I think it would be nice to make the mt module add those parallel functionals to the other classes (like list, map, ...) when imported.
That implies the module hacks the core language :) Not sure it's a good idea.
So do I understand it right: is it like a stackless function call?
And I thought they are bound to the current context too, because of the sum example above. What's going on there?

That implies the module hacks the core language :) Not sure it's a good idea.
I would rather say the module extends the core language, and that sounds like the exact purpose of a module ;)
So do I understand it right: is it like a stackless function call?
Well, it's probably not a function call at all -- not in Dao's sense, at least. But it may be treated as such from the user's point of view. It's just something simpler and more efficient :)
And I thought they are bound to the current context too...
They are, of course (I didn't assert they aren't). Actually, the section may be viewed as a part of the current context you refer to (instead of being a separate entity like a function).
I would rather say the module extends the core language, and that sounds like the exact purpose of a module ;)
I think extending and intruding are not the same :) But perhaps we could think up something in the middle of them...
It is not a function call. Though, as in a function call, a stack frame is always created for the execution of a code section, this frame is solely used to store a few running states without messing up the parent frame that invokes the execution of the code section. Everything else is the same as in the parent frame, especially the stack values. You can say they share the same context, though not in the strict sense, because DaoContext is no longer used. It is not stackless in terms of stack frames, but it is stackless in terms of stack values :)
When I said I would rather agree to remove the thread type than the (a)synchronous classes, that was actually not just a joke. Recently, I've been seriously thinking about whether we really need thread . With the appearance of mt.start() (a cool thing, I didn't think it was possible to fully support it), I became confident that we no longer need the thread type -- it can be completely substituted by future :
#thread

t = mt.thread(some_routine{x, y})

#future

f = mt.start::{some_routine(x, y)}

#or even simpler

f = mt.start::{
        #the actual code

}
future can easily absorb any functionality thread has or could have, while being higher-level and more promising overall. So I believe we should repudiate thread and ultimately raise the "asynclass & future" couple to the throne :)
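The same thread-to-future migration happened in other languages. A Python analogue (not Dao) with concurrent.futures: submitting background work returns a future, and the thread-style t.join() becomes f.result():

```python
from concurrent.futures import ThreadPoolExecutor

def some_routine(x, y):
    return x + y

# submit() starts the work in the background and returns a Future
with ThreadPoolExecutor(max_workers=1) as pool:
    f = pool.submit(some_routine, 2, 3)
    # ... the caller is free to do other work here ...
    print(f.result())  # 5, blocking until the routine finishes
```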
That was exactly my intention when I was thinking about adding mt.start() , and that's why I let it return a future value :). Also, after the recent development, the previous design of thread (and the thread master) looks a bit awkward; it is now much better to handle the threads in the background using the thread pool, so this is another reason to remove it.
BTW, why have you added the dao namespace? What should it contain comparing to std ?

The namespace dao isn't new; it has already been there as the internal namespace for storing the standard types and built-in objects. It just was not named dao until recently. During development of ClangDao and testing it with the C++ STL, I found it necessary to name the namespace dao and handle the dao:: prefix specially, to better distinguish Dao standard types from the wrapped C/C++ types (e.g. string).

In Dao, std is not a namespace; it is just an object that contains a set of methods.

Copyright (C) 2009-2013, daovm.net.
Webmaster: admin at daovm dot net