The Dao Programming Language
for Scripting and Computing

Label: ♦english ♦feature planning

[514] Synchronous class for concurrent programming

These days I have been thinking about the possibility of bringing back two earlier features, Asynchronous Function Call (AFC) and Message Passing Interface (MPI), not in the same form but in a much simpler way. The original design of MPI attempted to unify the concurrent and distributed computation models under a single framework, which was a bit too ambitious and is hard to implement consistently across different platforms. Additionally, the original design of AFC and MPI did not pay much attention to the consistency of shared data. To address these problems, I have come up with a new design that should be much easier to implement and simpler to use.

The basic idea is to support a specialized class (for the moment, let's call such a class a Synchronous Class and its instance a Synchronous Object), possibly by specifying a special attribute in the class definition, by prefixing the class name with a special symbol, or in some other way, as follows:
class [synchronous] SynObject { ... }
# or
class @SynObject { ... }
Such class will have the following properties:
  • All its member variables are private or protected;
  • All its public methods are asynchronous methods: when such a method is called, control returns immediately to the caller, and the method may be executed immediately and concurrently with its caller, or may be executed at a later time.
  • All its private or protected methods are normal methods;
  • Only simple data types and synchronous objects can be passed across the function call boundary (parameters and returned value) of public methods.
Under the Actor Model, an instance of such a class is essentially an actor, and calling one of its public methods is essentially equivalent to sending the object a message. Moreover, different objects/actors can handle their messages concurrently, while each object handles only one message at a time. With these properties, data consistency is guaranteed: only synchronous objects can be used as shared data, and their internal data can only be modified through their public methods, whose execution model guarantees that the data will never be accessed by multiple competing threads at the same time.

Calling a public method of a synchronous object returns a future value. Retrieving the value from such a future value object blocks the caller until the value becomes available, that is, until the method has finished execution.

Just as a simple example,
class @Counter
{
	var count = 0;

	public
	routine Increase(){ count += 1 }
	routine Decrease(){ count -= 1 }
}
class @Consumer
{
	public
	routine Update( counter : Counter ){
		for( i = 1 : 100 ) counter.Increase();
	}
}
counter = Counter();
c1= Consumer();
c2= Consumer();
fv1 = c1.Update( counter );
fv2 = c2.Update( counter );
fv1.value(); # block until c1.Update( counter ) is done
fv2.value(); # block until c2.Update( counter ) is done
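For readers who want to experiment with these semantics, the Counter/Consumer example above can be approximated in Python. This is only a sketch, not Dao's implementation: each object is assumed to own a single-worker `ThreadPoolExecutor` acting as its message queue, so its public methods run one at a time and return a `concurrent.futures.Future`, whose `result()` plays the role of `.value()`. The `SyncObject` base class and `public` decorator are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import functools


class SyncObject:
    """Hypothetical stand-in for a Dao synchronous class (an actor).

    Each instance owns a single-worker executor, so the calls it receives
    are queued and executed one at a time, in the order they were made.
    """

    def __init__(self):
        self._queue = ThreadPoolExecutor(max_workers=1)


def public(method):
    """Make a method asynchronous: queue the call, return a Future at once."""
    @functools.wraps(method)
    def wrapper(self, *args) -> Future:
        return self._queue.submit(method, self, *args)
    return wrapper


class Counter(SyncObject):
    def __init__(self):
        super().__init__()
        self._count = 0  # private state, only touched by this object's worker

    @public
    def increase(self):
        self._count += 1  # safe: no two calls on this object run concurrently

    @public
    def decrease(self):
        self._count -= 1

    @public
    def count(self):
        return self._count


class Consumer(SyncObject):
    @public
    def update(self, counter):
        for _ in range(100):
            counter.increase()  # futures are not waited on here


counter = Counter()
c1, c2 = Consumer(), Consumer()
fv1 = c1.update(counter)
fv2 = c2.update(counter)
fv1.result()  # block until c1.update(counter) is done
fv2.result()  # block until c2.update(counter) is done
# The result lives in counter, but reading it is again an asynchronous call.
total = counter.count().result()
```

Because `count()` is queued behind all 200 `increase()` calls on the same single-worker queue, it sees the final value. Note that Python lets any object be passed as an argument, so this sketch does not enforce the proposal's restriction on parameter types.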

Because of the restriction on the types of data that can be passed across the function call boundary, there are clear limits on the kinds of parallel computation that can be implemented with the concurrency model in this proposal. So I am not very sure whether a feature based on this proposal would be of general interest to many people. Some feedback would help, and is extremely welcome.
Comments
With multicore processors taking over the world, such a feature will hopefully help broaden the use of parallel programming, so I think it is a very useful addition.

First of all, is it possible that you have forgotten some "return" statement(s) in your example? Or have I totally missed the point? :) What is the return value of Update? What is the ".value()" method on it?

Next, passing only primitive and synchronous objects sounds logical. I haven't read much more than a few paragraphs and a Wikipedia page about the Actor Model yet, though. Hence my question: why not allow any argument that is passed by value?

Also, if Consumer had a second method "Update2", and I call the following:
r1 = c1.Update(counter)
r2 = c1.Update2(counter)
r3 = c1.Update(counter)
Will they be guaranteed to be called in the same order? In my opinion, they should.

And what happens when the return values of c1's calls are ignored, as in this example:
c1.Update(counter)
c1.Update2(counter)
Might it be that the methods never actually get executed? It seems to me that this code is not nonsensical, as the first (and/or second) update might change c1's state, right?

Finally, why couldn't there be synchronous public methods? Wouldn't calling a synchronous public method effectively be the same as calling an asynchronous one and requesting its result right thereafter? Like this:
r1 = c1.Update(counter)
c1.SynchronousPublicMethod(7)
Which would, upon the second line, first execute all pending async methods and then execute the synchronous one? Just like the following:
r1 = c1.Update(counter)
r2 = c1.SameMethodAsAboveButAsync(7)
r2.value()


Oh, just had an idea about the syntax. Wouldn't it be funny to use "//" as a marker, because in mathematics this symbol means "parallel" :)

Lots of questions about this interesting subject. Please keep in mind that even though I once attended a lecture about it, I'm quite new to parallel programming and have hardly ever used it.
[Rewritten] First, I'm going to try to answer Pompei2's questions:
  • There indeed should be something "return"-like in asynchronous functions;
  • In Dao, only arguments of primitive types can be passed by value, so they are already included in the "passable" category Limin stated. However, primitive-type values shouldn't be passed by reference; that's true, and it should be taken into account during the implementation;
  • In your first example (calling several methods), the calling order is guaranteed like with a usual object;
  • In your second example (ignoring the returned value), the methods will be executed regardless, simply because they are called. They won't be joined with the main thread, though, as there is no call to the value method of the future values returned by c1's methods;
  • In your third and fourth examples (synchronous public methods), "first execute all pending async methods and then execute the synchronous one" is essentially the same as if SynchronousPublicMethod were asynchronous and its result were requested right away, which is why there is probably no real need for such a thing (consistency in how methods are used is probably better);
But let's proceed to what I think of all this. Well, I think the idea is excellent! Well-considered, well-formed and quite useful. No messing with threads and their synchronization, just high-level automatic multithreading.
As for the syntax, the one shown in the example ( class @name ) seems good enough -- similar to coroutines.
BTW, won't the proposed feature be based on platform-dependent multithreading facility and thus match the category "should not be built-in"? :)
Thanks for the positive feedback. It seems Pompei2's questions have been more or less answered by Nightwalker. I will just clarify the returned value of an asynchronous method a bit. Actually, an asynchronous method does not need to return anything, because the returned future value is not prepared by the method but by its caller. So whether or not the method returns something, the caller always associates a future value with the call. When the value() method of a future value is called, it checks whether the associated call has finished: if not, it blocks; if so, it returns the asynchronous method's returned value, or a null value if that method does not return anything.
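Python's `concurrent.futures` follows the same division of labour and may serve as a reference point: the `Future` is created by `submit()`, on the caller's side, before the call runs, and `result()` yields the called function's return value, or `None` (Python's counterpart of a null value) when it returns nothing. A minimal sketch; `log_only` and `square` are illustrative names:

```python
from concurrent.futures import ThreadPoolExecutor

messages = []


def log_only(msg):
    # Returns nothing, yet the caller still receives a future for the call.
    messages.append(msg)


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=1) as executor:
    fv_a = executor.submit(log_only, "hello")  # future created by the caller
    fv_b = executor.submit(square, 7)
    a = fv_a.result()  # no return statement, so the result is None
    b = fv_b.result()  # the function's returned value
```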

In fact, it will allow any argument passed by value, because only primitive types can be passed by value. As Nightwalker pointed out, care needs to be taken to prevent primitive types from being passed by reference.

I also think consistent usage of public methods is more important; supporting synchronous public methods would be confusing. After all, it is not much of a bother to type the additional .value() :)
I think this feature can be considered to be part of the language, and is essential for the implementation of the language and VM :)
I would make it a built-in too because it adds language functionality (and syntax)
A few things are still not clear to me:
  1. why use the method .value() and not just "accessing that variable"? Too hard to implement?
  2. The call is guaranteed to have happened when the user calls ".value()". So if he never calls .value(), there is no guarantee that the call will be made, nor of when it will be. Thus the call will be made "sometime", which could be in 100 years. Or not? There should be no implicit/hidden assumptions. This could even be an optimization^^
So do I understand correctly that the result of the call can be stored in the future as well as in one of the passed synchronous objects? And in your example, you chose the latter, and thus both ".value()" calls return null?

As a language user, I'd really appreciate it if the future used an implicit getter for synchronization instead, for example:
io.print(fv1)
# instead of
io.print(fv1.value())
I know it is just a minor detail, but it makes it "feel" much more like a "future value" to me.

Really looking forward to this :-)
I'll answer your questions again:
  1. I guess a "just accessing the variable" implementation would indeed be too complex and cumbersome, at least a straightforward one. Besides, how would you then tell Dao that, instead of accessing the actual value, you want to save the future value (value holder) in order to retrieve the actual value later?
  2. The call is queued when the asynchronous method is actually called. It may start running instantly (if no other async methods of that particular object are running) or later (after all previously called methods have finished). The value method is used to synchronize the method with the current thread: by calling value, you tell Dao to wait until the method associated with that future value is finished and then get what it returns.
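The queueing described in point 2 can be mimicked in Python, assuming a single-worker executor stands in for one object: each call starts only after every earlier call on that object has finished, and `result()` synchronizes the caller with one particular call. A sketch under that assumption; `step` is a hypothetical method:

```python
from concurrent.futures import ThreadPoolExecutor
import time

order = []


def step(name, delay):
    time.sleep(delay)  # earlier calls sleep longer, yet still finish first
    order.append(name)
    return name


obj_queue = ThreadPoolExecutor(max_workers=1)  # one queue per object
futures = [obj_queue.submit(step, "first", 0.05),
           obj_queue.submit(step, "second", 0.01),
           obj_queue.submit(step, "third", 0.0)]
last = futures[-1].result()  # waits for "third", which runs after the others
obj_queue.shutdown()
```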
Regarding your proposal of implicit getter: again, how would you distinguish whether you want to store/transfer the value holder for future use or to access the actual value immediately?
Also, you may refer to Qt's QtConcurrent and QFuture documentation for more information.
BTW, I've just thought that it would be good to equip the future values with the methods isready and wait (in case one doesn't need the returned result, just the synchronization).
Thanks. I guess my confusion came from the fact that I thought of the "future" more like a proxy object in RPC frameworks.

I thought the returned values (fv1, fv2) should feel to the user like the real results of the routine calls, so that it feels like usual programming. If you think of them as objects really different from the actual result (like QFuture, thanks for the link), more like "watchers", your answers make sense. But it feels much less transparent.

Something like this would've been really cool though:
routine @invert(matrix)
{
    # long algorithm...
    return result
}

hugeMatrix = .....
hugeInverse  = invert(hugeMatrix)
io.println(hugeInverse)
doMoreStuff(hugeInverse)
In this example I would have expected "hugeInverse" to be the actual returned thing (numeric array, for example) from @invert, but with a very tiny "implementation detail" around it that basically just "waits" for the result when it's accessed for the first time. After that, it would just be like a usual numeric array.

But you have one level of indirection in between. If it should be able to do more (like QFuture), it makes sense. But if it is only for retrieving the result, it is unnecessary from a user's point of view.

Also, isn't the "wait" you propose the same as calling ".value()", ignoring its return value, like Limin did in his original example? Just that your name fits the situation better :)
As for your proposal, it would probably be hard to determine "when it's accessed for the first time". If the returned value were treated just like a usual variable, imagine the immense changes required to add synchronization checks to the VM operations, and the overhead they would cause: almost all DaoVM instructions involve accessing values :) Perhaps there is a simple way (that would be great indeed), but I don't see how to attach that "tiny implementation detail" without huge internal changes :)
Right, it is really problematic to determine "when it's accessed for the first time". Moreover, the actual value can be accessed almost anywhere (in the VM, in runtime support code ...), but the program cannot be safely interrupted and resumed everywhere. So it would be practically impossible.

In Comment 524 :
The call is guaranteed to have happened when the user calls ".value()". So if he never calls .value(), there is no guarantee that the call will be made, nor of when it will be. Thus the call will be made "sometime", which could be in 100 years. Or not? There should be no implicit/hidden assumptions.
The only assumption here is that every asynchronous method you call will finish in finite time (or at least that the number of methods that run forever is smaller than the number of threads used to execute asynchronous methods), which is quite reasonable. If this assumption holds, then when you call an asynchronous method, the call is guaranteed to happen and to finish in finite time (even if you have to wait for 100 years, that's still finite ;) ), regardless of whether ".value()" is called or not. ".value()" does not influence the call/execution of the async method; it only tells the caller to wait.
So do I understand correctly that the result of the call can be stored in the future as well as in one of the passed synchronous objects? And in your example, you chose the latter, and thus both ".value()" calls return null?
Yes and no. In my example, the result I am interested in is indeed stored in the passed synchronous objects. But to get that result, you still have to use ".value()", because you can access synchronous objects only through their public methods, which are executed asynchronously.
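The same guarantee can be seen in a Python analogy: a submitted call runs whether or not anyone asks for its result, and `shutdown(wait=True)` waits for pending calls, much as the Dao VM is said to wait for unfinished asynchronous methods at the end of a script. A sketch; `record` is an illustrative name:

```python
from concurrent.futures import ThreadPoolExecutor

side_effects = []


def record(i):
    side_effects.append(i)  # the side effect is the only point of the call


executor = ThreadPoolExecutor(max_workers=2)
for i in range(5):
    executor.submit(record, i)  # the returned futures are discarded, never waited on
executor.shutdown(wait=True)  # like the end of a script: wait for pending calls
```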

In Comment 527 :
BTW, I've just thought that it would be good to equip the future values with the methods isready and wait (in case one doesn't need the returned result, just the synchronization).
Good idea, maybe we can add:
state()=>enum<queued,running,finished,aborted>
wait()

Yeah, the state method looks more informative, but does the user need to know that much? Whether the value is already accessible or not is probably what matters here. It's hard for me to imagine how the queued and running states could be treated differently; both mean that the result is not ready. As for aborted, its meaning seems vague to me. Is the thread canceled? Is the method call removed from the queue? I suppose there shouldn't be such things at all. If an asynchronous method has been called, it should finish at some point unless an exception is raised, just like a usual routine.
Too bad it can't be as simple as I hoped :(

I understand your answers about the execution time and they are actually what I expected, but as it was not explicitly stated that the asynchronous method is executed "as soon as possible", I preferred to ask about it :)

Nightwalker, you raise an important subject we have not talked about so far: what if the async routine throws an exception? Who will get it?

Also, I am of the same opinion as Nightwalker about the state method.
I guess exception handling will work as it does with threads: unhandled exceptions are intercepted only by the VM itself. This implies that try-rescue blocks and the assertion operator will have no effect here.
This implementation turns out to be much simpler than that of the previous Asynchronous Function Call and Message Passing Interface.
There are several possible states for an asynchronous method call:
  • queued : the call has entered the execution queue and has not been executed yet;
  • running : the call is being executed;
  • paused/suspended : the call is suspended to wait for another call to finish;
  • finished : the call is successfully finished;
  • aborted : the call is aborted due to exceptions or some other reason.
I guess in certain situations this information can be useful. But it does look inappropriate to add .state() to the future value type, because it is not the state of the future value but the state of the VM process associated with the future value.
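To get a feel for what such states would look like in practice, most of them can be read off a Python `concurrent.futures.Future`. The mapping below is an assumption made for illustration: Python exposes no paused/suspended state, and `state` is a hypothetical helper, not part of any proposed Dao API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import threading


def state(f: Future) -> str:
    """Map a Future onto the states listed above (paused is not observable)."""
    if f.done():
        if f.cancelled() or f.exception() is not None:
            return "aborted"
        return "finished"
    return "running" if f.running() else "queued"


started, release = threading.Event(), threading.Event()


def blocker():
    started.set()
    release.wait()  # keep the single worker busy until released


def fail():
    raise ValueError("boom")


executor = ThreadPoolExecutor(max_workers=1)
f1 = executor.submit(blocker)
f2 = executor.submit(fail)  # waits in the queue behind f1
started.wait()
s1, s2 = state(f1), state(f2)  # "running", "queued"
release.set()
executor.shutdown(wait=True)
s1_end, s2_end = state(f1), state(f2)  # "finished", "aborted"
```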
Well, I guess the state information may be needed to organize and debug sophisticated multithreading, but isn't it better to use explicit threads in such cases? I thought asynchronous methods were meant to abstract away lower-level details completely, while state just throws all those technical details at the user, ruining the illusion of simplicity.
I, for instance, have already become confused :) How can safety be achieved if the call may be aborted for "some other reason"? Or do I need to check whether the call has been finished properly each time before accessing the result value?
BTW, the feature needs really thorough testing. There shouldn't be situations like the ones I encountered with threads. That's why I want to ask certain questions in advance:
  • Is passing a value of primitive type by reference to async method prohibited?
  • Is accessing global data prohibited (or somehow regulated)?
  • Is passing a class/interface, a routine/currying or a generic type ( any ) value prohibited?
  • Is the implementation compatible with mtlib ?
  • Will the VM wait for unfinished async methods if the end of script is reached?
  • Is the use of the .value() result type-checked during the compilation?
Regarding the type-checking:
fvalue = obj.async_method();
result: int; #suppose it's the type of what async_method returns

...
result = fvalue.value(); #will it work?

Right, the synchronous class is intended for high-level and easy-to-use parallelization, so there is no need to provide that kind of information. For the same reason, are you sure we need a method like isready() for the future value?

How can safety be achieved if the call may be aborted for "some other reason"? Or do I need to check whether the call has been finished properly each time before accessing the result value?
Probably there is no "some other reason"; when I wrote that, I wasn't sure. In principle, only an exception can abort the call. You do not need to check; I will add some checking to the methods for waiting and accessing the result value.
  • Is passing a value of primitive type by reference to async method prohibited?
    Should be, but not implemented yet.
  • Is accessing global data prohibited (or somehow regulated)?
    Global variable access is now prohibited. But global constants are currently allowed, which is problematic for class and namespace objects, because they allow indirect access to static class members and global variables. I am considering prohibiting such constants as well.
  • Is passing a class/interface, a routine/currying or a generic type ("any") value prohibited?
    Should be, but not implemented yet.
  • Is the implementation compatible with "mtlib"?
    Not sure what you mean by compatible. Do you mean the possibility of using mutexes, condition variables, etc. in a synchronous class? For that, I think yes.
  • Will the VM wait for unfinished async methods if the end of script is reached?
    Yes.
  • Is the use of the ".value()" result type-checked during the compilation?
    Yes.

Thanks for the answers. As for .isready(), it's not that important and may actually be omitted. However, I suppose supporting .wait() is still reasonable, as using .value() purely for synchronization (ignoring the value it returns) seems a bit unnatural.
Also, I've been thinking about global data usage. Prohibiting it entirely would render synchronous classes useless, as they wouldn't even be able to rely on built-in routines! Currently I see only one simple solution: allow free access to global data and thus shift the problem onto the user :)
I think isready might be useful for a program, for example to "do more stuff" while the result is not available, instead of blocking while waiting. Imagine a game, for example: games have a "main loop" that constantly redraws the screen. Such a loop would definitely not want to block and wait, but would rather check on every frame whether the result is done; if yes, use it, if not, continue drawing frames. (And maybe show a lovely hourglass cursor :D)

About the global data, it sounds difficult to me, because any routine that the async routine calls might itself use global data. Maybe this is what Nightwalker means; I was not sure what he meant.

And if the async routine throws an exception, what will happen when the caller accesses the result using the future's ".value" method? My idea would be that he then gets the exception that the async routine has thrown. Does this sound logical to you guys?

Finally, I wonder whether it is possible to create deadlocks, where one routine waits for the result of another, which itself waits for the result of the first. And if so, is it possible to detect and prevent them?
My idea would be that he then gets the exception that the async routine has thrown. Does this sound logical to you guys?
Sounds perfectly logical; actually, this is exactly what I was planning to do :)
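Python's `concurrent.futures` behaves the same way and may serve as a reference point: an exception raised inside the submitted call is stored in the future and re-raised by `result()` in the caller. A minimal sketch; `parse` is an illustrative function:

```python
from concurrent.futures import ThreadPoolExecutor


def parse(text):
    return int(text)  # raises ValueError for non-numeric input


with ThreadPoolExecutor(max_workers=1) as executor:
    good = executor.submit(parse, "42")
    bad = executor.submit(parse, "forty-two")
    value = good.result()
    try:
        bad.result()  # the worker's exception is re-raised here, in the caller
        caught = None
    except ValueError as exc:
        caught = exc
```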

Regarding how to handle global data, the basic principle should be that asynchronous methods should not modify shared data. Though it is impossible to impose this on wrapped C/C++ functions, I will see whether we can at least prevent shared data from being modified in Dao code; otherwise we will have to allow free access to global data.

Based on the above discussions, I decided to support the following methods for the future value type:
  • available() : equivalent to the suggested isready() , but method name with single word is preferred;
  • wait() : wait until the result value becomes available;
  • value() : get the result value; if it is not yet available, wait until it becomes available.
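As a rough model, these three methods can be layered over Python's `concurrent.futures.Future`; the class `FutureValue` is hypothetical and only mirrors the names chosen above. The usage lines mimic the game-loop case raised earlier: poll `available()` on each frame instead of blocking.

```python
import concurrent.futures as cf
import time


class FutureValue:
    """Hypothetical wrapper exposing available()/wait()/value()."""

    def __init__(self, future: cf.Future):
        self._future = future

    def available(self) -> bool:
        return self._future.done()  # never blocks

    def wait(self) -> None:
        cf.wait([self._future])  # blocks until finished, ignores the result

    def value(self):
        return self._future.result()  # blocks if needed; re-raises exceptions


def slow_sum(n):
    time.sleep(0.05)
    return sum(range(n))


with cf.ThreadPoolExecutor(max_workers=1) as executor:
    fv = FutureValue(executor.submit(slow_sum, 1000))
    frames = 0
    while not fv.available():  # e.g. a game loop that keeps drawing frames
        frames += 1
        time.sleep(0.001)
    result = fv.value()  # already available, so this returns immediately
```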

I think I have a rather simple idea of how to detect and prevent deadlocks.
An internal lock map of the form thread => thread may be used; each time .value() or .wait() is called, a new pair waiting_thread => referred_thread is inserted into the map, and when the corresponding threads are joined, that pair is removed. Thus, if upon adding A => B there is already B => A in the map, a deadlock obviously takes place. More complicated cases (like A => B, B => C, C => A) could be detected as well if necessary.
As for what to do with deadlocks, I suppose if one has been detected, the value/wait method should return immediately raising an exception.
What do you think about this?
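The proposed lock map is essentially a wait-for graph, and the longer cycles mentioned can be caught by following the chain of waits before inserting a new pair. A minimal Python sketch, assuming each waiter waits on exactly one target at a time and that a detected deadlock raises immediately, as suggested; `WaitForMap` and `DeadlockError` are hypothetical names:

```python
import threading


class DeadlockError(RuntimeError):
    pass


class WaitForMap:
    """Hypothetical wait-for map: waiter -> the thread/process it waits on."""

    def __init__(self):
        self._edges = {}
        self._lock = threading.Lock()

    def begin_wait(self, waiter, target):
        with self._lock:
            # Follow target -> ...; reaching the waiter again means a cycle.
            # The map never contains a cycle, so this walk always terminates.
            node = target
            while node is not None:
                if node == waiter:
                    raise DeadlockError(f"{waiter} => {target} closes a cycle")
                node = self._edges.get(node)
            self._edges[waiter] = target

    def end_wait(self, waiter):
        with self._lock:
            self._edges.pop(waiter, None)  # called once the target has finished


locks = WaitForMap()
locks.begin_wait("A", "B")
locks.begin_wait("B", "C")
try:
    locks.begin_wait("C", "A")  # A => B => C => A would deadlock
    detected = False
except DeadlockError:
    detected = True
```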
Such deadlocks should not arise here, because each asynchronous call is executed in its own virtual machine process, which is allocated to a native thread only when it becomes active. Each VM process only ever waits for another VM process, and if A is waiting for B, there is no way B can be waiting for A, directly or indirectly: no matter what method B calls (directly or indirectly) on whatever object, it cannot use the VM process running A to run that method; instead it uses a new VM process (current implementation) or an unused one (possible future optimization).
Hi,
just wanted to ask about the state of this: is it now fully implemented? I ask because I saw a demo for it (synchronous_class.dao) in my last repository pull.
Not fully implemented yet, but the basic part is done and should be working. What is missing is a specification and enforcement of which global accesses (including function calls) should be allowed in asynchronous methods. I haven't found enough time for this in the past few weeks.
I have an idea of how to resolve the situation with accessing globals from within a synchronous class's methods, a concept stolen from concurrent programming :)
One way or another, synchronous classes need to have access to global data; otherwise they will be nearly useless (can one do much without access to even the built-in global routines?). Thus I think the problem lies in regulating this access somehow in order to guarantee safety. And here we could use the concept of a monitor.
In concurrent programming, monitor refers to an object or module which can be used by only one thread at a time. It's quite similar to how mutex works: if one thread runs any of the monitor's methods, all other threads wait until the monitor is "released". For more information, see "Monitor (synchronization)" in Wikipedia.
So, I think this monitor thing can be adopted for our case in the form of a fully static class (a singleton object, in other words). It could be used to supervise access to global resources used by the synchronous classes, as well as to organize thread-safe functions for use with ordinary threads of mtlib. Monitors could either be made the only global data (perhaps along with constants) accessible by synchronous classes, or be allowed only to be passed to their methods.
There is probably only one disadvantage of using monitors: one will have to wrap any global data required by the synchronous classes, including data that is obviously thread-safe already. But otherwise it seems like a nice "super-mutex" solution to the synchronization issue :) So, what do you think?
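In Python terms, the monitor described above could look like a class whose public methods all acquire one shared lock, so at most one thread is inside the monitor at any moment. A sketch under stated assumptions: `Monitor`, `synchronized` and `Stats` are hypothetical names, a single shared instance stands in for the fully static class, and a reentrant lock is used so monitor methods may call each other.

```python
import functools
import threading


class Monitor:
    """Base class: every @synchronized method holds the monitor's lock."""

    def __init__(self):
        self._monitor_lock = threading.RLock()


def synchronized(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._monitor_lock:  # other threads wait until the monitor is released
            return method(self, *args, **kwargs)
    return wrapper


class Stats(Monitor):
    """Wraps global state that several concurrent workers need to update."""

    def __init__(self):
        super().__init__()
        self._hits = 0

    @synchronized
    def hit(self):
        current = self._hits
        self._hits = current + 1  # read-modify-write protected by the lock

    @synchronized
    def hits(self):
        return self._hits


stats = Stats()  # one shared instance plays the role of the singleton monitor


def work():
    for _ in range(1000):
        stats.hit()


workers = [threading.Thread(target=work) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```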

This site is powered by Dao
Copyright (C) 2009-2013, daovm.net.
Webmaster: admin at daovm dot net