Labels: english, feature planning
fu created at Wednesday, 2010-12-22, 08:36:32
24 Replies, 6514 Hits
These days I have been thinking about the possibility of bringing back two previous features, Asynchronous Function Call (AFC) and Message Passing Interface (MPI). I would not bring them back in the same form, but in a much simpler one.
The original design of MPI attempted to unify the concurrent and distributed computation models under a single framework. That was a bit too ambitious, and it was hard to implement consistently across different platforms.
Additionally, the original design of AFC and MPI did not pay much attention to the consistency of shared data.
To address these problems, I have come up with a new design that should be much easier to implement and simpler to use.
The basic idea is to support a specialized class. For the moment, let's call such a class a Synchronous Class and its instances Synchronous Objects. It could be declared by specifying a special attribute in the class definition, by prefixing the class name with a special symbol, or in some other way, as follows:
class [synchronous] SynObject { ... }
# or class @SynObject { ... }
Such a class will have the following properties:
Calling a public method of a synchronous object returns a future value. Retrieving the value from such a future value object blocks the caller until the value becomes available, that is, until the method has finished execution. As a simple example:
class @Counter
{
    var count = 0;
    public routine Increase(){ count += 1 }
    routine Decrease(){ count -= 1 }
}
class @Consumer
{
    public routine Update( counter : Counter ){
        for( i = 1 : 100 ) counter.Increase();
    }
}
counter = Counter();
c1 = Consumer();
c2 = Consumer();
fv1 = c1.Update( counter );
fv2 = c2.Update( counter );
fv1.value(); # block until c1.Update( counter ) is done
fv2.value(); # block until c2.Update( counter ) is done
The design restricts which types of data may be passed across the function call boundary. Because of this, there are clear restrictions on the kinds of parallel computation that can be implemented with the concurrent model in this proposal. So I am not very sure whether a feature based on this proposal would be of general interest. Some feedback would help and is extremely welcome.
Comments
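The example above can be made concrete with a rough Python analogue built on concurrent.futures. This is not Dao and not the proposed implementation.

The sketch assumes that each synchronous object owns a single worker thread, so its public methods run one at a time, in call order. SyncObject, the lowercase method names and the count() getter are hypothetical, and Future.result() stands in for .value().

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SyncObject:
    """Hypothetical analogue of a synchronous object: each instance owns one
    worker thread, so its public methods run one at a time, in call order."""

    def __init__(self):
        self._worker = ThreadPoolExecutor(max_workers=1)

    def _call(self, fn, *args) -> Future:
        # Queue the call on this object's thread; the caller gets a future.
        return self._worker.submit(fn, *args)


class Counter(SyncObject):
    def __init__(self):
        super().__init__()
        self._count = 0  # only ever touched on this object's own thread

    def increase(self) -> Future:
        def body():
            self._count += 1
        return self._call(body)

    def decrease(self) -> Future:
        def body():
            self._count -= 1
        return self._call(body)

    def count(self) -> Future:
        # Hypothetical getter; the original example never reads the count back.
        return self._call(lambda: self._count)


class Consumer(SyncObject):
    def update(self, counter: Counter) -> Future:
        def body():
            for _ in range(100):
                counter.increase()  # queued on counter's thread, not awaited
        return self._call(body)


counter = Counter()
c1, c2 = Consumer(), Consumer()
fv1 = c1.update(counter)
fv2 = c2.update(counter)
fv1.result()  # block until c1.update(counter) is done
fv2.result()  # block until c2.update(counter) is done
# count() is queued behind all 200 increase() calls, so it sees all of them.
total = counter.count().result()
print(total)  # 200
```

Reading the count goes through another public method, which matches the later remark in this thread that synchronous objects can only be accessed through their (asynchronous) public methods.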
Pompei2 commented at Wednesday, 2010-12-22, 10:37:53
With multicore processors taking over the world, such a feature will hopefully help broaden the use of parallel programming, so I think it is a very useful addition.
First of all, is it possible that you have forgotten some "return" statement(s) in your example? Or is it me who has totally missed the point? :) What is the return value of Update? What is the ".value()" method on it?
Next, passing only primitive and synchronous objects sounds logical. I haven't read much more than a few paragraphs and a Wikipedia page about the Actor model yet, though. Hence my question: why not allow any argument that is passed by value?
Also, if Consumer had a second method "Update2", and I called the following:
r1 = c1.Update(counter)
r2 = c1.Update2(counter)
r3 = c1.Update(counter)
would the calls be guaranteed to execute in the same order? In my opinion, they should.
And what happens when the return values are ignored, as in this example:
c1.Update(counter)
c1.Update2(counter)
Might it be that the methods never actually get executed? It seems to me that these statements are not nonsensical, as the first (and/or second) update might change c1's state, or not?
Finally, why couldn't there be synchronous public methods? Wouldn't calling a synchronous public method effectively be the same as calling an asynchronous one and requesting its result right afterwards? Like this:
r1 = c1.Update(counter)
c1.SynchronousPublicMethod(7)
Upon the second line, this would first execute all pending async methods and then execute the synchronous one, just like the following:
r1 = c1.Update(counter)
r2 = c1.SameMethodAsAboveButAsync(7)
r2.value()
Oh, I just had an idea about the syntax. Wouldn't it be funny to use "//" as a marker, because in mathematics this symbol means "parallel" :)
Lots of questions about this interesting subject. Please keep in mind that even though I once attended a lecture about it, I'm quite new to parallel programming and have hardly ever used it.
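Pompei2's three questions (call order, ignored results, synchronous methods as "async call plus .value()") can be tried out in a small Python analogue. It assumes that c1 owns exactly one worker thread. Whether Dao guarantees the same behaviour is exactly what is being asked, so treat this only as one possible model. The method() helper is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

log = []


def method(name):
    log.append(name)  # record the order in which calls actually run
    return name


# Hypothetical stand-in for c1: one worker thread, so queued calls run FIFO.
c1 = ThreadPoolExecutor(max_workers=1)

r1 = c1.submit(method, "Update #1")
r2 = c1.submit(method, "Update2")
r3 = c1.submit(method, "Update #2")
c1.submit(method, "Update2 (result ignored)")  # nobody ever reads this future

# A "synchronous" call modelled as an asynchronous call whose value is read
# at once. It is queued behind everything above, so those calls have all run.
sync_value = c1.submit(method, "SynchronousPublicMethod").result()

c1.shutdown(wait=True)
print(log)
```

In this model the ignored call still runs, because submission alone queues it. That is consistent with fu's later answer that a call finishes in finite time whether or not .value() is called.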
Nightwalker commented at Wednesday, 2010-12-22, 12:51:04
Nightwalker modified at Wednesday, 2010-12-22, 16:14:05
[Rewritten]
First, I'm going to try to answer Pompei2's questions:
fu commented at Wednesday, 2010-12-22, 23:44:55
Thanks for the positive feedback. It seems Pompei2's questions have been more or less answered by Nightwalker. I will just clarify a bit about the value returned by an asynchronous method. Actually, an asynchronous method does not need to return anything, because the returned future value is not prepared by the method but by its caller. So it doesn't matter whether the method returns something or not: its caller will always associate a future value with the call. When the value() method of a future value is called, it checks whether the associated call has finished. If not, it blocks; if so, it returns the asynchronous method's return value, or a null value if that method does not return anything.
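The mechanism described above (the caller, not the method, prepares the future value; value() blocks until the call ends) can be modelled roughly in Python with threading.Event. This is an illustrative sketch, not the Dao VM implementation. FutureValue and call_async are hypothetical names, and None plays the role of Dao's null.

```python
import threading


class FutureValue:
    """Hypothetical future value: created by the caller, not by the method."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None

    def value(self):
        # Block until the associated call has finished, then hand back
        # whatever the method returned (None if it returned nothing).
        self._done.wait()
        return self._result


def call_async(method, *args) -> FutureValue:
    fv = FutureValue()  # the caller associates a future value with the call

    def run():
        try:
            fv._result = method(*args)
        finally:
            # Release waiters even if the method raises; this sketch does not
            # propagate the exception itself (value() would return None).
            fv._done.set()

    threading.Thread(target=run).start()
    return fv


def returns_nothing():
    pass


def add(a, b):
    return a + b


print(call_async(add, 2, 3).value())        # 5
print(call_async(returns_nothing).value())  # None
```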
In fact, it will allow any argument passed by value, because only primitive types can be passed by value. As Nightwalker pointed out, care needs to be taken to prevent primitive types from being passed by reference. I also think consistent usage of public methods is more important; supporting synchronous public methods would be confusing. After all, it is not much of a bother to type the additional .value() :)
fu commented at Thursday, 2010-12-23, 00:34:42
I think this feature can be considered to be part of the language, and is essential for the implementation of the language and VM :)
Pompei2 commented at Thursday, 2010-12-23, 09:56:39
I would make it a built-in too because it adds language functionality (and syntax)
A few things are still not clear to me:
As a language user, I'd really appreciate it if the future value used a getter for the synchronization instead, for example:
io.print(fv1) # instead of io.print(fv1.value())
I know it is just a minor detail, but it makes it "feel" much more like a "future value" to me.
Really looking forward to this :-)
Nightwalker commented at Thursday, 2010-12-23, 12:48:58
Nightwalker modified at Thursday, 2010-12-23, 13:13:29
I'll answer your questions again:
Pompei2 commented at Thursday, 2010-12-23, 14:20:10
Thanks. I guess my confusion came from the fact that I thought of the "future" more like a proxy object in RPC frameworks.
I thought the returned value (fv1, fv2) should feel to the user like the real results of the routine calls, so it feels like usual programming. If you think of them as really different objects than the actual result (like QFuture, thanks for the link), more like "watchers", your answers make sense. But it feels way less transparent. Something like this would've been really cool though:
routine @invert(matrix)
{
    # long algorithm...
    return result
}
hugeMatrix = .....
hugeInverse = invert(hugeMatrix)
io.println(hugeInverse)
doMoreStuff(hugeInverse)
In this example I would have expected "hugeInverse" to be the actual thing returned from @invert (a numeric array, for example). It would have a very tiny "implementation detail" around it that basically just "waits" for the result when it is accessed for the first time. After that, it would behave just like a usual numeric array.
But you have one level of indirection in between. If the future should be able to do more (like QFuture), that makes sense. But if it is only for retrieving the result, it is unnecessary from a user's point of view. Also, isn't the "wait" you propose the same as calling ".value()" and ignoring its return value, as Limin did in his original example? Only your name fits the situation better :)
Nightwalker commented at Thursday, 2010-12-23, 14:46:43
Nightwalker modified at Thursday, 2010-12-23, 15:57:05
As for your proposal, it would probably be hard to determine "when it's accessed for the first time". If the value returned should be treated just as usual variable, imagine what immense changes are required to enhance the VM operations with synchronization checks and what overhead it would cause -- almost all DaoVM instructions involve value accessing :) Perhaps there is a simple way (it would be great indeed), but I don't see how to attach that "tiny implementation detail" without huge internal changes :)
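Nightwalker's point, that a transparent future would need a synchronization check on every kind of value access, can be seen even in a high-level language. The Python sketch below is a hypothetical TransparentFuture proxy, not something proposed for Dao. Generic attribute access can be forwarded in one place, but operators and built-ins each need their own wrapper, and any operation left unwrapped breaks the illusion.

```python
from concurrent.futures import ThreadPoolExecutor


class TransparentFuture:
    """Hypothetical 'looks like the result' future. Every way of touching
    the value must first pass through a synchronization check."""

    def __init__(self, future):
        self._future = future

    def _resolve(self):
        # The hidden check: block until the result exists (cheap once done).
        return self._future.result()

    # Plain attribute access can be forwarded generically...
    def __getattr__(self, name):
        return getattr(self._resolve(), name)

    # ...but operators and built-ins look methods up on the type and bypass
    # __getattr__, so each one needs its own explicit wrapper.
    def __len__(self):
        return len(self._resolve())

    def __getitem__(self, index):
        return self._resolve()[index]

    def __str__(self):
        return str(self._resolve())


def invert(values):
    # Stand-in for the long algorithm in @invert.
    return [1.0 / v for v in values]


with ThreadPoolExecutor() as pool:
    huge_inverse = TransparentFuture(pool.submit(invert, [1.0, 2.0, 4.0]))
    print(huge_inverse)             # [1.0, 0.5, 0.25]
    print(len(huge_inverse))        # 3
    print(huge_inverse.count(0.5))  # 1

try:
    huge_inverse + [1.0]  # __add__ was never wrapped...
    add_forwarded = True
except TypeError:
    add_forwarded = False  # ...so the illusion breaks on this operation
print(add_forwarded)  # False
```

In a VM the equivalent of each wrapper would be a check inside almost every instruction, which is the overhead Nightwalker describes.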
fu commented at Thursday, 2010-12-23, 17:26:05
Right, it is really problematic to determine "when it's accessed for the first time". Moreover, the actual value can be accessed almost anywhere in the program (VM, runtime support ...), but the program cannot be safely interrupted and resumed at every such point. So it would be practically impossible.
In Comment 524:
The call is guaranteed to have happened when the user calls ".value()". So if he never calls .value(), there is no guarantee that the call will be made, nor when it will be. Thus the call will be made "sometime", which could be in 100 years. Or not? There should be no implicit/hidden assumptions.
The only assumption here is that every asynchronous method you call will finish in finite time. At a minimum, the number of methods that run forever must be smaller than the number of threads used to execute asynchronous methods, which is quite reasonable. If this assumption holds, then calling an asynchronous method guarantees that the call will happen and finish in finite time (even if you have to wait for 100 years, that's still finite ;) ), regardless of whether ".value()" is called. ".value()" does not influence the call/execution of the async method; it only tells the caller to wait.
So do I understand correctly that the result of the call can be stored in the future as well as in one of the passed synchronous objects? And in your example, you chose the latter, and thus the two ".value()" calls both return null?
Yes and no. In my example, the result I am interested in is indeed stored in the passed synchronous objects. But to get that result, you still have to use ".value()", because you can access synchronous objects only through their public methods, which are executed asynchronously.
In Comment 527:
BTW, I've just thought that it would be good to equip the future values with the methods isready and wait (in case one doesn't need the returned result, just the synchronization).
Good idea, maybe we can add:
state()=>enum<queued,running,finished,aborted>
wait()
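The proposed state() and wait() methods map fairly directly onto predicates that Python's concurrent.futures.Future already has, so a rough analogue is easy to sketch. The State enum and the two helper functions below are hypothetical and only mirror fu's suggested signature. Treating an exception as "aborted" is an assumption here, though it matches the later point in this thread that only an exception can abort a call.

```python
import concurrent.futures
import enum
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class State(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    ABORTED = "aborted"


def state(fv: Future) -> State:
    """Hypothetical state() mapped onto Python's Future predicates."""
    if fv.running():
        return State.RUNNING
    if not fv.done():
        return State.QUEUED
    # cancelled() is checked first: exception() raises on a cancelled future.
    if fv.cancelled() or fv.exception() is not None:
        return State.ABORTED
    return State.FINISHED


def wait(fv: Future) -> None:
    """Hypothetical wait(): block until the call ends, ignore its value."""
    concurrent.futures.wait([fv])


gate = threading.Event()
with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(gate.wait)    # occupies the only worker
    second = pool.submit(lambda: 42)  # must queue behind it
    before = state(second)            # State.QUEUED
    gate.set()
    wait(second)
    after = state(second)             # State.FINISHED
    failing = pool.submit(lambda: 1 / 0)
    wait(failing)
    failed = state(failing)           # State.ABORTED

print(before, after, failed)
```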
Nightwalker commented at Thursday, 2010-12-23, 19:23:17
Nightwalker modified at Thursday, 2010-12-23, 19:25:21
Yeah, the state method looks more informative, but does the user need to know that much? Whether the value is already accessible or not is probably what matters here. It's hard for me to imagine how the queued and running states could be treated differently, since both mean that the result is not ready. As for aborted, its meaning seems vague to me. Is the thread canceled? Is the method call removed from the queue? I suppose there shouldn't be such things at all. If an asynchronous method has been called, it should finish at some point unless an exception is raised, just like a usual routine.
Pompei2 commented at Friday, 2010-12-24, 11:56:39
Too bad it can't be as simple as I hoped :(
I understand your answers about the execution time, and they are actually what I expected. However, it was not explicitly stated that the asynchronous method is executed "as soon as possible", so I preferred to ask about it :) Nightwalker, you raise an important subject we have not talked about so far: what if the async routine throws an exception? Who will get it? Also, I am of the same opinion as Nightwalker about the state method.
Nightwalker commented at Friday, 2010-12-24, 18:58:03
Nightwalker modified at Friday, 2010-12-24, 19:37:06
I guess exception handling will work as it does with threads: unhandled exceptions are intercepted only by the VM itself. This implies that try-rescue blocks and the assertion operator will have no effect here.
fu commented at Saturday, 2010-12-25, 01:43:27
This implementation turns out to be much simpler than that of the previous Asynchronous Function Call and Message Passing Interface.
fu commented at Saturday, 2010-12-25, 02:22:26
There are several possible states for an asynchronous method call:
Nightwalker commented at Saturday, 2010-12-25, 10:49:19
Nightwalker modified at Saturday, 2010-12-25, 12:46:42
Well, I guess the state information may be needed to organize and debug sophisticated multithreading, but isn't it better to use explicit threads in such cases? I thought asynchronous methods were meant to abstract away lower-level details completely. The state method, by contrast, throws all those technical details at the user and ruins the illusion of simplicity.
fvalue = obj.async_method();
result: int; #suppose it's the type of what async_method returns
...
result = fvalue.value(); #will it work?
fu commented at Saturday, 2010-12-25, 19:50:56
Right, the synchronous class is intended for high-level, easy-to-use parallelization, so there is no need to provide that kind of information. For the same reason, are you sure we need a method like isready() for future values?
How can safety be achieved if the call may be aborted for "some other reason"? Or do I need to check whether the call has finished properly each time before accessing the result value?
Probably there is no "some other reason"; when I wrote that, I wasn't sure about it. In principle, only an exception can abort the call. You do not need to check; I will add some checks to the methods for waiting for or accessing the result value.
Nightwalker commented at Saturday, 2010-12-25, 21:33:57
Nightwalker modified at Saturday, 2010-12-25, 21:55:20
Thanks for the answers. As for .isready(), it's not something important and may actually be omitted. However, I suppose supporting .wait() is still reasonable, as using .value() purely for synchronization (ignoring the value it returns) seems a bit unnatural.
Pompei2 commented at Sunday, 2010-12-26, 00:38:24
Pompei2 modified at Sunday, 2010-12-26, 00:40:58
I think isready might be useful for a program that wants to "do more stuff" while the result is not available, instead of blocking. Imagine a game, for example: games have a "main loop" that constantly redraws the screen. Such a loop would definitely not want to block and wait. Instead, it would check on every frame whether the result is done; if so, it uses the result, and if not, it continues drawing frames. (And maybe a lovely hourglass cursor :D)
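The polling style Pompei2 describes can be sketched in Python. Here Future.done() plays the role of the proposed isready(); the mapping is an assumption, since isready() was only being discussed. load_level and the 60 fps pacing are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_level():
    time.sleep(0.2)  # stand-in for a long asynchronous computation
    return "level loaded"


frames = 0
with ThreadPoolExecutor(max_workers=1) as pool:
    fv = pool.submit(load_level)
    while not fv.done():     # done() plays the role of the proposed isready()
        frames += 1          # draw one frame (and the hourglass cursor)
        time.sleep(1 / 60)   # roughly 60 frames per second, never blocking on fv
    result = fv.result()     # already finished, so this returns immediately

print(result, "after", frames, "frames")
```

The main loop never blocks on the future. It only reads the value once the non-blocking check says it is ready.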
About the global data, it sounds difficult to me, because any routine that the async routine calls might itself use global data. Maybe this is what Nightwalker means; I was not sure what he meant. And if the async routine throws an exception, what will happen when the caller accesses the result using the future's ".value" method? My idea would be that the caller then gets the exception that the async routine has thrown. Does this sound logical to you guys? Finally, I wonder if it is possible to create deadlocks, where one routine waits for the result of another one, which itself waits for the result of the first one. And if so, is it possible to detect and prevent them?
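Python's concurrent.futures already behaves the way Pompei2 suggests: an exception raised inside the asynchronous routine is stored in the future and re-raised when the caller asks for the result. The sketch below shows that behaviour as an analogue only. async_routine is a made-up example, and the sketch says nothing about how Dao implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def async_routine(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2


with ThreadPoolExecutor(max_workers=1) as pool:
    ok = pool.submit(async_routine, 21)
    bad = pool.submit(async_routine, -1)  # raises inside the worker thread

print(ok.result())  # 42
try:
    bad.result()  # the routine's own exception is re-raised here, in the caller
    caught = None
except ValueError as exc:
    caught = str(exc)
print(caught)  # negative input
```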
fu commented at Sunday, 2010-12-26, 02:20:58
My idea would be that he then gets the exception that the async routine has thrown. Does this sound logical to you guys?
Sounds perfectly logical; actually, this is exactly what I was planning to do :)
Regarding how to handle global data, the basic principle should be that asynchronous methods should not modify shared data. Though it is impossible to impose this on wrapped C/C++ functions, I will see if we can at least prevent shared data from being modified in Dao code; otherwise we will have to allow free access to global data. Based on the above discussions, I decided to support the following methods for the future value type:
Nightwalker commented at Tuesday, 2010-12-28, 11:34:01
Nightwalker modified at Tuesday, 2010-12-28, 11:35:58
I think I have a rather simple idea of how to detect and prevent deadlocks.
fu commented at Thursday, 2010-12-30, 02:11:53
Each asynchronous call will be executed in its own virtual machine process, which is allocated to a native thread only when it becomes active. A VM process will only ever wait for another VM process. If A is waiting for B, there is no way B will be waiting for A, directly or indirectly. No matter what method B calls (directly or indirectly) on whatever object, it cannot use the VM process running A to run that method. Instead, it will use a new VM process (current implementation) or an unused one (possible future optimization).
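fu's argument can be illustrated with a Python analogue. In the sketch, a fresh thread per asynchronous call stands in for "a new VM process per call", which is an assumption for illustration. spawn, task_a and task_b are hypothetical names.

The first half shows that a caller waiting on a callee running on its own thread completes normally. The second half shows the failure mode the design avoids: the callee must reuse the worker that is busy waiting for it. A timeout is used there so the example reports the deadlock instead of hanging.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def spawn(fn, *args) -> Future:
    """Run each asynchronous call on a brand-new thread, loosely mirroring
    'a new VM process per call'. (Creating a bare Future is normally left to
    executors and tests, but it is fine for a sketch.)"""
    fut = Future()

    def run():
        try:
            fut.set_result(fn(*args))
        except BaseException as exc:
            fut.set_exception(exc)

    threading.Thread(target=run).start()
    return fut


def task_b():
    return "B done"


def task_a():
    # A waits for B, but B runs on its own fresh thread, never on A's.
    return "A got: " + spawn(task_b).result()


outcome_new = spawn(task_a).result()
print(outcome_new)  # A got: B done

# Contrast: if B had to reuse the worker that is running A, A would wait forever.
shared = ThreadPoolExecutor(max_workers=1)


def task_a_shared():
    inner = shared.submit(task_b)  # queued behind A on the only worker
    try:
        return inner.result(timeout=0.5)
    except FutureTimeout:
        return "would deadlock"


outcome_shared = shared.submit(task_a_shared).result()
print(outcome_shared)  # would deadlock
shared.shutdown(wait=True)
```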
Pompei2 commented at Monday, 2011-01-24, 17:59:51
Hi,
just wanted to ask about the state of this: is it fully implemented now? I ask because I saw a demo for it (synchronous_class.dao) in my last repository pull.
fu commented at Tuesday, 2011-01-25, 05:41:48
Not fully implemented yet, but the basic part is done and should be working. What is missing is a specification of which accesses to globals (including function calls) are allowed in asynchronous methods, and a way to enforce it. I haven't found enough time for this in the past few weeks.
Nightwalker commented at Sunday, 2011-01-30, 11:55:40
Nightwalker modified at Sunday, 2011-01-30, 12:12:17
I have an idea of how to resolve the situation with accessing globals from within synchronous class's methods, a concept stolen from concurrent programming :)