xAMp FAQ


When a process dies (CTRL C, for instance), other processes do not see it until a new operation is performed on the group in which that process was a participant. Is there any way to get this information faster?

No, there is currently no way to get this information faster. However, we are planning to build a mechanism into the interface that periodically "pings" all the processes attached to it, thus providing the intended functionality. Read the TODO file to see what remains to be done.
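As an illustration of what such a periodic "ping" mechanism could look like, here is a minimal sketch in Python. It is not xAMp code and does not reflect the planned design: the class PingMonitor, its methods and the timeout value are all hypothetical names chosen for this example.

```python
import time


class PingMonitor:
    """Hypothetical helper (not part of xAMp): tracks ping replies per process."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence before a process is suspected
        self.clock = clock        # injectable clock, so the sketch can be tested
        self.last_seen = {}       # process id -> time of the last ping reply

    def attach(self, pid):
        """Start monitoring a process that attached to the interface."""
        self.last_seen[pid] = self.clock()

    def record_reply(self, pid):
        """Note that an attached process answered the latest ping."""
        if pid in self.last_seen:
            self.last_seen[pid] = self.clock()

    def sweep(self):
        """Return (and stop monitoring) the processes silent for longer than timeout."""
        now = self.clock()
        dead = sorted(pid for pid, seen in self.last_seen.items()
                      if now - seen > self.timeout)
        for pid in dead:
            del self.last_seen[pid]
        return dead
```

The interface would call sweep() at a fixed period and treat each returned process as crashed, instead of waiting for the next group operation to reveal the failure.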

When a process dies (CTRL C, for instance) and an operation is then done on the group where this process was a participant, the following message appears:


 leMbxSend: Cannot send to destination Mailbox
 Warning: User Error:
     (PMM) Error sending to mailbox 

Is there a problem? What is the real reason for this?

This is just a warning message that was left in the daemon process; it does not indicate a problem. The message is printed when xAMp fails to communicate with a process, which it interprets as a process crash. xAMp then clears the information related to that process (open groups, objects, etc.) and disseminates the changes.

We will remove these warning messages in a future version of xAMp.

Is it possible to use the same object id by different processes?

No, not if both processes try to open (join) the same group. When a process starts to open a group, xAMp checks that the given object id is unique. The restriction is, in fact, on using the same object id within the same group.
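The rule above can be pictured as a per-group uniqueness check. The following is a minimal sketch in Python under that reading; GroupRegistry and join() are hypothetical names for illustration and are not the xAMp interface.

```python
class GroupRegistry:
    """Hypothetical model (not xAMp API): object ids must be unique per group."""

    def __init__(self):
        self.members = {}   # group name -> set of object ids currently joined

    def join(self, group, oid):
        """Return True if the join is accepted, False if oid is already used in group."""
        oids = self.members.setdefault(group, set())
        if oid in oids:
            return False    # same object id in the same group: rejected
        oids.add(oid)
        return True


registry = GroupRegistry()
first = registry.join("g1", 7)         # accepted
duplicate = registry.join("g1", 7)     # rejected: id 7 is already in group g1
other_group = registry.join("g2", 7)   # accepted: different group
```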

Is it mandatory to insert an oid in a group before sending messages to this group?

Yes. No Out of Group senders are allowed.

I did not understand very well the role played by msgHander(), and primitives like xampMsgJoin(), Check and Leave.

The join, check and leave functions of mgs are not essential for the user. They are there just to provide a means of knowing the site-level view of the system. The mgs is the protocol that maintains the information about the sites where xAMp is running. A user who wants to know which sites are present should do a joinMgs(). When new sites enter or when sites leave the system, the information handled by mgs is updated and the mgsHandler is called to provide the new site information. Note that applications do not need to call mgsJoin in order to run the mgs protocol; this protocol is run automatically by xAMp.

The mgsCheck() routine forces xAMp to check for alive sites (this is only useful if there has been no message traffic for a long period; otherwise the site view is updated automatically).

The mgsLeave() routine stops the interface from calling the mgsHandler function. It should be called when the application is no longer interested in the site-level view or its modifications.

The reference manual says: "reliableSend() reliably sends the specified message to all of the reachable members of the group or group subset, even on sender failure".

What does "all reachable members" mean?

Suppose that one member of the group is unable to receive but is still considered alive (registered in the current view of the group). The failed (unreachable) recipient will not receive the message, but all the other (correct) recipients will. In this case, since one of the members in the view has failed, all the correct members (the ones that received the message) will receive a new view indication AFTER the message.

What does "even on sender failure" mean?

It means that the sender may fail during message transmission. However, at least one of the recipients must have received the message before the sender failed. The protocol assures that one of the participants that received the message will take the place of the original sender, and complete the execution of the transmission. If the message was completely lost, then (of course) no members will receive it.
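The outcome described above can be summarized with a toy model in Python. It only captures the end result (who holds the message once the protocol finishes), not the actual xAMp message exchange; the function name and its arguments are hypothetical.

```python
def delivered_after_sender_crash(view, sender, reached_before_crash):
    """Toy model: which surviving members end up with the message.

    view                  -- members of the group, including the sender
    sender                -- the member that crashes during transmission
    reached_before_crash  -- members that got the message before the crash
    """
    survivors = {m for m in view if m != sender}
    holders = survivors & set(reached_before_crash)
    if not holders:
        return set()      # message completely lost: nobody receives it
    # At least one survivor holds the message; it takes over from the
    # sender and completes the transmission to the remaining survivors.
    return survivors


partial = delivered_after_sender_crash({"s", "a", "b", "c"}, "s", {"b"})
lost = delivered_after_sender_crash({"s", "a", "b", "c"}, "s", set())
```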

In reliableSend() is FIFO ordering valid with respect to multiple senders?

No. Messages sent by several senders may be received in different orders by different recipients. FIFO ordering is therefore valid only with respect to a single sender.
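To make the per-sender nature of FIFO ordering concrete, here is a small Python sketch. The (sender, sequence number) representation of messages is an assumption made for this example, not the xAMp message format.

```python
def respects_fifo(delivery):
    """Check that each sender's messages appear in increasing sequence order.

    delivery -- list of (sender, seq) pairs in the order one recipient received them
    """
    last_seq = {}
    for sender, seq in delivery:
        if seq <= last_seq.get(sender, -1):
            return False
        last_seq[sender] = seq
    return True


# Two recipients of the same four messages from senders A and B.
receiver_1 = [("A", 0), ("B", 0), ("A", 1), ("B", 1)]
receiver_2 = [("B", 0), ("B", 1), ("A", 0), ("A", 1)]
# A sequence that violates FIFO for sender A.
out_of_order = [("A", 1), ("A", 0)]
```

Both receiver_1 and receiver_2 satisfy FIFO order, yet they interleave the two senders differently, which is exactly what reliable multicast permits.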

In atomicSend() what happens if one of the recipients is not reachable? Do the other (correct) recipients receive the message?

Yes. This works as for reliable multicast. If one of the members fails, it will be removed from the group view, but the correct members will receive the message anyway. You must realize that, with our failure assumptions, "not able to receive" means the member is "dead" from that moment on; that is, it will be put out of the group and must shut down (this is always possible, since we assume no partitions).

Is TOTAL ordering valid for multiple senders?

Yes, if total causal order is meant. Only the delta QOS guarantees total order with respect to time.

I don't understand which are the REAL differences between atomic and reliable multicast.

In fact, with regard to recipient failures and the generation of new views, there is no difference between the two QOS. The effective difference is that atomic guarantees total order while reliable only guarantees FIFO order. In practice it is very difficult to "observe" this difference because it only shows up when omission failures occur (and this is rare, unless simulated).
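The ordering difference can be stated as a property of what two recipients observe. The Python sketch below checks that property on hand-written delivery sequences; the (sender, sequence number) message representation and the function name are assumptions for illustration only.

```python
def agree_on_order(delivery_1, delivery_2):
    """Check total order between two recipients: the messages they both
    received must appear in the same relative order at each of them."""
    common = set(delivery_1) & set(delivery_2)
    seq_1 = [m for m in delivery_1 if m in common]
    seq_2 = [m for m in delivery_2 if m in common]
    return seq_1 == seq_2


# Allowed under reliable (FIFO only): recipients disagree across senders.
fifo_only_1 = [("A", 0), ("B", 0)]
fifo_only_2 = [("B", 0), ("A", 0)]

# Required under atomic (total order): recipients agree.
atomic_1 = [("A", 0), ("B", 0)]
atomic_2 = [("A", 0), ("B", 0)]
```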

In deltaSend(), what happens if one of the participants receives a message after the time window? In this case is it true that nobody will receive the message?

This is not true. If a message is received after the time window, it is discarded, but no action is taken to prevent other sites from delivering it.

We didn't pay much attention to this because late messages only occur when the assumed conditions are violated. The time window is chosen taking into account the maximum possible time a correct message can take to be transmitted. Therefore, only failed sites can receive "late" messages: xAMp maps excessive message delays to omission failures and marks non-responding sites as failed (note that delta QOS is a two-phase protocol).
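The per-site decision described above can be sketched as follows in Python. This is a simplified model, not the two-phase delta protocol itself: it assumes every site can compare its arrival time with the send time on a common time base, and the function name, site names and numeric values are hypothetical.

```python
def split_by_window(send_time, arrival_times, delta):
    """Each site decides on its own: deliver if the message arrived within
    delta of send_time, otherwise discard it. A late arrival at one site
    does not stop the other sites from delivering.

    arrival_times -- dict mapping site name to arrival time
    Returns (delivered_sites, discarded_sites), both sorted by site name.
    """
    delivered, discarded = [], []
    for site in sorted(arrival_times):
        if arrival_times[site] - send_time <= delta:
            delivered.append(site)
        else:
            discarded.append(site)   # late: treated like an omission at this site
    return delivered, discarded


delivered, discarded = split_by_window(
    10.0, {"s1": 10.2, "s2": 10.4, "s3": 11.5}, delta=0.5)
```

In this example s3 discards the message while s1 and s2 still deliver it; under the failure assumptions above, s3 would be the site marked as failed.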

To be continued...

