[omniORB] transient exception handler and call timeout problem

Discussion:

Lazar Stricevic

2007-07-26 23:36:32 UTC

Hi everyone,

We have a client/server application that now uses omniORB 4.1.0.
The problem that is bothering us was described in messages from Vladislav
Vrtunski (April 2005) and is still present in the current version of omniORB.
Here is a partial quote of the first posting in the thread describing the
problem:

Hi,
We are trying to limit call duration to 2 seconds, and if the
timeout is reached we want to repeat the call, hoping that it won't
take that long the second time. This is part of an attempt to build a
fault-tolerant system. To achieve that, we have set the client
call timeout and installed both transient and comm_failure exception
handlers. Each of these handlers is written to allow the ORB to retry the
operation once if an exception occurs.
Here is the problem. When the timeout is reached and a
TRANSIENT_CallTimedout exception is thrown, our transient exception
handler is called and returns 1 so that the ORB repeats the call.
Immediately after that, TRANSIENT_ConnectFailed is thrown without
the server-side method being called, although we can see in the omniORB
trace that the server has accepted the second connection from the client.
Is this the expected behavior? We would expect the second call to at
least reach the server method.
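
The retry-once handler described above can be sketched as a small standalone model. This is not omniORB code: TransientError, retry_up_to and invoke_with_handler are hypothetical stand-ins, and the handler only mimics the general shape of a handler that receives a cookie, a retry count and the exception, and returns true to request a retry. That n_retries starts at 0 is an assumption of the model.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a transient CORBA exception (model only).
struct TransientError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Handler in the style described above: return true to ask for a retry,
// false to let the exception propagate. The cookie points at the maximum
// number of retries; n_retries counts retries already made (assumed to
// start at 0 in this model).
bool retry_up_to(void* cookie, unsigned long n_retries, const TransientError&) {
    assert(cookie != nullptr);
    const unsigned long max_retries = *static_cast<const unsigned long*>(cookie);
    return n_retries < max_retries;
}

// Toy invocation loop: runs `attempt` until it succeeds or the handler
// declines. Returns the number of attempts made; rethrows the last error
// when the handler gives up.
template <typename Attempt>
unsigned long invoke_with_handler(void* cookie, Attempt attempt) {
    unsigned long retries = 0;
    for (;;) {
        try {
            attempt();
            return retries + 1;
        } catch (const TransientError& ex) {
            if (!retry_up_to(cookie, retries, ex)) throw;
            ++retries;
        }
    }
}
```

With the cookie pointing at a value of 1, a call that fails once is attempted twice, and a call that always fails is abandoned after the second attempt.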

Here are the actual postings:
http://www.omniorb-support.com/pipermail/omniorb-list/2005-April/026612.html
http://www.omniorb-support.com/pipermail/omniorb-list/2005-April/026633.html
http://www.omniorb-support.com/pipermail/omniorb-list/2005-May/026668.html
http://www.omniorb-support.com/pipermail/omniorb-list/2005-May/026685.html

Is this behavior considered to be a bug or not?

Anyway, we solved the problem by changing
src/lib/omniORB/orbcore/omniObjRef.cc.
Only two lines are commented out (the opening and closing lines of an
"if" block, lines 731 and 758), but it does the trick.
The patch file is attached.

Patch files for omniORB 4.0.5, 4.0.6 and 4.0.7 are available upon request.

Regards,
Lazar

-------------- next part --------------
diff -cr ../omniORB-4.1.0/src/lib/omniORB/orbcore/omniObjRef.cc src/lib/omniORB/orbcore/omniObjRef.cc
*** ../omniORB-4.1.0/src/lib/omniORB/orbcore/omniObjRef.cc Tue Jan 10 13:59:38 2006
--- src/lib/omniORB/orbcore/omniObjRef.cc Mon Jul 9 16:55:59 2007
***************
*** 727,734 ****

if( orbParameters::verifyObjectExistsAndType && do_assert )
_assertExistsAndTypeVerified();
!
! if (!(abs_secs || abs_nanosecs)) {
if (pd_timeout_secs || pd_timeout_nanosecs) {
omni_thread::get_time(&abs_secs,&abs_nanosecs,
pd_timeout_secs, pd_timeout_nanosecs);
--- 727,736 ----

if( orbParameters::verifyObjectExistsAndType && do_assert )
_assertExistsAndTypeVerified();
! ////////////////////////////////////////////////////////////////
! // Change by Lazar Stricevic, invoke(), 09.07.2007 16:50
! // "if" is commented out to allow promptly timer reset
! // if (!(abs_secs || abs_nanosecs)) {
if (pd_timeout_secs || pd_timeout_nanosecs) {
omni_thread::get_time(&abs_secs,&abs_nanosecs,
pd_timeout_secs, pd_timeout_nanosecs);
***************
*** 755,761 ****
orbParameters::clientCallTimeOutPeriod.nanosecs);
}
call_desc.setDeadline(abs_secs,abs_nanosecs);
! }

try{
omni::internalLock->lock();
--- 757,763 ----
orbParameters::clientCallTimeOutPeriod.nanosecs);
}
call_desc.setDeadline(abs_secs,abs_nanosecs);
! // }

try{
omni::internalLock->lock();

Duncan Grisby

2007-07-31 22:43:22 UTC

Post by Lazar Stricevic
We have a client/server application that now uses omniORB 4.1.0.
The problem that is bothering us was described in messages from Vladislav
Vrtunski (April 2005) and is still present in the current version of
omniORB.

[...]

Post by Lazar Stricevic
Is this behavior considered to be a bug or not?

The behaviour is by design, although in your case it is clearly not what
you want. In plenty of cases, you want the call timeout to be an
absolute timeout, whatever happens and however many retries there are.
That's what omniORB's current behaviour gives you. When a call starts, a
deadline is set for it, and if it takes longer than that, the call is
timed out. That's really important for (soft) real time systems, for
example.

Really, what is required is for an exception handler to choose whether
to reset the timeout or not. That would be easy to do, by changing the
signature of the exception handler function to return something with
three values rather than just a boolean. That way, the options would be
to propagate the exception, to retry within the same timeout period, or
to retry with a new timeout period. Unfortunately, making that change
now would break binary compatibility with existing code. The change
can't be made until omniORB 4.2, when it will be acceptable to break
binary compatibility.
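
To make the three options concrete, here is a minimal sketch of what a three-valued handler result could look like. It is purely illustrative: RetryDecision, CallState and apply_decision are hypothetical names invented for this example and are not part of any omniORB release.

```cpp
#include <cassert>

// Hypothetical three-valued handler result (illustration only).
enum class RetryDecision {
    Propagate,          // give up and propagate the exception
    RetrySameDeadline,  // retry within the original timeout period
    RetryNewDeadline    // retry with a fresh timeout period
};

// Times are milliseconds on an arbitrary monotonic scale.
struct CallState {
    long deadline_ms;  // absolute deadline of the call
    long timeout_ms;   // configured per-call timeout
};

// Applies the handler's decision at time now_ms. Returns true if the call
// should be retried; moves the deadline only for RetryNewDeadline.
bool apply_decision(RetryDecision decision, long now_ms, CallState& call) {
    switch (decision) {
    case RetryDecision::Propagate:
        return false;
    case RetryDecision::RetrySameDeadline:
        return true;
    case RetryDecision::RetryNewDeadline:
        assert(call.timeout_ms > 0);
        call.deadline_ms = now_ms + call.timeout_ms;
        return true;
    }
    return false;  // not reached for valid enum values
}
```

A boolean return can only express the first two outcomes, which is why the third option would need a changed handler signature.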

What I've done instead is add a new configuration parameter called
resetTimeOutOnRetries, which is false by default. If you set it to 1
(true), the timeout will be reset if a retry occurs, which is what you
want. I've added it in the omni4_1_develop branch. Later this week I'm
going to release omniORB 4.1.1, which will include this change.
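
The difference between the two settings can be illustrated with a toy model on a simulated clock. This is only a sketch of the behaviour described above, not omniORB's implementation; simulate_retry and RetryOutcome are hypothetical names, and the model assumes the first attempt uses up the whole timeout and the handler allows exactly one retry.

```cpp
#include <cassert>

// Result of the single retry in this toy model.
struct RetryOutcome {
    long remaining_ms;     // time left before the deadline when the retry starts
    bool retry_completed;  // true if the retry finishes before the deadline
};

// Simulated clock in milliseconds. The first attempt is assumed to consume
// the full timeout, after which the handler asks for one retry that needs
// retry_cost_ms to complete.
RetryOutcome simulate_retry(long timeout_ms, long retry_cost_ms, bool reset_on_retry) {
    assert(timeout_ms > 0 && retry_cost_ms >= 0);
    long now = 0;
    long deadline = now + timeout_ms;  // deadline set when the call starts

    now = deadline;                    // first attempt times out

    if (reset_on_retry) {
        deadline = now + timeout_ms;   // fresh timeout period for the retry
    }
    const long remaining = deadline - now;
    const bool completed = remaining > 0 && retry_cost_ms <= remaining;
    return {remaining, completed};
}
```

With a 2000 ms timeout and a retry that needs 500 ms, the model gives no remaining time when the flag is false, so the retry fails at once, and 2000 ms when it is true. In omniORB itself the parameter would presumably be set like any other ORB parameter, for example as -ORBresetTimeOutOnRetries 1 on the command line.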

Cheers,

Duncan.

--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --

Lazar Stricevic

2007-08-06 22:37:15 UTC

Thank you Duncan for the answer and the effort to properly implement the
solution.

Post by Duncan Grisby
...
The behaviour is by design, although in your case it is clearly not what
you want. In plenty of cases, you want the call timeout to be an
absolute timeout, whatever happens and however many retries there are.
That's what omniORB's current behaviour gives you. When a call starts, a
deadline is set for it, and if it takes longer than that, the call is
timed out. That's really important for (soft) real time systems, for
example.
Really, what is required is for an exception handler to choose whether
to reset the timeout or not. That would be easy to do, by changing the
signature of the exception handler function to return something with
three values rather than just a boolean. That way, the options would be
to propagate the exception, to retry within the same timeout period, or
to retry with a new timeout period.

The solution is good. Still, does it make sense to repeat the call with
the same timeout period if that period has already expired? That is, would
it make more sense to trust the exception handler and reset the timer
anyway if the reason for the retry was TRANSIENT_CallTimedout?
The reasoning would be that she who writes the exception handler should
be responsible enough to know what she's doing, plus it saves you the
trouble of changing the signature of the exception handler function,
breaking compatibility with existing code, etc.

Post by Duncan Grisby
Unfortunately, making that change
now would break binary compatibility with existing code. The change
can't be made until omniORB 4.2, when it will be acceptable to break
binary compatibility.
What I've done instead is add a new configuration parameter called
resetTimeOutOnRetries, which is false by default. If you set it to 1
(true), the timeout will be reset if a retry occurs, which is what you
want. I've added it in the omni4_1_develop branch. Later this week I'm
going to release omniORB 4.1.1, which will include this change.

That will do perfectly for now. Thank you very much!

Regards,
Lazar