Discussion:
[omniORB] oneway callback gets lost
Serguei Kolos
2007-03-08 23:29:03 UTC
Hello

I'm using omniORB 4.0.7 on SLC4 Linux (kernel 2.6.9-42).
I have a system with a server and a subscriber; the subscriber subscribes
to the server and periodically receives callbacks (defined as a oneway
method) from it. I have set the scanGranularity parameter to 0 on the
server and I'm using default settings on the subscriber.

I have noticed the following problem: if the server sends a callback to the
subscriber and then waits several minutes before sending another one, the
new callback is not properly processed on the subscriber side and never
reaches the user code. This is the output which appears on the subscriber
side when the callback should have been received:

omniORB: throw giopStream::CommFailure from
giopStream.cc:838(0,NO,COMM_FAILURE_UnMarshalArguments)

I had a look at the code and found that this happens because the ::recv
function, which is called by the Recv method, returns 0, probably because
it is trying to read from a socket which has already been shut down.

If I don't set scanGranularity to 0 the problem never occurs.
I have attached the output of both server and subscriber with
traceLevel set to 35.

Cheers,
Sergei
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Server.debug.gz
Type: application/x-tar
Size: 3657 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20070308/5311487f/Server.debug.tar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Subscriber.debug.gz
Type: application/x-tar
Size: 3461 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20070308/5311487f/Subscriber.debug.tar
Serguei Kolos
2007-04-05 16:31:30 UTC
Hi

Can someone please confirm whether the issue I reported is a real issue,
or whether I'm just doing something wrong? Does anybody have an idea how to
solve the problem?

Cheers,
Sergei
Duncan Grisby
2007-04-05 21:31:34 UTC
It's a fundamental limitation of CORBA oneway messages.

The "server" in your case is acting in the role of a client; the
"subscriber" is the server. The client is making oneway requests to the
server, meaning that it just sends messages and never looks for replies.
After a while of sending no messages, the server closes the connection
coming from the client. Because the client is only sending oneway
messages, it doesn't notice the connection closure until the OS fails a
send() call, which can be after a large number of calls if the calls are
small.
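
The delayed send() failure described above can be reproduced without CORBA. The following is a minimal sketch using plain TCP over loopback; the function name and its parameters are made up for illustration, and the exact number of sends that succeed depends on the operating system and on timing.

```python
import socket
import time


def sends_accepted_after_peer_close(max_sends=1000, pause=0.01):
    """Count how many send() calls succeed after the peer has closed.

    Plain TCP over loopback, no CORBA involved; this only illustrates the
    operating-system behaviour, not omniORB itself.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()

    receiver.close()                  # the receiving side drops the connection
    listener.close()
    time.sleep(0.1)                   # give the closure time to reach the sender

    accepted = 0
    try:
        for _ in range(max_sends):
            sender.send(b"oneway")    # fire and forget: never reads a reply
            accepted += 1
            time.sleep(pause)
    except OSError:
        pass                          # the OS finally reports the closure
    finally:
        sender.close()
    return accepted


if __name__ == "__main__":
    print("sends accepted after close:", sends_accepted_after_peer_close())
```

Every message counted here was accepted by send() and then silently discarded, which is what happens to the lost oneway callback.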

With a two-way call, the client would notice the connection closure as
soon as it tried to get a reply, and would retry the call with a new
connection.

The reason you don't see the problem unless you set your client's
scanGranularity to 0 is that by default omniORB clients close idle
connections earlier than servers do, meaning that this situation of a
client not noticing a closed connection never occurs.

This situation is the main reason that CORBA oneways are described as
"unreliable oneways". If it is essential that calls get through, you
shouldn't use oneways, or you should add some higher-level protocol to
confirm that they are received.
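
One possible shape for such a higher-level protocol is sketched below: number each message, keep it until the receiver confirms it through some separate call, and resend whatever is still unconfirmed. The class and method names are hypothetical and not part of omniORB or CORBA; `transmit` stands in for the oneway call and may lose messages silently.

```python
class ConfirmedSender:
    """Tracks oneway-style messages until the receiver confirms them."""

    def __init__(self, transmit):
        self._transmit = transmit     # callable(seq, payload); may drop silently
        self._next_seq = 0
        self._unconfirmed = {}        # seq -> payload awaiting confirmation

    def send(self, payload):
        """Send a payload and remember it until it is confirmed."""
        seq = self._next_seq
        self._next_seq += 1
        self._unconfirmed[seq] = payload
        self._transmit(seq, payload)
        return seq

    def confirm(self, seq):
        """Record that the receiver has reported seq as received."""
        self._unconfirmed.pop(seq, None)

    def resend_unconfirmed(self):
        """Retransmit all unconfirmed messages; return their sequence numbers."""
        pending = sorted(self._unconfirmed)
        for seq in pending:
            self._transmit(seq, self._unconfirmed[seq])
        return pending
```

The receiver has to tolerate duplicates, since a message may arrive and only its confirmation be late; the sequence number makes those easy to discard.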

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Jonathan Biggar
2007-04-05 22:14:51 UTC
Post by Duncan Grisby
It's a fundamental limitation of CORBA oneway messages.
Actually, it's an issue of a design choice in omniORB.
You designed omniORB to not have anything actively listening for input
on the client side of a connection unless there is an outstanding
two-way request. (I'm ignoring bidir connections for now.)

That's a reasonable design choice, but it has the downside of the
behavior you describe. It's not really a flaw in the underlying CORBA
specification.
--
Jon Biggar
Levanta
***@levanta.com
650-403-7252
Serguei Kolos
2007-04-06 11:09:16 UTC
Hi Duncan

Thank you for the explanation, but I would tend to agree with Jonathan.
Correct me if I'm wrong, but it looks like the Close Connection procedure
works fine if it is initiated by a "client" application, and does not run
to completion, leaving the connection in a half-closed state, if it is
initiated by a "server". The worst thing is that an application starts
behaving incorrectly when some omniORB options (like scanGranularity) are
set to valid and reasonable values, i.e. the functional behavior of an
application depends on those options.

Cheers,
Sergei
Duncan Grisby
2007-04-06 23:22:14 UTC
Post by Jonathan Biggar
Post by Duncan Grisby
It's a fundamental limitation of CORBA oneway messages.
Actually, it's an issue of a design choice in omniORB.
Well, yes and no. The fact that it can take quite a long time for
omniORB to notice a connection closure when sending only oneway messages
is the result of a design choice. The fact that a number of oneways can
be lost because they are sent before the client notices a connection has
been closed _is_ a fundamental artifact of the way oneways work. The
client can send a stream of oneways that are all in flight while a
CloseConnection message and TCP closure are going the other way.

You are right that omniORB could notice connection closures sooner than
it does by monitoring the connections it has open. That is only
necessary if the connection is used only (or primarily) for oneways,
since with normal two-way calls the closure is noticed when trying to
receive the reply.

Both of the obvious ways to monitor a connection would have quite a
significant performance impact. One way would be to use a thread that
monitored outgoing connections so it noticed connection closures as soon
as they occurred. That would have the side-effect of triggering for all
expected reply messages, as well as closures, and would therefore waste
time with thread switching on all replies.

The alternative would be for a thread that's about to send a oneway call
to select or poll the connection to see if there are any pending
messages, before sending the oneway. That would have a noticeable impact
on sending performance.
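
At the socket level, the check-before-send alternative might look roughly like the sketch below. The function name is hypothetical and this is not how omniORB is implemented. It only detects the TCP closure itself; an ORB would also have to read and interpret any pending GIOP CloseConnection message, which this sketch deliberately leaves in the buffer.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed `sock`, without blocking."""
    readable, _, _ = select.select([sock], [], [], 0)   # zero timeout: poll only
    if not readable:
        return False                  # nothing pending: no closure seen so far
    try:
        # MSG_PEEK leaves any real data in the buffer for the normal reader.
        # An empty read on a readable socket means an orderly close.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True                   # the peer reset the connection
```

Even with this check, a closure that arrives between the poll and the send() still goes unnoticed, and the extra system call is paid on every oneway.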

Both those approaches are quite complex and impact performance, and
don't avoid the possibility of losing oneways anyway; they merely reduce
the number of oneways that can be lost. It's rather unfortunate that
most operating systems allow send() calls to succeed for quite a while
after the underlying TCP connection has closed.

Cheers,

Duncan.