[omniORB] Stuck in socket recv() ?
Wernke zur Borg
2007-01-10 18:47:23 UTC
Hi @ all,

I have a 1:1 client/server application, i.e. one Java client and one
omniORB server. The server must become aware of client failures
immediately, so I have defined a callback() method on the client,
which the server calls right at the beginning of a lengthy session. The
Java client simply blocks inside the call and only returns when the
session is to be closed. This way the server also detects client
crashes, because the call then fails with a COMM_FAILURE.

The scheme works nearly perfectly, but once in a while the
callback() does not return with the expected failure when the client is
killed. A stack trace is given below; it shows that the call is blocked
in the socket recv(), even though the TCP connection cannot exist any
more, as the other side was killed.

I did a netstat -a on both sides and, strangely enough, the TCP connection
is shown as ESTABLISHED on the omniORB server side and is not listed at
all on the Java side. How is this possible?

This is omniORB 4.0.6 on Solaris talking to Sun Java 1.4.2_06 on Windows
XP.

Thanks for any hints.
Wernke

***@9 ***@3 <3> where
[1] _so_recv(0x18, 0x100c4b0, 0x2000, 0x0, 0x0, 0x8000), at 0xfd39c23c
=>[2] omni::tcpConnection::Recv(this = ???, buf = ???, sz = ???,
deadline_secs = ???, deadline_nanosecs = ???) (optimized), at 0xff2c31d4
(line ~297) in "tcpConnection.cc"
[3] omni::giopStream::inputMessage(this = ???) (optimized), at
0xff281434 (line ~825) in "giopStream.cc"
[4] omni::giopImpl12::inputReplyBegin(g = ???, unmarshalHeader = ???)
(optimized), at 0xff299e38 (line ~647) in "giopImpl12.cc"
[5] omni::giopImpl12::inputMessageBegin(g = ???, unmarshalHeader =
???) (optimized), at 0xff299fcc (line ~690) in "giopImpl12.cc"
[6] omni::GIOP_C::ReceiveReply(this = ???) (optimized), at 0xff288a64
(line ~165) in "GIOP_C.cc"
[7] omniRemoteIdentity::dispatch(this = ???, call_desc = CLASS)
(optimized), at 0xff267034 (line ~211) in "remoteIdentity.cc"
[8] omniObjRef::_invoke(this = ???, call_desc = CLASS, do_assert =
???) (optimized), at 0xff246978 (line ~800) in "omniObjRef.cc"
[9] Otif::_objref_DatabaseListener::callback(this = 0x1006528), line
3910 in "otif.C"
[10] DbClientWatcher::start(this = 0xf774f0), line 33 in
"DbClientWatcher.C"
[11] Dispatcher::start(this = 0xf7fd78, semaphore = 0xffbed17c, agent
= 0xf774f0, listener = (nil)), line 591 in "Dispatcher.C"
[12] OCSMgr::startDispatcher(args = 0xf75e40), line 743 in "OCSMgr.C"
***@9 ***@3 <4>
Duncan Grisby
2007-01-18 16:32:51 UTC
Post by Wernke zur Borg
> I have a 1:1 client/server application, i.e. one Java client and one
> omniORB server. The server must become aware of client failures
> immediately, so I have defined a callback() method on the client,
> which the server calls right at the beginning of a lengthy session. The
> Java client simply blocks inside the call and only returns when the
> session is to be closed. This way the server also detects client
> crashes, because the call then fails with a COMM_FAILURE.
>
> The scheme works nearly perfectly, but once in a while the
> callback() does not return with the expected failure when the client is
> killed. A stack trace is given below; it shows that the call is blocked
> in the socket recv(), even though the TCP connection cannot exist any
> more, as the other side was killed.
>
> I did a netstat -a on both sides and, strangely enough, the TCP connection
> is shown as ESTABLISHED on the omniORB server side and is not listed at
> all on the Java side. How is this possible?

The issue is that TCP connections don't inherently notice if one side
vanishes. omniORB doesn't enable TCP keep-alives, and they're generally
not very useful anyway, so it's entirely possible for the OS to not know
that the other end of a TCP connection has gone. It will only notice if
an attempt is made to send some data across the connection.
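
For readers unfamiliar with the keep-alive option mentioned above, here is a minimal sketch of what enabling it looks like on a plain POSIX socket. It is not omniORB code: omniORB creates and manages its own sockets, and the helper names enable_keepalive() and keepalive_enabled() are invented for this illustration. Note also that on many systems the default idle time before the first keep-alive probe is on the order of two hours, which is one reason keep-alives do little for prompt failure detection.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: ask the OS to send TCP keep-alive probes on `fd`.
// Returns true if the option was set successfully.
bool enable_keepalive(int fd) {
    assert(fd >= 0 && "expects a valid socket descriptor");
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0;
}

// Hypothetical helper: report whether SO_KEEPALIVE is currently set on `fd`.
bool keepalive_enabled(int fd) {
    assert(fd >= 0 && "expects a valid socket descriptor");
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return false;  // treat a failed query as "not enabled"
    }
    return value != 0;
}
```

Even with the option set, the probe timing is governed by system-wide settings, so an application-level check is usually the more predictable route.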

The usual way to implement the kind of thing you're talking about is to
periodically ping one way or the other so that data is actually
transferred between the processes.
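
As a rough sketch of the periodic-ping idea, the loop below calls a ping function at a fixed interval and stops as soon as the ping fails. Everything here is an assumption made for illustration: watch_peer() is not an omniORB function, and the ping callable stands in for whatever remote call the application chooses (for example a short application-defined operation on the peer, or the standard CORBA _non_existent() check), ideally one that cannot itself block indefinitely.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical watchdog loop: call `ping` every `interval` until it fails
// or until `max_pings` successful pings have been made.
// Returns the number of successful pings before the first failure.
int watch_peer(const std::function<bool()>& ping,
               std::chrono::milliseconds interval,
               int max_pings) {
    int ok = 0;
    while (ok < max_pings) {
        bool alive = false;
        try {
            alive = ping();  // real traffic on the connection
        } catch (...) {
            // In a real ORB this is where an exception such as
            // CORBA::COMM_FAILURE would surface; treat it as "peer gone".
            alive = false;
        }
        if (!alive) {
            break;
        }
        ++ok;
        std::this_thread::sleep_for(interval);
    }
    return ok;
}
```

A caller would typically run this in its own thread and end the session when it returns. Because each ping actually sends data, the failure shows up within roughly one interval, provided the send provokes an error from the peer's host or the ping call has its own timeout.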

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --