[omniORB] Source of COMM_FAILURE_MarshalArguments on narrow

Discussion:

Jeff Frontz

2013-07-03 18:53:53 UTC

We're running an app that has client/server processes co-resident on a
virtual server. It's been running fine for years.

We recently made a slight change to one of our (many) applications that
changed when the client first contacts the server (that is, narrows on an
object serviced by the server). The narrow call was moved earlier in the
lifetime of the client, but still long after the server was activated.

Every so often, a narrow on an object will throw
a COMM_FAILURE_MarshalArguments (1096024067) exception. After reviewing
the exception trace (which I've unfortunately deleted and am trying to
reproduce), I poked through the omniORB source (4.1.2) and the initial
obvious source is a timeout -- except all of our timeouts are set to "0"
(forever). Looking further, it seems the next likely culprit is send(2)
experiencing some sort of a (transient?) error. Since these processes are
on the same machine, I can't imagine there being any sort of intramachine
congestion in the TCP stack. There doesn't seem to be any obvious
processor/resource overload (per sar) -- that other (different application)
clients simultaneously running on the same machine continue to execute
perfectly would seem to refute any obvious resource issue.

Are there other less likely sources for this exception?

Thanks,
Jeff

Duncan Grisby

2013-07-18 11:11:16 UTC

Post by Jeff Frontz
Every so often, a narrow on an object will throw a
COMM_FAILURE_MarshalArguments (1096024067) exception. After reviewing
the exception trace (which I've unfortunately deleted and am trying to
reproduce), I poked through the omniORB source (4.1.2) and the initial
obvious source is a timeout -- except all of our timeouts are set to
"0" (forever). Looking further, it seems the next likely culprit is
send(2) experiencing some sort of a (transient?) error.

COMM_FAILURE_MarshalArguments means that the communication failed while
omniORB was sending a request message. As you say, the most likely
reason for that is the send() system call returning an error. Quite
aside from the fact that you haven't set any call timeouts, if you did
get a timeout, you would get a TRANSIENT exception, not a COMM_FAILURE.
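
As a side note on the numeric value quoted above: CORBA minor codes pack a
20-bit vendor ID into the high bits and a 12-bit vendor-specific code into
the low bits. The sketch below splits 1096024067 that way. The helper name
split_minor is hypothetical, and reading the 0x41540000 prefix as omniORB's
vendor range is an assumption inferred from the value itself, not taken from
the omniORB headers.

```python
def split_minor(minor):
    """Split a 32-bit CORBA minor code into (vendor_id, code).

    High 20 bits: vendor minor codeset ID. Low 12 bits: vendor-specific code.
    """
    vendor_id = minor & 0xFFFFF000
    code = minor & 0x00000FFF
    return vendor_id, code


# The value reported with COMM_FAILURE_MarshalArguments in this thread.
vendor_id, code = split_minor(1096024067)
print(hex(vendor_id), code)  # prints: 0x41540000 3
```

Splitting the value this way makes it easier to match against the minor code
table in whichever omniORB version is installed.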

Post by Jeff Frontz
Since these processes are on the same machine, I can't imagine there
being any sort of intramachine congestion in the TCP stack. There
doesn't seem to be any obvious processor/resource overload (per sar)
-- that other (different application) clients simultaneously running
on the same machine continue to execute perfectly would seem to refute
any obvious resource issue.

I have seen occasional errors from send() even for in-machine calls on
some platforms. What platform are you using?
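
To illustrate that send() can fail over loopback without any congestion,
here is a minimal sketch in plain Python sockets (not omniORB). It assumes
one particular cause, the peer closing the connection before the client
writes, which may or may not be what is happening in this case. The function
name send_until_error is hypothetical.

```python
import errno
import socket
import time


def send_until_error(max_tries=50):
    """Return the errno name raised by send() after the peer closes, or None."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # loopback only, kernel-chosen port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.close()  # the "server" drops the connection
    listener.close()
    try:
        for _ in range(max_tries):
            try:
                # The first send after the close usually succeeds; the error
                # shows up on a later send, once the peer's reset has arrived.
                client.send(b"x")
            except OSError as exc:
                return errno.errorcode.get(exc.errno, str(exc.errno))
            time.sleep(0.01)
        return None
    finally:
        client.close()


print(send_until_error())
```

Which errno comes back (typically EPIPE or ECONNRESET) depends on the
platform and on timing, so it is worth capturing the errno in the omniORB
trace when the failure is reproduced.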

Cheers,

Duncan.

--
-- Duncan Grisby --
-- duncan at grisby.org --
-- http://www.grisby.org --

Jeff Frontz

2013-07-18 12:57:13 UTC

We're on Linux 2.6 (Fedora Core 7 libraries/packages running on a 2.6.18
kernel).

Post by Duncan Grisby
I have seen occasional errors from send() even for in-machine calls on
some platforms. What platform are you using?
