[omniORB] Delayed COMM_FAILURE on oneway call
Martin Ba
2013-06-13 08:22:34 UTC
Hi experts!

We recently had a communication failure with omniORB on a direct LAN
connection (both sides use omniORB):

* One oneway callback worked without error, and immediately (~ms) after
this working callback,
* a oneway callback of ours failed with a COMM_FAILURE (unfortunately I
did not log the minor code).
* The COMM_FAILURE was raised 37 seconds after the call was invoked.
* During these 37 seconds, the other side continuously made normal
synchronous calls, all of which worked and were recorded in the logs.


Is it possible to make an educated guess as to the cause, or at least
where to look for problems?


thanks!

- Martin
Martin Ba
2013-06-27 10:50:04 UTC
Tracking this further -

I have been able to get additional logging, and it appears that the
Receive for the "inputMessage" (whatever that is exactly) times out.

The error raised is:
+ + + +
omniORB: (54) 2013-06-26 11:04:48.851000: Dispatching remote call
'Start' to: root<10588> (active)
...
omniORB: (54) 2013-06-26 11:04:48.857000: LocateRequest to remote:
root/BiDirPOA<2>
... ...
omniORB: (54) 2013-06-26 11:05:18.689000: Error in network receive
(start of message): giop:tcp:5x.5x.2.1x9:4303
omniORB: (54) 2013-06-26 11:05:18.689000: throw giopStream::CommFailure
from giopStream.cc:874(0,MAYBE,COMM_FAILURE_WaitingForReply)
...
omniORB: (54) 2013-06-26 11:05:18.700000: Return from remote call
'Start' to: root<10588> (active)
+ + + +

Unfortunately, we only had the trace level at 15 (instead of the more
detailed 25), so I don't have the traces for any "Client attempt to
connect", "Client opened connection", "sendChunk: to " or "inputMessage:
from " messages.

I also have logging from the other end (also traceLevel 15), but there
is no trace of this call attempt whatsoever:
+ + + +
omniORB: (?) 2013-06-26 11:04:48.720000: Invoke 'Start' on remote:
root<10588>
...
omniORB: (?) 2013-06-26 11:05:18.545000: Return 'Start' on remote:
root<10588>
+ + + +
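
To put a number on the delay, the gap between the first and last
timestamp of each excerpt above can be computed with a small script.
This is only a sketch in Python and not part of omniORB: the helper
name elapsed_seconds and the regular expression are assumptions based
on the log line format shown in the excerpts.

```python
import re
from datetime import datetime

# Assumed timestamp layout, taken from the excerpts: "2013-06-26 11:04:48.851000"
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(first_line, last_line):
    """Return the seconds between the timestamps found in two log lines."""
    start = datetime.strptime(TIMESTAMP.search(first_line).group(0), FORMAT)
    end = datetime.strptime(TIMESTAMP.search(last_line).group(0), FORMAT)
    return (end - start).total_seconds()


# Lines copied from the excerpt of the side that raised the COMM_FAILURE.
failing_side = elapsed_seconds(
    "omniORB: (54) 2013-06-26 11:04:48.851000: Dispatching remote call",
    "omniORB: (54) 2013-06-26 11:05:18.689000: Error in network receive",
)

# Lines copied from the excerpt of the other end.
other_end = elapsed_seconds(
    "omniORB: (?) 2013-06-26 11:04:48.720000: Invoke 'Start' on remote:",
    "omniORB: (?) 2013-06-26 11:05:18.545000: Return 'Start' on remote:",
)

print(f"failing side: {failing_side:.3f} s")  # prints "failing side: 29.838 s"
print(f"other end:    {other_end:.3f} s")     # prints "other end:    29.825 s"
```

For these two excerpts both gaps come out just under 30 seconds.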

Obviously, I'll retry with traceLevel 25 (once able to) to see if I can
glean anything from that. Maybe someone can make something of what I
have written so far ...

cheers,
Martin