[omniORB] CORBA::TRANSIENT and lag

Discussion:

Sylvain Gault

2013-08-19 19:34:51 UTC

Hello,

Sometimes, when I call a method on a remote object, I get a
CORBA::TRANSIENT exception. I've been told that in this case, I should
just retry the call.
The traces tell me that these exceptions are due to connections that
fail to be established.
This introduces a lag of at least 21 seconds (between the first call and
the time I get the exception), and sometimes 42 seconds when the second
call also fails. This is quite annoying.

Therefore, I have two questions:
What can this be due to?
My traffic shaping may occasionally drop some packets, but this event should
be quite rare. Can this be related?
Can I somehow shorten the delay between the start of the call and the
exception being thrown?

Regards,

Sylvain Gault

Sylvain Gault

2013-08-19 23:57:54 UTC

Post by Sylvain Gault
Hello,
Sometimes, when I call a method on a remote object, I get a
CORBA::TRANSIENT exception. I've been told that in this case, I should
just retry the call.
The traces tell me that these exceptions are due to connections that
fail to be established.
This introduces a lag of at least 21 seconds (between the first call and
the time I get the exception), and sometimes 42 seconds when the second
call also fails. This is quite annoying.
What can this be due to?
My traffic shaping may occasionally drop some packets, but this event should
be quite rare. Can this be related?
Can I somehow shorten the delay between the start of the call and the
exception being thrown?

Additional information:
It may not be linked to omniORB directly.

When I run my process under strace, I see the following lines:

01:00:50.668058 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 132
01:00:50.668120 setsockopt(132, SOL_TCP, TCP_NODELAY, [1], 4) = 0
01:00:50.668195 fcntl(132, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
01:00:50.668262 connect(132, {sa_family=AF_INET, sin_port=htons(43193), sin_addr=inet_addr("10.132.0.10")}, 16) = -1 EINPROGRESS (Operation now in progress)
01:00:50.668534 poll([{fd=132, events=POLLOUT}], 1, -1) = 1 ([{fd=132, revents=POLLOUT|POLLERR|POLLHUP}])
01:01:11.669504 write(2, "omniORB: Failed to connect (no peer name): 10.132.0.10\n", 55) = 55
01:01:11.669605 close(132) = 0
01:01:11.669655 write(2, "omniORB: Switch rope to use address giop:tcp:10.132.0.10:43193\n", 63) = 63
01:01:11.669719 write(2, "omniORB: Unable to open new connection: giop:tcp:10.132.0.10:43193\n", 67) = 67
01:01:11.669774 write(2, "omniORB: throw giopStream::CommFailure from giopStream.cc:1153(0,NO,TRANSIENT_ConnectFailed)\n", 93) = 93

What I understand is that it tries to connect a non-blocking socket, and
21 seconds later the connection attempt fails.
However, the peer node does not see any incoming connection.
I guess the packets were just dropped because of the traffic shaping...
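
As a rough illustration of the syscall sequence in the trace (socket, fcntl with O_NONBLOCK, connect returning EINPROGRESS, then poll on POLLOUT), here is a minimal POSIX sketch in C++. It is not omniORB's code: `probe_connect` is a hypothetical helper name, and the one deliberate difference from the trace is that poll() gets a finite timeout instead of -1, so the caller can give up before the TCP stack does.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of omniORB): tries a non-blocking TCP connect
// and waits at most timeout_ms for it to complete.
// Returns 0 on success, otherwise an errno value (ETIMEDOUT if the wait expired).
// The socket is always closed before returning; this only probes reachability.
int probe_connect(const char* ip, std::uint16_t port, int timeout_ms) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL;  // not a dotted-quad IPv4 address
    }

    int result = 0;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        if (errno != EINPROGRESS) {
            result = errno;  // immediate failure
        } else {
            pollfd pfd{};
            pfd.fd = fd;
            pfd.events = POLLOUT;
            int n = poll(&pfd, 1, timeout_ms);  // the trace used -1 (wait forever)
            if (n == 0) {
                result = ETIMEDOUT;  // our own deadline, not the kernel's
            } else if (n < 0) {
                result = errno;
            } else {
                // The socket is writable or in error: SO_ERROR tells which.
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0) err = errno;
                result = err;
            }
        }
    }
    close(fd);
    return result;
}
```

With the address from the trace, `probe_connect("10.132.0.10", 43193, 2000)` should return ETIMEDOUT after roughly two seconds if the SYN packets are silently dropped, rather than blocking until the kernel abandons the attempt.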

In any case, any comment is welcome.

Regards,
Sylvain Gault

Duncan Grisby

2013-08-28 10:39:07 UTC

Post by Sylvain Gault

Post by Sylvain Gault
Sometimes, when I call a method on a remote object, I get a
CORBA::TRANSIENT exception. I've been told that in this case, I should
just retry the call.

Whether that's the right thing to do depends on what you're trying to
achieve...
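
If retrying does suit the application, it is probably worth bounding it, since each failed attempt in this thread costs the full connect delay (21 seconds, then 42 after a second failure). The following is only a sketch of a bounded retry wrapper: `retry_on_transient` and `TransientError` are hypothetical names, and `TransientError` is a stand-in so the example compiles without omniORB. With omniORB one would presumably pass `CORBA::TRANSIENT` as the first template argument.

```cpp
#include <chrono>
#include <stdexcept>
#include <thread>
#include <utility>

// Stand-in exception type (assumption) so the sketch builds without omniORB.
struct TransientError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Invokes `call` at least once and at most max_attempts times.
// Only exceptions of type Transient trigger a retry; anything else propagates
// immediately. After the last allowed attempt the Transient is rethrown.
template <typename Transient = TransientError, typename Call>
auto retry_on_transient(Call&& call, int max_attempts,
                        std::chrono::milliseconds pause) -> decltype(call()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return call();
        } catch (const Transient&) {
            if (attempt >= max_attempts) throw;  // give up, caller decides
            std::this_thread::sleep_for(pause);  // brief pause before retrying
        }
    }
}
```

A call site might look like `retry_on_transient<CORBA::TRANSIENT>([&] { return obj->method(); }, 3, std::chrono::milliseconds(100))`, where `obj` and `method` are placeholders for the remote object and operation.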

[...]

Post by Sylvain Gault
It may not be linked to omniORB directly.
01:00:50.668058 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 132
01:00:50.668120 setsockopt(132, SOL_TCP, TCP_NODELAY, [1], 4) = 0
01:00:50.668195 fcntl(132, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
01:00:50.668262 connect(132, {sa_family=AF_INET, sin_port=htons(43193), sin_addr=inet_addr("10.132.0.10")}, 16) = -1 EINPROGRESS (Operation now in progress)
01:00:50.668534 poll([{fd=132, events=POLLOUT}], 1, -1) = 1 ([{fd=132, revents=POLLOUT|POLLERR|POLLHUP}])
01:01:11.669504 write(2, "omniORB: Failed to connect (no peer name): 10.132.0.10\n", 55) = 55

[...]

Post by Sylvain Gault
What I understand is that it tries to connect a non-blocking socket, and
21 seconds later the connection attempt fails.
However, the peer node does not see any incoming connection.
I guess the packets were just dropped because of the traffic shaping...

Yes, it looks like that's what's happening. The delay before the connect
fails is in the TCP stack, not in omniORB.

[...]

Post by Sylvain Gault
Can I, somehow, shorten the delay between the call start and the
exception throw?

Look at the omniORB::clientConnectTimeout() function /
clientConnectTimeOutPeriod configuration setting:

http://omniorb.sourceforge.net/omni41/omniORB/omniORB008.html#htoc95
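
One way to apply the configuration setting named above is on the command line, assuming omniORB's usual convention that a configuration parameter can be passed as `-ORB<name> <value>` and that the value is in milliseconds (both worth checking against the linked page). The sketch below only builds the argument list; `with_connect_timeout` is a hypothetical helper, and the actual ORB initialisation is left as a comment so the example compiles without omniORB.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of the program arguments with the
// connect-timeout option appended as a "-ORB<name> <value>" pair.
// Assumption: the value is interpreted in milliseconds.
std::vector<std::string> with_connect_timeout(std::vector<std::string> args,
                                              unsigned long timeout_ms) {
    args.push_back("-ORBclientConnectTimeOutPeriod");
    args.push_back(std::to_string(timeout_ms));
    return args;
}

// The resulting strings would then be converted to an argc/argv pair and
// handed to CORBA::ORB_init(argc, argv) when initialising the ORB.
```

The same setting can presumably also go into the omniORB configuration file as `clientConnectTimeOutPeriod = 5000`. The exact name of the in-code function should be taken from the linked documentation rather than from this sketch.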

Cheers,

Duncan.

--
-- Duncan Grisby --
-- duncan at grisby.org --
-- http://www.grisby.org --

Sylvain Gault

2013-09-05 14:26:19 UTC

Post by Duncan Grisby

Post by Sylvain Gault

Post by Sylvain Gault
Sometimes, when I call a method on a remote object, I get a
CORBA::TRANSIENT exception. I've been told that in this case, I should
just retry the call.

Whether that's the right thing to do depends on what you're trying to
achieve...
[...]

[...]

Yes, it looks like that's what's happening. The delay before the connect
fails is in the TCP stack, not in omniORB.
[...]

Post by Sylvain Gault
Can I, somehow, shorten the delay between the call start and the
exception throw?

Look at the omniORB::clientConnectTimeout() function /
clientConnectTimeOutPeriod configuration setting:
http://omniorb.sourceforge.net/omni41/omniORB/omniORB008.html#htoc95

Actually, I've solved this by refining my traffic shaping / policing.
Just for information, the solution was to add an ingress policy on the
router to limit the bandwidth, so that senders get notified earlier
about the congestion and adapt their sending rate.

Regards,

Sylvain