Discussion:
[omniORB] giopRendezvouser exit on error
Vinouse, Jean-Pierre (Jean-Pierre)
2007-11-21 19:33:40 UTC
Duncan, All,

our server application runs omniORB 4.0.7 under VxWorks. The client
(Solaris, different ORB) connects to the server through an IP over ATM
network.

We encounter a very sporadic field issue on the server side which leads
to the termination (exit on error) of the task giopRendezvouser, which
listens for client connection requests on the incoming ORB port.

The following message (raised by method giopServer::notifyRzDone) was
logged:

t1 : Unrecoverable error for this endpoint:
giop:tcp:192.168.171.94:15678, it will no longer be serviced.

The issue occurs in conjunction with major ATM network outages and/or a
stop/restart scenario of the client.

Has anybody encountered similar problems and can provide some help?

Regards

Jean-Pierre


Duncan Grisby
2007-12-03 23:26:43 UTC
Post by Vinouse, Jean-Pierre (Jean-Pierre)
our server application runs omniORB 4.0.7 under VxWorks. The client
(Solaris, different ORB) connects to the server through an IP over ATM
network.
We encounter a very sporadic field issue on the server side which leads
to the termination (exit on error) of the task giopRendezvouser, which
listens for client connection requests on the incoming ORB port.
The following message (raised by method giopServer::notifyRzDone) was
logged:
t1 : Unrecoverable error for this endpoint:
giop:tcp:192.168.171.94:15678, it will no longer be serviced.
Are you able to update to the latest snapshot on either the
omni4_0_develop or (better) omni4_1_develop branch? Various problems to
do with connection management have been fixed since 4.0.7, so it would
be good for you to use the latest version. If you can reproduce the
error with the latest release, the first thing to do is to get a trace
from the server with traceLevel 25 traceThreadId 1.
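For reference, a minimal sketch of how these two settings are usually supplied, assuming a standard omniORB configuration file (on an embedded VxWorks target the configuration mechanism may differ):

```
# omniORB.cfg fragment
traceLevel = 25
traceThreadId = 1
```

The same parameters can also be given as command-line arguments to the server process, in the form -ORBtraceLevel 25 -ORBtraceThreadId 1.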

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Vinouse, Jean-Pierre (Jean-Pierre)
2008-02-21 18:15:23 UTC
Duncan,

an issue with omniORB 4.0.7 which I reported at the end of last year
http://www.omniorb-support.com/pipermail/omniorb-list/2007-December/029059.html

now appears more frequently since a different client has been connected
to our server application.

The good news: we managed to record omniORB traces at level 30 (see
short file attached).

The new client behaves differently from the previous one: it does not
use the GIOP 1.2 CloseConnection message to release the connection.
Instead, as shown by an Ethereal trace, the TCP connection is released
abruptly at TCP level (FIN packet) a few msecs after the GIOP reply has
been sent out. I presume this causes the giopStream to throw the
exception COMM_FAILURE_UnMarshalArguments. And it seems that in some
circumstances (large message sent, here 66 kbytes; network perhaps
degraded or slow) SocketCollection::select() may return an error of
type "invalid file descriptor". Following that, the giopRendezvouser
terminates and no further incoming connections can be accepted. Our
application then triggers a reboot.
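
To illustrate the suspected failure mode: on a POSIX system, select() fails with EBADF when the descriptor set it is given still contains a socket that has been closed in the meantime. The following is only a sketch under that assumption. It runs on a Unix-like host rather than VxWorks, it is not omniORB's actual code, and the function name is made up for the example.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: returns the errno reported by select() when the
// read set still holds a descriptor that has already been closed.
// Returns 0 if select() unexpectedly succeeds, -1 if setup fails.
int select_errno_on_closed_fd() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fds[0], &readable);   // the listener still watches this socket
    const int maxfd = fds[0];

    close(fds[0]);               // meanwhile the connection is torn down
    close(fds[1]);

    timeval tv = {0, 0};         // poll, do not block
    errno = 0;
    const int rc = select(maxfd + 1, &readable, nullptr, nullptr, &tv);
    return rc < 0 ? errno : 0;
}
```

A listener that treats every select() error as fatal would stop accepting connections after such a race, which is consistent with the behaviour we observe.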

Is this a known issue? If a fix exists for version 4.0.7, we could
test it in our environment.

Regards

Jean-Pierre

Alcatel-Lucent
UMTS dev
Nuremberg
Germany

-------------- next part --------------
UMC1 [01-01 01:57:34Z] 0x1efeb94 omniT1 SW=06000398 3342 ?I OAM_CORBA_AGENT omniORB: Server accepted connection from giop:tcp:135.120.154.182:47432
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3343 ?I OAM_CORBA_AGENT omniORB: giopWorker task execute.
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3344 ?I OAM_CORBA_AGENT omniORB: Accepted connection from giop:tcp:135.120.154.182:47432 because of this rule: "* unix,ssl,tcp"
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3345 ?I OAM_CORBA_AGENT omniORB: inputMessage: from giop:tcp:135.120.154.182:47432 159 bytes
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3346 ?I OAM_CORBA_AGENT omniORB: 128 bytes out of 159
4749 4f50 0102 0000 0000 0093 0000 0000 GIOP............
0300 0000 0000 0000 0000 000e fe00 0000
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3347 ?I OAM_CORBA_AGENT omniORB: Receive codeset service context and set TCS to (ISO-8859-1,UTF-16)
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3348 ?I OAM_CORBA_AGENT omniORB: Dispatching remote call 'getAttributes' to: root<3> (active)
UMC1 [01-01 01:57:34Z] 0x1efeb94 t33 SW=06000398 3349 ?I OAM_CORBA_AGENT Incoming ItfB request from <OMCU=1>: <getAttributes> for MO <OneBTSEquipment=2>

..... cut ......Application handles the request.....

UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3579 ?I OAM_CORBA_AGENT ItfB request <getAttributes> for MO <OneBTSEquipment=2> finished successfully
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3580 ?I OAM_CORBA_AGENT omniORB: sendChunk: to giop:tcp:135.120.154.182:47432 28 bytes
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3581 ?I OAM_CORBA_AGENT omniORB:
4749 4f50 0102 0201 0001 020c 0000 0000 GIOP............
0000 0000 0000 0000 0001 0202 ............
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3582 ?I OAM_CORBA_AGENT omniORB: sendCopyChunk: to giop:tcp:135.120.154.182:47432 66044 bytes
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3583 ?I OAM_CORBA_AGENT omniORB: 128 bytes out of 66044
3c4d 4f49 4c69 7374 3e3c 4d4f 4920 6664 <MOIList><MOI fd
6e3d 224f 6e65 4254 5345 7175 6970 6d6
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3584 ?I OAM_CORBA_AGENT omniORB: sendChunk: to giop:tcp:135.120.154.182:47432 22 bytes
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3585 ?I OAM_CORBA_AGENT omniORB:
4749 4f50 0102 0007 0000 000a 0000 0000 GIOP............
4c69 7374 3e00 List>.
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3586 ?I OAM_CORBA_AGENT omniORB: throw giopStream::CommFailure from giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3587 ?I OAM_CORBA_AGENT omniORB: Server connection refcount = 1
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3588 ?I OAM_CORBA_AGENT omniORB: Server connection refcount = 0
UMC1 [01-01 01:57:35Z] 0x1efeb94 t33 SW=06000398 3589 ?I OAM_CORBA_AGENT omniORB: Server close connection from giop:tcp:135.120.154.182:47432
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3590 ?I OAM_CORBA_AGENT omniORB: select() returned socket error ERRNO=851971
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3591 ?I OAM_CORBA_AGENT omniORB: giopRendezvouser for endpoint giop:tcp:135.120.176.153:15678 exit.
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3592 ?I OAM_CORBA_AGENT omniORB: Unrecoverable error for this endpoint: giop:tcp:135.120.176.153:15678, it will no longer be serviced.
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3593 ?I OAM_CORBA_AGENT omniORB: TCP endpoint shut down.
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3594 !E PANIC giopRendezvouser::execute exit on error ERRNO=851971
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3595 !E PANIC UBM F:/vobs/omniORB/build/src/src/lib/omniORB/orbcore/giopRendezvouser.cc L:128 ***
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 38 ?P PANIC:giopRendezvouser::execute exit on error ERRNO=851971
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 39 ?P PANIC:UBM F:/vobs/omniORB/build/src/src/lib/omniORB/orbcore/giopRendezvouser.cc L:128 ***
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 40 ?P PANIC:********** UBM TRACE BACK **********
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3596 !W UBM 0x01dc50dc 0x01dc6454 0x01ef4140 0x0276de3c 0x01e6b2a8
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 SW=06000398 3597 !W UBM *************** UBM TRACE BACK start *********************************************************
UMC1 [01-01 01:57:35Z] 0x1efeb94 omniT1 41 ?P PANIC:0x01dc50e4 0x01dc6454 0x01ef4140 0x0276de3c 0x01e6b2a8
UMC1 [01-01 01:57:45Z] 0x1efeb94 t1 SW=06000398 3598 ?I OAM_CORBA_AGENT omniORB: Scan for idle connections (7065,30000000)
UMC1 [01-01 01:57:45Z] 0x1efeb94 t1 SW=06000398 3599 ?I OAM_CORBA_AGENT omniORB: Scavenger reduce idle count for strand 0x90fa830 to 28
UMC1 [01-01 01:57:45Z] 0x1efeb94 t1 SW=06000398 3600 ?I OAM_CORBA_AGENT omniORB: Scavenger reduce idle count for strand 0x90faf10 to 28
UMC1 [01-01 01:57:45Z] 0x1efeb94 t1 SW=06000398 3601 ?I OAM_CORBA_AGENT omniORB: Scan for idle connections done (7065,30000000).
UMC1 [01-01 01:57:45Z] 0x1efeb94 t33 SW=06000398 3602 ?I OAM_CORBA_AGENT omniORB: AsyncInvoker: thread id = 34 has exited. Total threads = 3

comUtil.c: 284: Request for reboot: giopRendezvouser::execute exit on error ERRNO=851971 (reason_id=128)
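
Two values in the trace above can be decoded by hand, and the sketch below shows the arithmetic. It assumes the usual VxWorks convention that a status word carries a module number in its upper 16 bits and a module-specific code in its lower 16 bits, and the standard 12-byte GIOP 1.2 header layout (magic, version, flags, message type, body size). All function and type names are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed VxWorks-style status word: module number in the upper 16 bits,
// module-specific error code in the lower 16 bits.
unsigned vx_module(std::uint32_t status) { return status >> 16; }
unsigned vx_code(std::uint32_t status)   { return status & 0xFFFFu; }

// Fields of the fixed 12-byte GIOP message header.
struct GiopHeader {
    bool valid;
    unsigned version_major, version_minor;
    bool little_endian;       // flags bit 0
    bool more_fragments;      // flags bit 1 (GIOP 1.1 and later)
    unsigned message_type;    // 0 Request, 1 Reply, 5 CloseConnection, 7 Fragment
    std::uint32_t body_size;  // bytes following the 12-byte header
};

GiopHeader parse_giop_header(const unsigned char* p, std::size_t len) {
    GiopHeader h = {};
    if (len < 12 || std::memcmp(p, "GIOP", 4) != 0) return h;  // valid stays false
    h.valid = true;
    h.version_major = p[4];
    h.version_minor = p[5];
    h.little_endian = (p[6] & 0x01) != 0;
    h.more_fragments = (p[6] & 0x02) != 0;
    h.message_type = p[7];
    // The size field uses the byte order announced in the flags byte.
    if (h.little_endian) {
        h.body_size = std::uint32_t(p[8]) | (std::uint32_t(p[9]) << 8) |
                      (std::uint32_t(p[10]) << 16) | (std::uint32_t(p[11]) << 24);
    } else {
        h.body_size = (std::uint32_t(p[8]) << 24) | (std::uint32_t(p[9]) << 16) |
                      (std::uint32_t(p[10]) << 8) | std::uint32_t(p[11]);
    }
    return h;
}
```

Applied to the trace: ERRNO=851971 is 0xD0003, i.e. module 13 with code 3, which on VxWorks should correspond to the I/O system's "invalid file descriptor" status. The incoming header 4749 4f50 0102 0000 0000 0093 is a GIOP 1.2 Request with a 147-byte body (147 + 12 = 159 bytes, as logged). The reply header 4749 4f50 0102 0201 0001 020c is a Reply with the more-fragments bit set and a 66060-byte body (28 + 66044 = 66060 + 12). The final header 4749 4f50 0102 0007 0000 000a is a Fragment with a 10-byte body (22 bytes in total).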
Duncan Grisby
2008-02-25 21:47:03 UTC
Post by Vinouse, Jean-Pierre (Jean-Pierre)
The new client behaves differently from the previous one: it does not
use the GIOP 1.2 CloseConnection message to release the connection.
Instead, as shown by an Ethereal trace, the TCP connection is released
abruptly at TCP level (FIN packet) a few msecs after the GIOP reply has
been sent out. I presume this causes the giopStream to throw the
exception COMM_FAILURE_UnMarshalArguments. And it seems that in some
circumstances (large message sent, here 66 kbytes; network perhaps
degraded or slow) SocketCollection::select() may return an error of
type "invalid file descriptor". Following that, the giopRendezvouser
terminates and no further incoming connections can be accepted. Our
application then triggers a reboot.
Is this a known issue? If a fix exists for version 4.0.7, we could
test it in our environment.
Are you able to try omniORB 4.1.2? Failing that, please try the latest
snapshot on the omni4_0_develop branch:

http://omniorb.sourceforge.net/snapshots/omniORB-4.0-latest.tar.gz

I believe the bug you are seeing has already been fixed.

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --