Discussion:
[omniORB] RPC calls hanging
Preston, Ralph A.
2008-10-31 23:44:37 UTC
Permalink
Hello,

Using omniORB 4.1.3 on Windows XP I've been having calls hang for
outConScanPeriod. Once this time expires the call completes correctly.
I've captured three traces at traceLevel 40 of the error. One of them
hung on a call to _narrow (_is_a) to omniNames, a snippet of the trace
is below.

Reducing the outConScanPeriod to limits the amount of time wasted but
it is still a major problem. We have a number of services running (~5)
and at least one of them hangs every time we run a test.

Looking at a packet capture, a new TCP connection is established (SYN,
SYNACK, ACK) then the RPC message is sent and ACK'd but the server
doesn't return any data until a minute-ish later. During a correctly
behaved sequence the server returns the data immediately (doesn't send
the ACK, pause a minute, then send the data.)

Following are two trace files. The first trace is of my service
performing a narrow on an object reference returned by omniNames. The
service sends the _is_a command, is idle for a long time, then receives
the response from omniNames. The second trace is from omniNames and
shows omniNames being idle for a long time, having a comm_failure then
receiving the _is_a call and responding to it.

Any suggestions are appreciated, thanks in advance.

My Service calling _narrow() and waiting a long time for the response:

omniORB: (0) Creating ref to remote: key<NameService>
target id : IDL:omg.org/CORBA/Object:1.0
most derived id:
omniORB: (0) Initial reference `NameService' resolved from
configuration file.
omniORB: (0) Invoke '_is_a' on remote: key<NameService>
omniORB: (0) Client attempt to connect to giop:tcp:192.168.200.8:2809
omniORB: (2) AsyncInvoker: thread id = 2 has started. Total threads = 2
omniORB: (2) Scavenger task execute.
omniORB: (0) Client opened connection to giop:tcp:192.168.200.8:2809
omniORB: (0) sendChunk: to giop:tcp:192.168.200.8:2809 100 bytes
omniORB: (0)
4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
0200 0000 0100 0000 0b00 0000 4e61 6d65 ............Name
5365 7276 6963 6500 0600 0000 5f69 735f Service....._is_
6100 0000 0000 0000 2800 0000 4944 4c3a a.......(...IDL:
6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
312e 3000 1.0.
omniORB: (1) SocketCollection idle. Sleeping.
omniORB: (2) Scan for idle connections (1224879646,211125000)
omniORB: (2) Scan for idle connections done (1224879646,211125000).
omniORB: (2) Scan for idle connections (1224879651,211125000)
omniORB: (2) Scan for idle connections done (1224879651,211125000).
...
[snip 19 "Scan for idle connections done" lines]
...
omniORB: (2) Scan for idle connections (1224879751,211125000)
omniORB: (2) Scan for idle connections done (1224879751,211125000).
omniORB: (2) Scan for idle connections (1224879756,211125000)
omniORB: (2) Scan for idle connections done (1224879756,211125000).
omniORB: (0) inputMessage: from giop:tcp:192.168.200.8:2809 25 bytes
omniORB: (0)
4749 4f50 0100 0101 0d00 0000 0000 0000 GIOP............
0200 0000 0000 0000 01 .........
omniORB: (0) Creating ref to remote: key<NameService>
target id : IDL:omg.org/CosNaming/NamingContext:1.0
most derived id:
Logger _GetObjectReference resolve LogService name
omniORB: (0) Invoke 'resolve' on remote: key<NameService>
omniORB: (0) sendChunk: to giop:tcp:192.168.200.8:2809 101 bytes
omniORB: (0)
4749 4f50 0100 0100 5900 0000 0000 0000 GIOP....Y.......
0400 0000 0100 0000 0b00 0000 4e61 6d65 ............Name
5365 7276 6963 6500 0800 0000 7265 736f Service.....reso
6c76 6500 0000 0000 0200 0000 0800 0000 lve.............
5465 7374 6265 6400 0100 0000 006d 696e Testbed......min
0b00 0000 4c6f 6753 6572 7669 6365 003a ....LogService.:
0100 0000 00 .....


omniNames view:


omniORB: (6) Dispatching remote call 'bind' to:
root/<112e024901000eb8/3> (active)
omniORB: (6) Creating ref to remote: root<0>
target id : IDL:omg.org/CORBA/Object:1.0
most derived id: IDL:acme.com/CD/ClockController:1.0
omniORB: (6) createLocalObjRef -- reusing reference from local ref
list.
omniORB: (6) sendChunk: to giop:tcp:192.168.200.8:3428 24 bytes
omniORB: (6)
4749 4f50 0102 0101 0c00 0000 1000 0000 GIOP............
0000 0000 0000 0000 ........
omniORB: (2) Scan for idle connections (1224879640,849625000)
omniORB: (2) Scavenger reduce idle count for strand 0x10188298 to 35
omniORB: (2) Scavenger reduce idle count for strand 0x1018d3a8 to 35
omniORB: (2) Scavenger reduce idle count for strand 0x1018ffb0 to 35
omniORB: (2) Scavenger reduce idle count for strand 0x10190678 to 35
omniORB: (2) Scan for idle connections done (1224879640,849625000).
...
[snip from 34 to 14]
...
omniORB: (2) Scan for idle connections (1224879750,849625000)
omniORB: (2) Scavenger reduce idle count for strand 0x10188298 to 13
omniORB: (2) Scavenger reduce idle count for strand 0x1018d3a8 to 13
omniORB: (2) Scavenger reduce idle count for strand 0x1018ffb0 to 13
omniORB: (2) Scavenger reduce idle count for strand 0x10190678 to 13
omniORB: (2) Scan for idle connections done (1224879750,849625000).
omniORB: (3) Error in network receive (start of message):
giop:tcp:192.168.200.8:3421
omniORB: (3) throw giopStream::CommFailure from
giopStream.cc:874(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: (3) Server connection refcount = 1
omniORB: (3) Server connection refcount = 0
omniORB: (3) Server close connection from giop:tcp:192.168.200.8:3421
omniORB: (4) inputMessage: from giop:tcp:192.168.200.8:3423 12 bytes
omniORB: (4)
4749 4f50 0102 0105 0000 0000 GIOP........
omniORB: (4) Orderly connection shutdown: giop:tcp:192.168.200.8:3423
omniORB: (4) throw giopStream::CommFailure from
giopImpl12.cc:1374(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: (4) Server connection refcount = 1
omniORB: (4) Server connection refcount = 0
omniORB: (4) Server close connection from giop:tcp:192.168.200.8:3423
omniORB: (2) Scan for idle connections (1224879755,849625000)
omniORB: (2) Scavenger reduce idle count for strand 0x1018ffb0 to 12
omniORB: (2) Scavenger reduce idle count for strand 0x10190678 to 12
omniORB: (2) Scan for idle connections done (1224879755,849625000).
omniORB: (5) Error in network receive (start of message):
giop:tcp:192.168.200.8:3426
omniORB: (5) throw giopStream::CommFailure from
giopStream.cc:874(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: (5) Server connection refcount = 1
omniORB: (5) Server connection refcount = 0
omniORB: (5) Server close connection from giop:tcp:192.168.200.8:3426
omniORB: (6) inputMessage: from giop:tcp:192.168.200.8:3428 12 bytes
omniORB: (6)
4749 4f50 0102 0105 0000 0000 GIOP........
omniORB: (6) Orderly connection shutdown: giop:tcp:192.168.200.8:3428
omniORB: (6) throw giopStream::CommFailure from
giopImpl12.cc:1374(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: (6) Server connection refcount = 1
omniORB: (6) Server connection refcount = 0
omniORB: (6) Server close connection from giop:tcp:192.168.200.8:3428
omniORB: (1) Server accepted connection from
giop:tcp:192.168.200.8:3432
omniORB: (6) giopWorker task execute.
omniORB: (6) Accepted connection from giop:tcp:192.168.200.8:3432
because of this rule: "* unix,ssl,tcp"
omniORB: (6) inputMessage: from giop:tcp:192.168.200.8:3432 100 bytes
omniORB: (6)
4749 4f50 0100 0100 5800 0000 0000 0000 GIOP....X.......
0200 0000 0100 0000 0b00 0000 4e61 6d65 ............Name
5365 7276 6963 6500 0600 0000 5f69 735f Service....._is_
6100 0000 0000 0000 2800 0000 4944 4c3a a.......(...IDL:
6f6d 672e 6f72 672f 436f 734e 616d 696e omg.org/CosNamin
672f 4e61 6d69 6e67 436f 6e74 6578 743a g/NamingContext:
312e 3000 1.0.
omniORB: (6) Dispatching remote call '_is_a' to: key<NameService>
(active)
omniORB: (6) sendChunk: to giop:tcp:192.168.200.8:3432 25 bytes
omniORB: (6)
4749 4f50 0100 0101 0d00 0000 0000 0000 GIOP............
0200 0000 0000 0000 01 .........
omniORB: (6) inputMessage: from giop:tcp:192.168.200.8:3432 101 bytes
omniORB: (6)
4749 4f50 0100 0100 5900 0000 0000 0000 GIOP....Y.......
0400 0000 0100 0000 0b00 0000 4e61 6d65 ............Name
5365 7276 6963 6500 0800 0000 7265 736f Service.....reso
6c76 6500 0000 0000 0200 0000 0800 0000 lve.............
5465 7374 6265 6400 0100 0000 006d 696e Testbed......min
0b00 0000 4c6f 6753 6572 7669 6365 003a ....LogService.:
0100 0000 00 .....

Thanks,

Ralph A. Preston
The MITRE Corporation
202 Burlington Road, MS E095
Bedford, MA 01730
Office: 781-271-7914 Fax: 781-271-8915 Cell: 617-335-2226

Loading...