Discussion: [omniORB] bug in bulk message processing?
Serguei Kolos
2009-03-15 21:31:53 UTC
Hello

I'm using omniORB 4.1.3 on Linux kernel 2.6 with gcc34.

In my configuration I have a single server application serving many
multi-threaded clients, each of which sends requests to the server from
several threads concurrently. I have set the oneCallPerConnection
parameter to 0 on the clients. In this configuration clients very often
time out even though the server is not under heavy load. To investigate,
I ran the server and the clients with traceLevel 35, and I found a
possible indication that when a batch of requests arrives over the same
connection and the server splits it into separate buffers, only the
first message gets processed and the others are lost.
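
For completeness, this is roughly how the clients enable the
multiplexing (a minimal sketch with placeholder names, not our real
application code; the same setting can also be given on the command
line as -ORBoneCallPerConnection 0 or in omniORB.cfg):

// Minimal sketch of the client-side ORB setup (placeholder code, not
// the real application).
#include <omniORB4/CORBA.h>

int main(int argc, char** argv)
{
  // oneCallPerConnection = 0 lets several client threads multiplex
  // their concurrent requests over a single GIOP connection.
  const char* options[][2] = { { "oneCallPerConnection", "0" },
                               { 0, 0 } };
  CORBA::ORB_var orb = CORBA::ORB_init(argc, argv, "omniORB4", options);

  // ... resolve the server objects and call them from several threads ...

  orb->destroy();
  return 0;
}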
Here is how it appears in the server output:

omniORB: inputMessage: from giop:tcp:10.146.34.229:52782 2224 bytes
omniORB: 128 bytes out of 2224
4749 4f50 0100 0100 6c00 0000 0000 0000 GIOP....l.......
1200 0000 0100 3d20 1800 0000 ff69 7063 ......= .....ipc
2f70 6172 7469 7469 6f6e 0061 6c6c 5f68 /partition.all_h
6f73 7473 0800 0000 7265 736f 6c76 6500 osts....resolve.
0000 0000 0200 0000 0e00 0000 6973 2f72 ............is/r
6570 6f73 6974 6f72 7900 0000 0400 0000 epository.......
6970 6300 0800 0000 5275 6e43 7472 6c00 ipc.....RunCtrl.
0400 0000 6970 6300 4749 4f50 0100 0100 ....ipc.GIOP....
omniORB: Split input data into 19 messages
omniORB: sendChunk: to giop:tcp:10.146.34.229:52782 176 bytes
omniORB: 128 bytes out of 176
4749 4f50 0100 0101 a400 0000 0000 0000 GIOP............
1200 0000 0000 0000 1600 0000 4944 4c3a ............IDL:
6973 2f72 6570 6f73 6974 6f72 793a 312e is/repository:1.
3000 0000 0100 0000 0000 0000 7000 0000 0...........p...
0101 0200 0e00 0000 3130 2e31 3436 2e34 ........10.146.4
322e 3232 3800 cf81 1900 0000 ff69 732f 2.228........is/
7265 706c 6963 61fe 3c35 bc49 0100 2dda replica.<5.I..-.
0000 0000 0000 0000 0200 0000 0000 0000 ................

The output message above is the reply to the very first call in the
input request sequence.
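
To spell out what I mean by the split: every GIOP message starts with a
12-byte header carrying the "GIOP" magic, the version, a byte-order
flag, the message type and the body size, so a single large read like
the 2224 bytes above can be cut into individual messages by walking
those headers. A rough illustration of that framing (my own sketch, not
omniORB's actual code):

// Illustration only (not omniORB code): count the concatenated GIOP
// messages in a buffer, as in the 2224-byte inputMessage above.
#include <cstdint>
#include <cstring>
#include <cstddef>

size_t countGiopMessages(const unsigned char* buf, size_t len)
{
  size_t count = 0, pos = 0;
  while (pos + 12 <= len && std::memcmp(buf + pos, "GIOP", 4) == 0) {
    // Header layout: magic(4) major(1) minor(1) flags(1) type(1) size(4)
    bool little = buf[pos + 6] & 1;       // byte-order bit
    uint32_t body;
    std::memcpy(&body, buf + pos + 8, 4); // message_size field
    if (!little)                          // swap, assuming a little-endian host
      body = (body >> 24) | ((body >> 8) & 0xff00)
           | ((body << 8) & 0xff0000) | (body << 24);
    pos += 12 + body;                     // jump to the next message
    ++count;
  }
  return count;
}

// In the dump above the first header says GIOP 1.0, little-endian, body
// size 0x6c (108), so the first message ends at offset 120 -- exactly
// where the next "GIOP" magic appears in the hex dump.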
The next I/O on the giop:tcp:10.146.34.229:52782 endpoint happens only
much later and looks like the start of a new request/response exchange:

inputMessage: from giop:tcp:10.146.34.229:52782 116 bytes
omniORB:
4749 4f50 0102 0100 6c00 0000 0a00 0000 GIOP....l.......
0300 0000 0000 4f54 1800 0000 ff69 7063 ......OT.....ipc
2f70 6172 7469 7469 6f6e 0061 6c6c 5f68 /partition.all_h
6f73 7473 0800 0000 7265 736f 6c76 6500 osts....resolve.
0000 0000 0100 0000 0200 0000 0e00 0000 ................
6973 2f72 6570 6f73 6974 6f72 7900 0000 is/repository...
0400 0000 6970 6300 0400 0000 504d 4700 ....ipc.....PMG.
0400 0000 6970 6300 ....ipc.

omniORB: sendChunk: to giop:tcp:10.146.34.229:52782 176 bytes
omniORB: 128 bytes out of 176
4749 4f50 0100 0101 a400 0000 0000 0000 GIOP............
1000 0000 0000 0000 1600 0000 4944 4c3a ............IDL:
6973 2f72 6570 6f73 6974 6f72 793a 312e is/repository:1.
3000 0000 0100 0000 0000 0000 7000 0000 0...........p...
0101 0200 0e00 0000 3130 2e31 3436 2e34 ........10.146.4
322e 3233 3100 908f 1900 0000 ff69 732f 2.231........is/
7265 706c 6963 61fe 6335 bc49 0100 276c replica.c5.I..'l
0000 0000 0000 0000 0200 0000 0000 0000 ................

Is this a bug somewhere in omniORB, or am I missing something?

Cheers,
Sergei
Duncan Grisby
2009-03-17 04:30:55 UTC
Post by Serguei Kolos
In my configuration I have a single server application serving many
multi-threaded clients, each of which sends requests to the server from
several threads concurrently. I have set the oneCallPerConnection
parameter to 0 on the clients. In this configuration clients very often
time out even though the server is not under heavy load. To investigate,
I ran the server and the clients with traceLevel 35, and I found a
possible indication that when a batch of requests arrives over the same
connection and the server splits it into separate buffers, only the
first message gets processed and the others are lost.
Well, the support for splitting messages into separate buffers
definitely works, with all calls being processed correctly, both for a
simple example I just tried and for a much more complex application I've
looked at traces for. So, the problem is either something specific to
the circumstances you're in, or something completely different.

It would be useful if you could send me a complete trace from
-ORBtraceLevel 30 -ORBtraceInvocations 1 -ORBtraceInvocationReturns 1
-ORBtraceThreadId 1 -ORBtraceTime 1. That will give a much better idea
of what's going on.
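
For example, something like this, capturing stderr to a file, would do
(my_server is just a placeholder for your server binary; the same
parameters can equally go in omniORB.cfg without the -ORB prefix):

  my_server -ORBtraceLevel 30 -ORBtraceInvocations 1 \
            -ORBtraceInvocationReturns 1 -ORBtraceThreadId 1 \
            -ORBtraceTime 1 2> server-trace.log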

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Serguei Kolos
2009-03-17 20:45:43 UTC
Hi Duncan

Many thanks for your prompt answer. I noticed it in the archive, but
for some reason it never arrived in my mailbox, so I'm replying to my
original message.
For the moment I don't have access to the testbed to repeat the tests
with all the switches you indicated, but in the meantime I have put the
full trace file from the last run (with traceLevel set to 35) on my
personal web page:
https://test-home.web.cern.ch/test-home/ipc-server.err.gz

I'll be able to repeat the tests with the other switches enabled in a
week's time and will post the new results. Thank you in advance for
looking into it.

Cheers,
Sergei