Discussion:
[omniORB] TCP configuration?
Martin Kocian
2011-07-20 23:20:52 UTC
Permalink
Hi,

I am experiencing delays in the TCP communication in omniorb 4.1.4 where
212 bytes get sent to a Linux box (kernel 2.6.18-92.el5) immediately
followed by about 200 kB. The problem is that the Linux box does not send
the ack for the 212 bytes until 40 ms later(see time in the second line):

14:39:12.071935 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.59711: P 399:611(212) ack 73 win 17376 <nop,nop,timestamp 83 100339560>
14:39:12.112040 IP rdcds105.lab1.reg.59711 > rce12.lab1.reg.1027: . ack 611 win 157 <nop,nop,timestamp 100378649 83>
^^
14:39:12.112235 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.59711: . 611:2059(1448) ack 73 win 17376 <nop,nop,timestamp 83 100378649>
14:39:12.112276 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.59711: . 2059:3507(1448) ack 73 win 17376 <nop,nop,timestamp 83 100378649>
14:39:12.112278 IP rdcds105.lab1.reg.59711 > rce12.lab1.reg.1027: . ack 2059 win 180 <nop,nop,timestamp 100378649 83>
14:39:12.112295 IP rdcds105.lab1.reg.59711 > rce12.lab1.reg.1027: . ack 3507 win 202 <nop,nop,timestamp 100378649 83>
...

I wrote a small, simple client-host program that emulates this situation
but there the Linx box sends the ack instantly:

18:37:52.427547 IP rce12.lab1.reg.1024 > rdcds105.lab1.reg.opencm: P 1:213(212) ack 1 win 17520
18:37:52.427580 IP rdcds105.lab1.reg.opencm > rce12.lab1.reg.1024: . ack 213 win 54
^^^^
18:37:52.427844 IP rce12.lab1.reg.1024 > rdcds105.lab1.reg.opencm: . 213:1673(1460) ack 1 win 17520
18:37:52.427858 IP rdcds105.lab1.reg.opencm > rce12.lab1.reg.1024: . ack 1673 win 77
...

Does omniorb configure TCP in some way that makes the receiving host wait
for multiple packets before it sends an ack? The only thing I could find
in the code is TCP_NODELAY being set but that does not make any difference
for this problem.

Any advice would be appreciated!

Thank you,

Martin


| Martin Kocian |
| ***@slac.stanford.edu |
| Stanford Linear Accelerator Center |
| M.S. 98, P.O. Box 20450 |
| Stanford, CA 94309 |
| Tel. (650)926-2887 Fax (650)926-2923 |
Duncan Grisby
2011-08-05 18:19:42 UTC
Permalink
Post by Martin Kocian
I am experiencing delays in the TCP communication in omniorb 4.1.4 where
212 bytes get sent to a Linux box (kernel 2.6.18-92.el5) immediately
followed by about 200 kB. The problem is that the Linux box does not send
What does omniORB think is happening at this time? Can you get a trace
from both client and server with these options:

-ORBtraceLevel 25 -ORBtraceInvocations 1 -ORBtraceTime 1
-ORBtraceThreadId 1

And get another tcpdump. Then we can correlate them all.

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Martin Kocian
2011-08-09 04:36:30 UTC
Permalink
Hi Duncan,
Post by Duncan Grisby
What does omniORB think is happening at this time? Can you get a trace
Here is the output with the additional options turned on. Tcpdump is
running on the receiving Linux box so the time stamps between tcpdump and
the omniorb output on that machine are consistent.

Sender (rce12):

omniORB: (3) 1988-01-01 00:00:14.910000: Invoke 'update' on remote: root/is/replica<0>
omniORB: (3) 1988-01-01 00:00:14.910000: sendChunk: to giop:tcp:172.21.6.45:44449 212 bytes
omniORB: (3) 1988-01-01 00:00:14.910000: sendCopyChunk: to giop:tcp:172.21.6.45:44449 249620 bytes
omniORB: (3) 1988-01-01 00:00:14.960000: sendChunk: to giop:tcp:172.21.6.45:44449 100 bytes
omniORB: (3) 1988-01-01 00:00:14.960000: inputMessage: from giop:tcp:172.21.6.45:44449 24 bytes

Receiver (rdcds105):

omniORB: (4) 2011-08-08 15:20:34.809634: AsyncInvoker: thread id = 4 has started. Total threads = 3
omniORB: (4) 2011-08-08 15:20:34.809682: giopWorker task execute.
omniORB: (4) 2011-08-08 15:20:34.809718: inputMessage: from giop:tcp:[::ffff:172.21.6.24]:1027 212 bytes
omniORB: (4) 2011-08-08 15:20:34.850187: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850232: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850420: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 2896 bytes
omniORB: (4) 2011-08-08 15:20:34.850457: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850515: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 740 bytes
omniORB: (4) 2011-08-08 15:20:34.850560: Dispatching remote call 'update' to: root/is/replica<0> (active)
omniORB: (4) 2011-08-08 15:20:34.850645: inputCopyChunk: from giop:tcp:[::ffff:172.21.6.24]:1027 241640 bytes
omniORB: (4) 2011-08-08 15:20:34.864043: inputMessage: from giop:tcp:[::ffff:172.21.6.24]:1027 100 bytes
omniORB: (4) 2011-08-08 15:20:34.865406: sendChunk: to giop:tcp:[::ffff:172.21.6.24]:1027 24 bytes


tcpdump:

15:20:34.809553 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 399:611(212) ack 73 win 17376 <nop,nop,timestamp 27 3545665898>
15:20:34.849970 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 611 win 62 <nop,nop,timestamp 3545676958 27>
15:20:34.850137 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: . 611:2059(1448) ack 73 win 17376 <nop,nop,timestamp 27 3545676958>
15:20:34.850164 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 2059 win 85 <nop,nop,timestamp 3545676958 27>
15:20:34.850191 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: . 2059:3507(1448) ack 73 win 17376 <nop,nop,timestamp 27 3545676958>
15:20:34.850198 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 3507 win 108 <nop,nop,timestamp 3545676959 27>

...

15:20:34.864015 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250231:250327(96) ack 73 win 17376 <nop,nop,timestamp 27 3545676972>
15:20:34.864021 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250327:250331(4) ack 73 win 17376 <nop,nop,timestamp 27 3545676972>
15:20:34.864027 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 250327 win 505 <nop,nop,timestamp 3545676972 27>
15:20:34.864032 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 250331 win 505 <nop,nop,timestamp 3545676972 27>
15:20:34.865436 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: P 73:97(24) ack 250331 win 505 <nop,nop,timestamp 3545676974 27>
15:20:34.867110 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250331:251079(748) ack 97 win 17376 <nop,nop,timestamp 27 3545676974>


Thanks a lot,

Martin
Post by Duncan Grisby
Post by Martin Kocian
I am experiencing delays in the TCP communication in omniorb 4.1.4 where
212 bytes get sent to a Linux box (kernel 2.6.18-92.el5) immediately
followed by about 200 kB. The problem is that the Linux box does not send
What does omniORB think is happening at this time? Can you get a trace
-ORBtraceLevel 25 -ORBtraceInvocations 1 -ORBtraceTime 1
-ORBtraceThreadId 1
And get another tcpdump. Then we can correlate them all.
Cheers,
Duncan.
--
-- Duncan Grisby --
-- http://www.grisby.org --
| Martin Kocian |
| ***@slac.stanford.edu |
| Stanford Linear Accelerator Center |
| M.S. 95, P.O. Box 20450 |
| Stanford, CA 94309 |
| Tel. (650)926-2887 Fax (650)926-2923 |
Duncan Grisby
2011-09-08 21:59:22 UTC
Permalink
Hi Martin,

Sorry for taking ages to reply. Those traces show that the server is
quite correct in delaying the ACK because it's still waiting for more
data. The real question is why the client is waiting for the ACK before
sending the next chunk of data. It should keep sending regardless.

Is it the first call on the connection? If so, does the same delay
happen on subsequent calls?

What happens if you set the maxSocketSend configuration parameter to
8192 on the client? That will prevent it sending the 249620 bytes in
one single send() call.

How about if you set giopStream::directSendCutOff to a number larger
than your buffer? You'll need to #include
<omniORB4/internal/giopStream.h> to access it.

Cheers,

Duncan.
Post by Martin Kocian
Hi Duncan,
Post by Duncan Grisby
What does omniORB think is happening at this time? Can you get a trace
Here is the output with the additional options turned on. Tcpdump is
running on the receiving Linux box so the time stamps between tcpdump and
the omniorb output on that machine are consistent.
omniORB: (3) 1988-01-01 00:00:14.910000: Invoke 'update' on remote: root/is/replica<0>
omniORB: (3) 1988-01-01 00:00:14.910000: sendChunk: to giop:tcp:172.21.6.45:44449 212 bytes
omniORB: (3) 1988-01-01 00:00:14.910000: sendCopyChunk: to giop:tcp:172.21.6.45:44449 249620 bytes
omniORB: (3) 1988-01-01 00:00:14.960000: sendChunk: to giop:tcp:172.21.6.45:44449 100 bytes
omniORB: (3) 1988-01-01 00:00:14.960000: inputMessage: from giop:tcp:172.21.6.45:44449 24 bytes
omniORB: (4) 2011-08-08 15:20:34.809634: AsyncInvoker: thread id = 4 has started. Total threads = 3
omniORB: (4) 2011-08-08 15:20:34.809682: giopWorker task execute.
omniORB: (4) 2011-08-08 15:20:34.809718: inputMessage: from giop:tcp:[::ffff:172.21.6.24]:1027 212 bytes
omniORB: (4) 2011-08-08 15:20:34.850187: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850232: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850420: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 2896 bytes
omniORB: (4) 2011-08-08 15:20:34.850457: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 1448 bytes
omniORB: (4) 2011-08-08 15:20:34.850515: inputMessage: (body) from giop:tcp:[::ffff:172.21.6.24]:1027 740 bytes
omniORB: (4) 2011-08-08 15:20:34.850560: Dispatching remote call 'update' to: root/is/replica<0> (active)
omniORB: (4) 2011-08-08 15:20:34.850645: inputCopyChunk: from giop:tcp:[::ffff:172.21.6.24]:1027 241640 bytes
omniORB: (4) 2011-08-08 15:20:34.864043: inputMessage: from giop:tcp:[::ffff:172.21.6.24]:1027 100 bytes
omniORB: (4) 2011-08-08 15:20:34.865406: sendChunk: to giop:tcp:[::ffff:172.21.6.24]:1027 24 bytes
15:20:34.809553 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 399:611(212) ack 73 win 17376 <nop,nop,timestamp 27 3545665898>
15:20:34.849970 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 611 win 62 <nop,nop,timestamp 3545676958 27>
15:20:34.850137 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: . 611:2059(1448) ack 73 win 17376 <nop,nop,timestamp 27 3545676958>
15:20:34.850164 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 2059 win 85 <nop,nop,timestamp 3545676958 27>
15:20:34.850191 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: . 2059:3507(1448) ack 73 win 17376 <nop,nop,timestamp 27 3545676958>
15:20:34.850198 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 3507 win 108 <nop,nop,timestamp 3545676959 27>
...
15:20:34.864015 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250231:250327(96) ack 73 win 17376 <nop,nop,timestamp 27 3545676972>
15:20:34.864021 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250327:250331(4) ack 73 win 17376 <nop,nop,timestamp 27 3545676972>
15:20:34.864027 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 250327 win 505 <nop,nop,timestamp 3545676972 27>
15:20:34.864032 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: . ack 250331 win 505 <nop,nop,timestamp 3545676972 27>
15:20:34.865436 IP rdcds105.lab1.reg.44449 > rce12.lab1.reg.1027: P 73:97(24) ack 250331 win 505 <nop,nop,timestamp 3545676974 27>
15:20:34.867110 IP rce12.lab1.reg.1027 > rdcds105.lab1.reg.44449: P 250331:251079(748) ack 97 win 17376 <nop,nop,timestamp 27 3545676974>
Thanks a lot,
Martin
Post by Duncan Grisby
Post by Martin Kocian
I am experiencing delays in the TCP communication in omniorb 4.1.4 where
212 bytes get sent to a Linux box (kernel 2.6.18-92.el5) immediately
followed by about 200 kB. The problem is that the Linux box does not send
What does omniORB think is happening at this time? Can you get a trace
-ORBtraceLevel 25 -ORBtraceInvocations 1 -ORBtraceTime 1
-ORBtraceThreadId 1
And get another tcpdump. Then we can correlate them all.
Cheers,
Duncan.
--
-- Duncan Grisby --
-- http://www.grisby.org --
| Martin Kocian |
| Stanford Linear Accelerator Center |
| M.S. 95, P.O. Box 20450 |
| Stanford, CA 94309 |
| Tel. (650)926-2887 Fax (650)926-2923 |
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Loading...