Discussion:
[omniORB] Remote method blocks
Masaaki Sekiya
2008-10-30 15:47:13 UTC
Permalink
Hi

I'm migrating from the omniORB 3.0.4 to 4.1.3 (patched).
It almost works fine but I noticed a significant difference.
If the corba server doesn't exit , client remote method invokation
blocks.
And while it's blocking, CPU consumption grows significantly.
With omniORB 3.0.4, remote method invokation threw COMM_FAILURE
exception.
Is it a right behavior in omniORB 4.1.3 ?

With the omniORB echo example, it also occurs.

In the condition that there is no omniNames in remote hostA
( If hostA is localhost , eg3_clt throws TRANSIENT exception )
eg3_clt -ORBInitRef NameService=corbaname::hostA:2809
==> BLOCKS

In the condition that there is no eg2_impl in remote hostA (eg2_impl
output IOR)
./eg2_clt IOR
==> BLOCKS

My environments is below.
UP-UX B.11.31
aCC complier 64bits

Thanx.
Masaaki Sekiya
2008-10-30 16:15:38 UTC
Permalink
I add a more information.

I tried with omniORB 4.1.2.
It does't blocks.
It throws COMM_FAILURE exception.
Is it only for a 4.1.3 behavior ?

Thanx.
Post by Masaaki Sekiya
Hi
I'm migrating from the omniORB 3.0.4 to 4.1.3 (patched).
It almost works fine but I noticed a significant difference.
If the corba server doesn't exit , client remote method invokation
blocks.
And while it's blocking, CPU consumption grows significantly.
With omniORB 3.0.4, remote method invokation threw COMM_FAILURE
exception.
Is it a right behavior in omniORB 4.1.3 ?
With the omniORB echo example, it also occurs.
In the condition that there is no omniNames in remote hostA
( If hostA is localhost , eg3_clt throws TRANSIENT exception )
eg3_clt -ORBInitRef NameService=corbaname::hostA:2809
==> BLOCKS
In the condition that there is no eg2_impl in remote hostA (eg2_impl
output IOR)
./eg2_clt IOR
==> BLOCKS
My environments is below.
UP-UX B.11.31
aCC complier 64bits
Thanx.
_______________________________________________
omniORB-list mailing list
http://www.omniorb-support.com/mailman/listinfo/omniorb-list
Duncan Grisby
2008-10-30 17:05:33 UTC
Permalink
Post by Masaaki Sekiya
I'm migrating from the omniORB 3.0.4 to 4.1.3 (patched).
It almost works fine but I noticed a significant difference.
If the corba server doesn't exit , client remote method invokation
blocks.
And while it's blocking, CPU consumption grows significantly.
With omniORB 3.0.4, remote method invokation threw COMM_FAILURE
exception.
Is it a right behavior in omniORB 4.1.3 ?
No, it's not what's meant to happen, and it doesn't happen for me.

Why you say 4.1.3 (patched), what patch?

What do you see if you run with -ORBtraceLevel 25 ?

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Masaaki Sekiya
2008-11-04 14:03:03 UTC
Permalink
Hi Duncan.
Post by Duncan Grisby
No, it's not what's meant to happen, and it doesn't happen for me.
Why you say 4.1.3 (patched), what patch?
What do you see if you run with -ORBtraceLevel 25 ?
I've patched socketcollection.patch , which was sent by Duncan on 24 Oct.

I run eg2clt with -ORBtraceLevel 25 in 3 cases as below.

1. ServerObject exists in the remote host (10.21.161.27) omniORB4.1.3
-> eg2clt works fine.
2. ServerObject doesn't exist in the remote host omniORB4.1.3
-> eg2clt blocks
3. ServerObject doesn't exist in the remote host omniORB4.1.2
-> eg2clt throws COMM_FAILURE exception

I noticed the difference 2. and 3.
In the log of 3, "AsyncInvoker: " is output before "Client attempt to
connect to",
in the log of 2, "AsyncInvoker: " is output after "Client attempt to
connect to".
Does it have something with blocking behavior ?

I attaches 3 logs and gdb trace while eg3clt is blocking.
1. is eg2clt_ok.4.1.3.log
2. is eg2clt_ng.4.1.3.log
3. is eg2clt_ng.4.1.2.log
gdb trace is gdb_trace.log

Thanx.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ok.4.1.3.log
Type: application/octet-stream
Size: 7492 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081104/ab44bb79/eg2clt_ok.4.1.3.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ng.4.1.3.log
Type: application/octet-stream
Size: 3507 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081104/ab44bb79/eg2clt_ng.4.1.3.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ng.4.1.2.log
Type: application/octet-stream
Size: 4831 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081104/ab44bb79/eg2clt_ng.4.1.2.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gdb_trace.log
Type: application/octet-stream
Size: 2743 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081104/ab44bb79/gdb_trace.obj
Duncan Grisby
2008-11-04 22:56:21 UTC
Permalink
Post by Masaaki Sekiya
I run eg2clt with -ORBtraceLevel 25 in 3 cases as below.
1. ServerObject exists in the remote host (10.21.161.27) omniORB4.1.3
-> eg2clt works fine.
2. ServerObject doesn't exist in the remote host omniORB4.1.3
-> eg2clt blocks
3. ServerObject doesn't exist in the remote host omniORB4.1.2
-> eg2clt throws COMM_FAILURE exception
Both the failure cases show something odd happening. The 4.1.3 case just
shows that the call is blocked in connect(). What exactly is going on at
the server end? Is there a server process running? Is the server
machine contactable?

The 4.1.2 case is very fishy. These lines show that the system is very
confused:

omniORB: Client attempt to connect to giop:tcp:10.21.161.27:50743
omniORB: Client opened connection to giop:tcp:255.255.255.255:65535

That suggests that the OS has done something bizarre when omniORB opened
a connection and asked for the peer address.
Post by Masaaki Sekiya
I noticed the difference 2. and 3.
In the log of 3, "AsyncInvoker: " is output before "Client attempt to
connect to",
in the log of 2, "AsyncInvoker: " is output after "Client attempt to
connect to".
Does it have something with blocking behavior ?
No, that's unrelated. That message comes from a separate thread, so it's
normal for it to appear at slightly different times.

You can show thread ids in your traces with -ORBtraceThreadId 1. I'd
suggest also using -ORBtraceTime 1 which shows times in log messages.

What trace output do you get from the server? What happens if you use
telnet to connect to the server's port (which you can see in the trace
messages or by using catior)? Does eg2 work when client and server are
on the same machine?

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
Masaaki Sekiya
2008-11-05 15:11:27 UTC
Permalink
Hi Duncan , thanks for your comments.
Post by Duncan Grisby
Both the failure cases show something odd happening. The 4.1.3 case just
shows that the call is blocked in connect(). What exactly is going on at
the server end? Is there a server process running? Is the server
machine contactable?
There isn't a server process running and server machine is contactable.

I show you how to run this eg2 example step by step.
The hostA is server machine , the hostB is client machine.

1. In the hostA, I start the eg2_svr. eg2_svr outputs IOR as below.

% eng/eg2svr -ORBendPointPublish giop:tcp:10.21.161.27:
IOR:000000000000000d49444c3a4563686f3a312e3000000000000000010000000
00000064000102000000000d31302e32312e3136312e32370000d6d70000000efe4
910edbf044700000000000000000000000002000000000000000800000000415454
00000000010000001c0000000000010001000000010501000100010109000000010
0010109

The hostA has multi network interfaces as below.

% ifconfig lan0
lan0: flags=1843<UP,BROADCAST,RUNNING,MULTICAST,CKO>
inet 10.21.161.27 netmask ffffff00 broadcast 10.21.161.255
% ifconfig lan1
lan1: flags=1842<BROADCAST,RUNNING,MULTICAST,CKO>
inet 10.21.162.27 netmask ffffff00 broadcast 10.21.162.255

The lan1 is directly connected to another host (this means no router,
bridge .. etc) so make endPoint fix to lan0 with -ORBendPointPublish .

---------------------------------------------------------------------
2. In the hostB, I start the eg2_clt as below.

% eng/eg2clt IOR:..(this is output of eg2svr)
-> It works fine. (I said, "Hello!".The Echo object replied, "Hello!
"....)

----------------------------------------------------------------------
3. In the hostA, I stop the eg2_svr.

----------------------------------------------------------------------
4. In the hostB, I start the eg2_clt again.

% eng/eg2clt IOR:.. -ORBtraceLevel 25 -ORBtraceThreadId 1 -ORBtraceTime
1
-> It blocks.



The information of IOR is below, which is output by catior.

Type ID: "IDL:Echo:1.0"
Profiles:
1. IIOP 1.2 10.21.161.27 54999 POA(root) 0x00000000 (4 bytes)
TAG_ORB_TYPE omniORB
TAG_CODE_SETS char native code set: ISO-8859-1
char conversion code set: UTF-8
wchar native code set: UTF-16
wchar conversion code set: UTF-16
Post by Duncan Grisby
You can show thread ids in your traces with -ORBtraceThreadId 1. I'd
suggest also using -ORBtraceTime 1 which shows times in log messages.
I attach client log of 2. as eg2clt_ok.4.1.3.log, and
client log of 4. as eg2clt_ng.4.1.3.log.
Post by Duncan Grisby
What trace output do you get from the server? What happens if you use
telnet to connect to the server's port (which you can see in the trace
messages or by using catior)?
There isn't a server process running, while eg2clt blocks.

On the condition of 2. I can connect to by telnet
and on the condition of 4, I can't connect to as below.

% telnet 10.21.161.27 54999
Trying...
telnet: Unable to connect to remote host: Connection refused
Post by Duncan Grisby
Does eg2 work when client and server are on the same machine?
eg2clt throws TRANSIENT exception when client and server are on the same
machine. ( when there isn't a server process running )

I attach client log as eg2clt_ng.4.1.3_samehost.log.

-------
Masaaki Sekiya
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ok.4.1.3.log
Type: application/octet-stream
Size: 9988 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081105/5fd7bfb7/eg2clt_ok.4.1.3-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ng.4.1.3.log
Type: application/octet-stream
Size: 3987 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081105/5fd7bfb7/eg2clt_ng.4.1.3-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eg2clt_ng_4.1.3_samehost.log
Type: application/octet-stream
Size: 5714 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20081105/5fd7bfb7/eg2clt_ng_4.1.3_samehost-0001.obj
Masaaki Sekiya
2008-11-10 11:56:50 UTC
Permalink
Hi.

I digged omniORB code by using gdb.
Blocking occurs in tcpAddress::Connect and blocking scenario is as below.

USE_NONBLOCKING_CONNECT is defined, so program goes to
poll do-while loop (tcpAddress.cc:229-301) , and runs below line.

int rc = poll(&fds,1,timeout);
if (rc > 0 && fds.revents & POLLERR) {
rc = 0;
}

As the result of calling poll, fds.revents set to 0x8 (= POLLERR),
so rc is set to 0.
and when rc is 0, program continues to do-while loop as below.

if (rc == 0) {
// Time out!
#if defined(USE_FAKE_INTERRUPTABLE_RECV)
continue;

This just makes a huge CPU comsuption.

Now, I have a question.
Why is rc set to 0 , when poll results in POLLERR ?
Is this behaviour on only HP-UX11iV3 ?

Thanks.

Loading...