Nigel Rantor
2006-11-21 18:06:43 UTC
Hi all,
I've used omniORB before and have decided to use it on a new project I
am a part of. I've used the C++ bindings in the past, now I'm
experimenting with Python.
I joined the list last week and it seems fairly low-volume so I'm going
to post without having been around too long.
I think I have some idea of what the underlying problem may be but I'm
not sure, so here goes.
I want to start up one specific service in such a way that I do not need
a bootstrapping service to get hold of it remotely.
My current solution is to give the ORB a fixed endpoint, use the
omniINSPOA to house my service and give it a well-known name, so
that I can construct a corbaloc URL from just the hostname, port and name.
This all works fine. I have no problems doing this, it's groovy. I have
code that works.
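A minimal omniORBpy sketch of that setup, for readers who haven't used omniINSPOA. It is an illustration under assumptions, not the code from this post: the key "MyService", the port, the servant and the IDL stub class passed as `interface` are all hypothetical placeholders. It relies on omniINSPOA using object ids directly as object keys, which is what makes a `corbaloc::host:port/key` URL resolvable without a naming service.

```python
import sys


def make_corbaloc(host, port, key):
    """Build a corbaloc URL for an object published under `key`.

    omniINSPOA uses the object id as the object key, so host, port
    and the well-known name are enough to reach the servant.
    """
    return "corbaloc::%s:%d/%s" % (host, port, key)


def start_service(servant, port, key, argv=None):
    """Activate `servant` under a fixed key on a fixed TCP port.

    Requires omniORBpy; it is imported here so make_corbaloc() can be
    used without it.  `servant` is a hypothetical IDL-generated servant.
    """
    from omniORB import CORBA

    args = list(argv if argv is not None else sys.argv)
    # A fixed endpoint keeps the corbaloc URL identical across restarts.
    args += ["-ORBendPoint", "giop:tcp::%d" % port]
    orb = CORBA.ORB_init(args, CORBA.ORB_ID)

    ins_poa = orb.resolve_initial_references("omniINSPOA")
    # The object id becomes the object key seen in the corbaloc URL.
    ins_poa.activate_object_with_id(key.encode("ascii"), servant)
    ins_poa._get_the_POAManager().activate()
    return orb  # caller runs orb.run() or continues with its own loop


def resolve_peer(orb, host, port, key, interface):
    """Turn a peer's corbaloc URL into a reference narrowed to `interface`
    (a hypothetical IDL stub class)."""
    obj = orb.string_to_object(make_corbaloc(host, port, key))
    return obj._narrow(interface)
```

For example, `make_corbaloc("hostB", 9991, "MyService")` gives `corbaloc::hostB:9991/MyService`; port 9991 is borrowed from the trace below purely as an example value.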
My problem is that when I test these services by killing them, I find
that when they come back up and talk to each other I get COMM_FAILURE
errors.
This happens as soon as they start up, because each one tries to contact
all the other machines that should be running this service. The weird
thing is that the initiator seems to be okay, but when the receiver
attempts to call back to the initiator it dies with a COMM_FAILURE.
To make it more concrete, let's say I have this service running on two
machines, A and B.
1) start service on A
2) service on A attempts to contact B, B is not running yet, fine.
3) start service on B
4) service on B attempts to contact A, A is running and replies.
5) kill service on B
6) start service on B
7) service on B attempts to contact A, A is running and has an operation
invoked on it successfully by B. A then attempts to invoke an operation
on B and a CORBA.COMM_FAILURE is raised.
If I leave the service on B dead for long enough this problem does not
occur, so I turned tracing on and found that once the service on A has
printed the messages below, I can kill and restart the service on B and
everything works.
--------------------------------------------------------------------
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: sendCloseConnection: to giop:tcp:172.16.69.250:9991 12 bytes
omniORB: Client connection refcount (forced) = 0
omniORB: Client close connection to giop:tcp:172.16.69.250:9991
omniORB: throw giopStream::CommFailure from giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: Server connection refcount = 1
omniORB: Server connection refcount = 0
omniORB: Server close connection from giop:tcp:172.16.69.250:40464
omniORB: Deleting Python state for thread id 1085389744 (thread exit)
omniORB: AsyncInvoker: thread id = 4 has exited. Total threads = 3
--------------------------------------------------------------------
Now, I don't really have any good ideas, but it does strike me that the
line that says:
--------------------------------------------------------------------
omniORB: throw giopStream::CommFailure from giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)
--------------------------------------------------------------------
isn't actually throwing anything to the app level at the time. Could
this be held over until I next attempt to invoke an operation on that
same connection? In normal operation this wouldn't happen, because the
remote servant would get a different port/IOR each time it was started,
but in my case the corbaloc URL doesn't change.
Any thoughts or ideas would be greatly appreciated. I'm sure I can add
some code to work around this, but I'd really rather have the system Just
Work(tm).
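One workaround of the kind mentioned above, sketched under assumptions: a hypothetical helper `call_with_retry` that re-issues an invocation when it fails with a given exception class (for instance `CORBA.COMM_FAILURE`). It rests on the unverified guess that the first failure discards the stale connection and a second attempt opens a fresh one. Retrying is only safe for operations that can be repeated; a CORBA system exception's completion status says whether the call may already have run, and the `NO` in the traced exception appears to be that status.

```python
import time


def call_with_retry(op, retry_on, attempts=3, delay=0.5):
    """Call op(); if it raises an exception matching `retry_on`, wait
    `delay` seconds and try again, up to `attempts` calls in total.

    retry_on: an exception class or tuple of classes, e.g.
    CORBA.COMM_FAILURE.  The last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)
```

Usage would look like `call_with_retry(lambda: peer_b.someOperation(), CORBA.COMM_FAILURE)`, where `peer_b` and `someOperation` stand in for a narrowed reference and an IDL operation. Wrapping each call site keeps the retry policy explicit, at the cost of repeating it wherever a peer is invoked.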
Thanks,
n