Discussion:
[omniORB] omni::createObjRef thread safety
Luke Deller
2009-10-27 12:27:55 UTC
Permalink
Hi,

We've been getting an occasional but recurring crash under load in the omni::createObjRef function in omniInternal.cc. It looks to me like a thread safety issue. I have some some confidence that the error is indeed in omniORB rather than our own code because the attached patch to omniORB seems to have stopped the crash from occurring.

This is using a snapshot of omniORB (and omniORBpy) from 20080908, but I think that there have been no relevant changes since then.
From analysing the crash dump files (Windows "mini-dump" files), I can see that the crash is caused by memory corruption in the "omniIdentity" object pointed to by the "id" local variable, resulting in an access violation on line 1064 or 1069 of omniInternal.cc (varies slightly according to the nature of the corruption)
1064: omniObjRef* objref = pof->newObjRef(ior, id);
1065: if( target_intf_not_confirmed ) objref->pd_flags.type_verified = 0;
1066:
1067: {
1068: omni_optional_lock sync(*internalLock, locked, locked);
1069: id->gainRef(objref);
1070: }

The call stack is always as follows:

[1] omniORB413_vc9_rt!omni::createObjRef
[2] omniORB413_vc9_rt!omniObjRef::_unMarshal
[3] omniORB413_vc9_rt!CORBA::Object::_unmarshalObjRef
[4] omniDynamic413_vc9_rt!CORBA::Object_Member::operator<<=
[5] omniDynamic413_vc9_rt!omni::fastCopyUsingTC
[6] omniDynamic413_vc9_rt!omni::tcParser::copyStreamToStream
[7] omniDynamic413_vc9_rt!CORBA::Any::operator<<=
[8] _omniOTS!CosTransactions::PropagationContext::operator<<=
[9] _omniOTS!GetTxnContext
[10] omniORB413_vc9_rt!omni::GIOP_S::handleRequest
[11] omniORB413_vc9_rt!omni::GIOP_S::dispatcher

See that this is called from the "serverReceiveRequest" interceptor installed by omniOTS (which would be called very often from multiple threads).

It is trying to unmarshal the CosTransactions::PropagationContext struct. Here is the IDL (note that the implementation_specific_data field in CosTransactions::PropagationContext contains an omniOTS::SpecificData struct).

module CosTransactions { // from standard OMG IDL
struct PropagationContext {
unsigned long timeout;
TransIdentity current;
sequence <TransIdentity> parents;
any implementation_specific_data;
};
};
module omniOTS { // from omniOTS source code
struct SpecificData {
octet magic[7];
short version_major;
short version_minor;
CosTransactions::Control ctrl;
};
};

I can see from inspecting variables in the crash dump that it is trying to unmarshal the final CosTransactions::Control object reference. I can't tell you yet whether this object is in-process or not - either case could arise.

In the omni::createObjRef function I notice that if locked=0 then there is a gap during which omni::internalLock is NOT held, after "omniIdentity* id" has been obtained from omni::createIdentity but before it is used (at which point I am seeing that it is occasionally corrupted). My theory is that omni::createIdentity is returning a valid object but that the object is deallocated or otherwise rendered invalid by another thread during this gap. Is this possible?

See the attached patch which seems to stop the crash happening. It is not an ideal solution but I think it does support my theory of what the problem is?

Regards,
Luke.

**********************************************************************************************
Important Note
This email (including any attachments) contains information which is confidential and may be subject to legal privilege. If you are not the intended recipient you must not use, distribute or copy this email. If you have received this email in error please notify the
sender immediately and delete this email. Any views expressed in this email are not necessarily the views of IRESS Market Technology Limited.

It is the duty of the recipient to virus scan and otherwise test the information provided before loading onto any computer system.
IRESS Market Technology Limited does not warrant that the information is free of a virus or any other defect or error.
**********************************************************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omni-objreflocking.patch
Type: application/octet-stream
Size: 493 bytes
Desc: omni-objreflocking.patch
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20091027/59b80382/omni-objreflocking.obj
Duncan Grisby
2009-12-24 15:59:44 UTC
Permalink
Post by Luke Deller
We've been getting an occasional but recurring crash under load in the
omni::createObjRef function in omniInternal.cc. It looks to me like a
thread safety issue. I have some some confidence that the error is
indeed in omniORB rather than our own code because the attached patch
to omniORB seems to have stopped the crash from occurring.
[...]
Post by Luke Deller
I can see from inspecting variables in the crash dump that it is trying
to unmarshal the final CosTransactions::Control object reference. I
can't tell you yet whether this object is in-process or not - either
case could arise.
Hi Luke,

Sorry for the long delay in looking at this. The solution of holding the
lock around the whole createObjRef call will cause problems in other
scenarios. Can you try this patch instead?

I believe the problem is that you are unmarshalling a reference to a
local object. In that situation, there is a window of opportunity for
the object to be deactivated and the identity object to be released
before it is used. The attached patch simply holds an additional
reference to the identity, which should prevent that situation. Does it
fix the problem for you? If so, I'll check it in.

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --

-------------- next part --------------
A non-text attachment was scrubbed...
Name: omniInternal.patch
Type: text/x-patch
Size: 1828 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20091224/6622c919/omniInternal.bin
Loading...