[omniORB] OmniORB WorkerThread problem

Luc Thevenon

2006-11-30 21:52:19 UTC

Hello,

We currently use OmniORB 3.05, OmniORBpy 1.5 and the Python 2.2
threading module (threading.py) to manage threads. We have a problem
with the way the OmniORB WorkerThread is managed, which sometimes leads
to a deadlock.

After analyzing the problem, here is what we observe:

We have a Python process which starts a C++ process and a Python thread
that waits for events coming from this running C++ process (other Python
threads are also started). For this purpose, an OmniORB thread is
started, and its corresponding WorkerThread object is stored in the
active thread dictionary (method __init__ of class WorkerThread in
__init__.py). The OmniORB thread ID is used as the dictionary key.
After a short execution (a few seconds only), the C++ process and the
waiting Python thread exit. The OmniORB thread also exits, but its
corresponding WorkerThread object is not removed. It is removed only
after 60 seconds of inactivity by the omnipyThreadScavenger.
The same sequence occurs several times within these 60 seconds: another
C++ process and waiting Python thread are started with different
parameters, then exit.
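
To illustrate the lifecycle described above, here is a minimal model in
plain Python. It is only a sketch: the names active, worker_started,
scavenge and IDLE_TIMEOUT are hypothetical and do not match the
OmniORBpy internals, and the thread ID 1026 is an arbitrary example
value. It shows only that the entry stays in the table after the thread
has exited, until the 60 second sweep.

```python
# Hypothetical model; names do not match the OmniORBpy internals.
IDLE_TIMEOUT = 60.0   # seconds of inactivity, as observed in the post

active = {}           # thread ID -> (owner label, time last used)

def worker_started(thread_id, now):
    # The worker registers itself under its thread ID.
    active[thread_id] = ("WorkerThread", now)

def scavenge(now):
    # Remove entries that have been idle for at least IDLE_TIMEOUT seconds.
    for thread_id, (_, last_used) in list(active.items()):
        if now - last_used >= IDLE_TIMEOUT:
            del active[thread_id]

worker_started(1026, now=0.0)    # OmniORB thread starts at t = 0
# The OmniORB thread exits a few seconds later; nothing removes its entry.
scavenge(now=5.0)
stale_at_5s = 1026 in active     # entry is still present
scavenge(now=65.0)
stale_at_65s = 1026 in active    # entry has been removed by the sweep
```

During the window between the two sweeps, the key 1026 refers to a
thread that no longer exists.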

The problem is that within the 60 seconds, before the WorkerThread
object is removed, the ID of the corresponding OmniORB thread can be
reused by the system when other Python threads are started. When this
occurs, the new Python thread state overwrites the existing WorkerThread
object in the active dictionary (method __bootstrap of class Thread in
threading.py) since it has the same ID. This thread state is removed
from the active dictionary when the new Python thread exits (method
__delete of class Thread in threading.py).
So, after 60 seconds, when the omnipyThreadScavenger tries to delete the
WorkerThread object from the active dictionary (method delete of class
WorkerThread in __init__.py), the entry has already been overwritten and
removed. This raises an error, since there is no check on the existence
of the dictionary entry (del _thr_act[self.id]).
The consequence is that the _active_limbo_lock, which has been acquired
for this operation (_thr_acq), is never released. When a new Python
thread is started, it waits indefinitely for the _active_limbo_lock to
be released, which causes a hang in our application.
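
The failure can be reproduced in isolation with the sketch below. It is
written for a current Python rather than 2.2, and every name in it
(lock, active, unguarded_delete, guarded_delete, the ID 1026) is a
hypothetical stand-in, not the real OmniORBpy or threading.py code.
guarded_delete shows one possible workaround, assuming the delete
method can be changed: release the lock in a finally clause, and remove
the entry only if it still belongs to the caller.

```python
import threading

lock = threading.Lock()   # stand-in for _active_limbo_lock
active = {}               # stand-in for the active thread dictionary

def unguarded_delete(thread_id):
    lock.acquire()
    del active[thread_id]   # KeyError if the entry is already gone
    lock.release()          # skipped when the line above raises

def guarded_delete(thread_id, owner):
    lock.acquire()
    try:
        # Remove the entry only if it still belongs to this owner.
        if active.get(thread_id) is owner:
            del active[thread_id]
    finally:
        lock.release()      # runs even if an exception is raised

worker = object()           # plays the WorkerThread object
py_thread = object()        # plays the new Python thread's state

active[1026] = worker       # OmniORB thread registers itself
active[1026] = py_thread    # ID reused: the entry is overwritten
del active[1026]            # Python thread exits and removes the entry

try:
    unguarded_delete(1026)  # what the scavenger does after 60 seconds
except KeyError:
    pass
lock_leaked = lock.locked() # lock is still held: later acquirers block
lock.release()              # reset the lock for the second half

guarded_delete(1026, worker)
lock_free = not lock.locked()
```

The ownership check matters as much as the finally clause: a plain
existence check would let the scavenger delete the entry of a live
Python thread that happens to have reused the ID.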

Can you give me some hints on this issue?
Thank you,
Luc Thevenon