[omniORB] Problem with SocketCollection::Select() method
souchaud
2006-12-18 20:39:29 UTC
Hello,

When I launch my program on a cluster with more than 50 nodes, the
application sometimes crashes and the following error appears:

$ mpirun -np 76 -x bench_redcorba_grid -ORBInitRef ...
buffer size (MB) = 121.765
redist time (ms) = 3533.69
...
omniORB: Assertion failed. This indicates a bug in the application
using omniORB, or maybe in omniORB itself.
file: SocketCollection.cc
line: 475
info: index < pd_pollfd_n
omniORB: Unexpected exception caught by giopRendezvouser
omniORB: Unrecoverable error for this endpoint:
giop:tcp:192.168.133.10:41480, it will no longer be serviced.


So I added the following lines before the assertion in the
SocketCollection::Select() method:
if (index >= pd_pollfd_n)
  std::cerr << "idx:" << index << " fd_n:" << pd_pollfd_n
            << " count:" << count << std::endl;
OMNIORB_ASSERT(index < pd_pollfd_n);

Now, when the error occurs, I get this message:
...
idx:65 fd_n:65 count:1
omniORB: Assertion failed. This indicates a bug in the application
or:
...
idx:68 fd_n:68 count:1
omniORB: Assertion failed. This indicates a bug in the application


I don't know why it crashes.
I am using omniORB4.1-rc2 on Linux (machines: Opteron, Xeon) and
launch my program with LAM.

Thanks,
Mathieu Souchaud
Duncan Grisby
2006-12-19 17:54:47 UTC
On Monday 18 December, souchaud wrote:

[...]
Post by souchaud
omniORB: Assertion failed. This indicates a bug in the application
using omniORB, or maybe in omniORB itself.
file: SocketCollection.cc
line: 475
info: index < pd_pollfd_n
[...]
Post by souchaud
I am using omniORB4.1-rc2 on Linux (machines: Opteron, Xeon) and
launch my program with LAM.
Please try omniORB 4.1.0. There was a bug in rc 2 that could well be the
problem you are seeing.

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
souchaud
2006-12-21 17:28:45 UTC
Post by Duncan Grisby
Please try omniORB 4.1.0. There was a bug in rc 2 that could well be the
problem you are seeing.
I compiled and installed omniORB4.1.0, but the error is still here.
Here is the dump (do you want a higher debug level?):

(lamboot on 53 machines)
$ mpirun -np 88 \
    -x REDSYM_DEBUG_LEVEL=1,COLCOWS_DEBUG_LEVEL=1,REDCORBA_DEBUG_LEVEL=1 \
    bench_redcorba_grid sender5000 receiver5000 44 44 5000 5000 2 0 1 \
    -ORBInitRef NameService=corbaloc::frontale.bordeaux.grid5000.fr:2809/NameService
iter = 2
sender processor size = 44
receiver processor size = 44
sender distribution = 0
receiver distribution = 1
sending buffer strided = 0
receiving buffer strided = 0
nb of data = 565000
local buffer size (KB) = 4414
size(KB) 194219
buffer size (MB) = 189.667
redist time (ms) = 4275.16
.omniORB: Assertion failed. This indicates a bug in the application
using omniORB, or maybe in omniORB itself.
file: SocketCollection.cc
line: 480
info: index < pd_pollfd_n
omniORB: Unexpected exception caught by giopRendezvouser
omniORB: Unrecoverable error for this endpoint:
giop:tcp:192.168.133.10:38204, it will no longer be serviced.

Mathieu Souchaud
Duncan Grisby
2006-12-29 00:26:06 UTC
Post by souchaud
Post by Duncan Grisby
Please try omniORB 4.1.0. There was a bug in rc 2 that could well be the
problem you are seeing.
I compiled and installed omniORB4.1.0, but the error is still here.
I don't know what's going wrong. poll() is reporting that some number of
sockets are readable, but omniORB isn't able to find them all. As a
workaround, I have downgraded the assertion to a warning log message.
Please can you try the current CVS version in the omni4_1_develop
branch, and see if it works for you? I would expect you to get some
warning messages, but does it continue to work ok other than that?
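
For readers following along, here is a minimal sketch of the kind of
scan loop being discussed. It is not omniORB's actual source: the names
scan_ready and ScanResult are hypothetical, and the only assumption
taken from the thread is that a count of ready sockets is matched
against the entries of a pollfd array. Where an assertion would abort,
the sketch logs a warning and stops scanning.

```cpp
#include <poll.h>

#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical result type, not part of omniORB.
struct ScanResult {
  int found;      // entries with revents set that were actually located
  bool mismatch;  // true if the array ended while count was still > 0
};

// Hypothetical helper: walk the pollfd array looking for the `count`
// entries that poll() reported as ready.
ScanResult scan_ready(const std::vector<pollfd>& fds, int count) {
  assert(count >= 0);  // poll() errors (-1) must be handled by the caller
  ScanResult r{0, false};
  std::size_t index = 0;
  while (count > 0) {
    if (index >= fds.size()) {
      // An assertion here would abort the process; warn and stop instead.
      std::cerr << "warning: " << count
                << " ready socket(s) reported by poll() were not found"
                << " (idx:" << index << " fd_n:" << fds.size() << ")"
                << std::endl;
      r.mismatch = true;
      break;
    }
    if (fds[index].revents != 0) {
      ++r.found;
      --count;
    }
    ++index;
  }
  return r;
}
```

With a three-entry array in which only one entry has revents set,
calling scan_ready(fds, 2) walks off the end of the array, which is the
same shape as the "idx:65 fd_n:65 count:1" output reported earlier in
the thread.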

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --
souchaud
2007-01-02 22:14:34 UTC
Post by Duncan Grisby
I don't know what's going wrong. poll() is reporting that some number of
sockets are readable, but omniORB isn't able to find them all. As a
workaround, I have downgraded the assertion to a warning log message.
Please can you try the current CVS version in the omni4_1_develop
branch, and see if it works for you? I would expect you to get some
warning messages, but does it continue to work ok other than that?
I have installed the CVS version. I see the warnings, but my tests are
still working. So it's OK :-).

cheers
Mathieu Souchaud
Duncan Grisby
2007-01-08 21:06:03 UTC
Post by souchaud
Post by Duncan Grisby
I don't know what's going wrong. poll() is reporting that some number of
sockets are readable, but omniORB isn't able to find them all. As a
workaround, I have downgraded the assertion to a warning log message.
Please can you try the current CVS version in the omni4_1_develop
branch, and see if it works for you? I would expect you to get some
warning messages, but does it continue to work ok other than that?
I have installed the CVS version. I see the warnings, but my tests are
still working. So it's OK :-).
Please can you send me a trace from -ORBtraceLevel 25 -ORBtraceThreadId
1 that shows the warning messages appearing? That might reveal more
about what's happening.

Cheers,

Duncan.
--
-- Duncan Grisby --
-- ***@grisby.org --
-- http://www.grisby.org --