souchaud
2006-12-18 20:39:29 UTC
Hello,
When I run my program on a cluster with more than 50 nodes, the
application sometimes crashes and the following error appears:
$ mpirun -np 76 -x bench_redcorba_grid -ORBInitRef ...
buffer size (MB) = 121.765
redist time (ms) = 3533.69
...
omniORB: Assertion failed. This indicates a bug in the application
using omniORB, or maybe in omniORB itself.
file: SocketCollection.cc
line: 475
info: index < pd_pollfd_n
omniORB: Unexpected exception caught by giopRendezvouser
omniORB: Unrecoverable error for this endpoint:
giop:tcp:192.168.133.10:41480, it will no longer be serviced.
So I added the following lines before the assertion in the
SocketCollection::Select() method:
    if (index >= pd_pollfd_n)
        std::cerr << "idx:" << index << " fd_n:" << pd_pollfd_n
                  << " count:" << count << std::endl;
    OMNIORB_ASSERT(index < pd_pollfd_n);
Now, when the error occurs, I get this message:
...
idx:65 fd_n:65 count:1
omniORB: Assertion failed. This indicates a bug in the application
or:
...
idx:68 fd_n:68 count:1
omniORB: Assertion failed. This indicates a bug in the application
I don't know why it crashes.
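The only thing the values tell me is that the scan reached the end of the array (idx equals fd_n) while one ready descriptor was still unaccounted for (count is 1). Below is a minimal sketch of how a loop of that kind could end up in this state. It is not the omniORB code: the names scan_ready and ScanResult are made up for the illustration, and the idea that the missing descriptor was reported with an event outside the tested mask (for example POLLHUP instead of POLLIN) is only an assumption.

```cpp
#include <poll.h>
#include <cstddef>
#include <vector>

// Result of scanning a pollfd array after poll() has returned.
struct ScanResult {
    std::size_t index;  // position where the scan stopped
    int remaining;      // ready descriptors still unaccounted for
};

// Hypothetical scan loop, not omniORB's code: walk the array until
// `count` descriptors with an event in `mask` have been seen, and stop
// at the end of the array instead of asserting.
ScanResult scan_ready(const std::vector<pollfd>& fds, int count, short mask) {
    std::size_t index = 0;
    while (count > 0 && index < fds.size()) {
        if (fds[index].revents & mask) {
            --count;  // this descriptor is accounted for
        }
        ++index;
    }
    return {index, count};
}
```

With 65 entries, a count of 1 and a single entry whose revents holds only POLLHUP, scanning with the mask POLLIN stops at index 65 with 1 remaining, which is the same pattern as "idx:65 fd_n:65 count:1". Scanning the same array with POLLIN | POLLHUP finds the entry and leaves nothing remaining.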
I am using omniORB 4.1-rc2 on Linux (machines: Opteron, Xeon) and
launch my program with LAM.
Thanks,
Mathieu Souchaud