Serguei Kolos
2008-07-05 21:34:04 UTC
Hello
I have an application that acts as a router of oneway messages: it
receives them from many clients and forwards them to several subscribers
at a high rate, using a small timeout value (10 milliseconds).
The timeout is this small to keep the router responsive to its clients.
If a forwarding request times out, the message is placed in a buffer,
and a dedicated router thread tries to re-send it.
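The forwarding scheme described above (one attempt with a short timeout, then hand-off to a buffer drained by a dedicated thread) might look roughly like the following plain C++ sketch. It is a hypothetical illustration, not the router's actual code: the class RetryBuffer and the Sender callback are invented names, and Sender stands in for the oneway CORBA invocation, returning false when the call timed out.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch of the router's forwarding path. `Sender` stands in
// for a oneway invocation with a short client call timeout: it returns
// false when the call timed out and true when the message went out.
class RetryBuffer {
public:
    using Sender = std::function<bool(const std::string&)>;

    explicit RetryBuffer(Sender send) : send_(std::move(send)) {
        assert(send_);  // a sender must be supplied
    }

    // Called by the threads serving clients: try once, buffer on timeout.
    void forward(const std::string& msg) {
        if (!send_(msg)) {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(msg);
        }
    }

    // One pass of the dedicated retry thread: resend the oldest buffered
    // message and put it back at the front if it times out again.
    // Returns true if a message was delivered.
    bool retryOnce() {
        std::string msg;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_.empty()) return false;
            msg = std::move(pending_.front());
            pending_.pop_front();
        }
        if (send_(msg)) return true;
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_front(std::move(msg));
        return false;
    }

    // Body of the dedicated retry thread; loops until `stop` is set and
    // backs off briefly whenever nothing was delivered.
    void run(const std::atomic<bool>& stop) {
        while (!stop.load()) {
            if (!retryOnce())
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    Sender send_;
    mutable std::mutex mutex_;
    std::deque<std::string> pending_;
};
```

With a single thread calling `run(stop)`, buffered messages are resent in arrival order, because a message that times out again goes back to the front of the queue; a new message that succeeds on its first attempt can still overtake older buffered ones.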
What I have noticed is that on slow subscribers omniORB creates new
threads rapidly and very soon reaches the thread limit. Investigating
the problem more deeply, I found that this happens because omniORB on
the router side creates a new Strand object whenever an attempt to send
a request times out. Each new Strand object opens a new connection to
the receiver, and with such a small timeout this happens much more
rapidly than the old connections (and their threads) are destroyed.
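The rate argument can be made concrete with a toy model. It assumes, hypothetically, that every call to the slow subscriber times out, that each timeout makes the client open exactly one fresh connection, and that an abandoned connection and its server thread linger for a fixed time before disappearing. Under those assumptions the number of live connections levels off at roughly lingerMs / timeoutMs + 1, so the shorter the timeout, the larger the backlog. The function name and the linger parameter are illustrative only; they are not omniORB settings.

```cpp
#include <cassert>
#include <deque>

// Toy model, not omniORB code. Assumptions (all hypothetical):
//  - every call to the slow subscriber times out after `timeoutMs`;
//  - each timeout abandons the current connection and opens a new one;
//  - an abandoned connection (and its server-side thread) goes away
//    only `lingerMs` milliseconds after it was abandoned.
// Returns the number of live connections at time `durationMs`.
int liveConnectionsAfter(int timeoutMs, int lingerMs, int durationMs) {
    assert(timeoutMs > 0 && lingerMs >= 0 && durationMs >= 0);
    std::deque<int> closeTimes;  // when each abandoned connection disappears
    for (int t = timeoutMs; t <= durationMs; t += timeoutMs) {
        // Reclaim abandoned connections whose linger time has expired.
        while (!closeTimes.empty() && closeTimes.front() <= t)
            closeTimes.pop_front();
        // The call timed out: abandon the current connection; a fresh
        // one is opened for the next call.
        closeTimes.push_back(t + lingerMs);
    }
    // The connection currently in use plus all lingering ones.
    return 1 + static_cast<int>(closeTimes.size());
}
```

For example, with timeoutMs = 10 and a hypothetical linger of 3000 ms, the model gives 301 live connections after 10 seconds; with timeoutMs = 1000 and the same linger it gives 4.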
Just as a proof of principle, I modified the giopStream::errorOnSend
function as follows:
  if (rc == 0) {
    // Timeout.
    // We do not use the return code from the function.
    // pd_strand->state(giopStrand::DYING);  // this is the old line
    pd_strand->state(giopStrand::TIMEDOUT);  // these two lines are the new ones,
    ((GIOP_C*)this)->state(IOP_C::Idle);     // used to prevent creation of a new Strand object
    retry = 0;
    minor = TRANSIENT_CallTimedout;
  }
and the issue disappeared: the number of threads stayed constant on
the subscriber side. It seems that this way the old Strand is reused
after a timeout and no new Strands are created. But then I realized
that this change on its own is not a correct fix, since it caused
problems in some other applications.
Can you please suggest a proper fix for this issue?
I'm using omniORB 4.0.7 on SLC4 Linux (kernel 2.6) with gcc 3.4.4.
Subscribers run in thread-per-connection mode with
maxServerThreadPerConnection = 20.
Cheers,
Sergei
PS: Here is a fragment of the receiver's output, running with traceLevel 10:
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43601 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 295 has started. Total threads = 295
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43602 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 296 has started. Total threads = 296
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43603 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 297 has started. Total threads = 297
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43604 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 298 has started. Total threads = 298
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43605 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 299 has started. Total threads = 299
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43606 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 300 has started. Total threads = 300
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43607 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 301 has started. Total threads = 301
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43608 because of this rule: "* unix,ssl,tcp"
omniORB: AsyncInvoker: thread id = 302 has started. Total threads = 302
omniORB: Accepted connection from giop:tcp:137.138.xx.yyy:43609 because of this rule: "* unix,ssl,tcp"
omniORB: Exception trying to start new thread.
omniORB: Cannot create a worker for this endpoint: giop:tcp:137.138.xx.yyy:43406 from giop:tcp:137.138.xx.yyy:43610
omniORB: Exception trying to start new thread.
omniORB: Cannot create a worker for this endpoint: giop:tcp:137.138.xx.yyy:43406 from giop:tcp:137.138.xx.yyy:43612
omniORB: Exception trying to start new thread.
omniORB: Cannot create a worker for this endpoint: giop:tcp:137.138.xx.yyy:43406 from giop:tcp:137.138.xx.yyy:43614
omniORB: Exception trying to start new thread.
omniORB: Cannot create a worker for this endpoint: giop:tcp:137.138.xx.yyy:43406 from giop:tcp:137.138.xx.yyy:43615