Shrewd'm.com 
A merry & shrewd investing community


Investment Strategies / Falling Knives
Author: WatchingTheHerd (HONORARY SHREWD)

Number: of 230 
Subject: 3 Minute Page Loads
Date: 04/18/2024 4:32 PM
No. of Recommendations: 10
Hi Manlobbi:

On 4/18/2024 at 4:00pm EDT, I noticed a problem which seems to recur fairly often on the site. It was showing similar symptoms earlier in the day as well.

Clicks to view pages were taking an extremely long time to load, so long that I sat down with a stopwatch to time them. Pages were consistently taking 3 minutes -- exactly -- to load. Then at about 4:09pm one page took 2 minutes 14 seconds to display a lower-level web server error stating that a downstream process was unavailable. Since that point, every click has been back to the normal BLAZING fast.

Based on prior experience with such things, the symptoms above seem to point to:

* one back-end server behind a load balancer locking up due to resource exhaustion (lack of DB connections, etc.)
* the load balancer failing to detect that lockup and continuing to deliver traffic to it
* an upstream client, or the client in the browser, using a default connection-open or response timeout of 3 minutes, so it WAITS that long before retrying or returning a failure

It doesn't appear that the root cause is underlying database horsepower, tuning or indexing, because the system is normally BLAZING fast and reverts to BLAZING fast after some other gremlin clears. That suggests a few things worth checking, depending upon the platforms used in the back-end of the system.

* core web service leaking open connections to the DB, eventually running out?
* timeout limit of the load balancer in front of that DB web service set at default value or set too high?
* health check on individual servers of the DB web service not configured or configured to only test the ability to open a connection rather than get a successful HTTP response?
* same type of problem with outer web services called by the browser client which then call these internal web services?

Different languages have different default behavior when wrapping database queries. Many libraries claim to release DB connections back to a pool automatically, but that handling does not always work on exception paths. The result can be a slow leak of open connections, leading to a web service instance which can ACCEPT work but cannot COMPLETE it because it has run out of DB connections.
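To make the leak pattern concrete, here is a minimal sketch in Python. I have no idea what the site actually runs on, so everything in it is hypothetical: TinyPool stands in for a real DB connection pool, and leaky_query / safe_query are made-up names. The only point is the difference between releasing a connection on the happy path only and releasing it in a finally block.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size pool; stands in for a real DB connection pool."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.1):
        # Raises queue.Empty when every connection is checked out.
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            # Runs on success AND on exception, so nothing leaks.
            self.release(conn)


def leaky_query(pool, fail):
    conn = pool.acquire()
    if fail:
        raise RuntimeError("query failed")  # release() below is skipped
    pool.release(conn)


def safe_query(pool, fail):
    with pool.connection():
        if fail:
            raise RuntimeError("query failed")


def run_failures(query, pool, attempts):
    """Send failing requests; return how many actually obtained a connection."""
    served = 0
    for _ in range(attempts):
        try:
            query(pool, True)
        except RuntimeError:
            served += 1  # got a connection, then the query itself failed
        except queue.Empty:
            pass  # pool exhausted: work was accepted but could not proceed
    return served
```

With a pool of 3, five failing calls through leaky_query drain the pool completely and the last two never get a connection, while the same five calls through safe_query leave all 3 connections available.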

Lowering socket timeout settings is a quick way to route traffic around a zombie node. Many libraries default to a 3 or even 6 minute timeout, which is WAY too long for a service handling interactive traffic where users expect responses in 5-10 seconds.
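As a rough illustration of why explicit timeouts matter, the Python sketch below fakes a zombie node with a local socket that listens but never accepts or answers, then calls it with short connect and read timeouts. The function names and the timeout values are my own assumptions for the demo, not anything taken from the site's code.

```python
import socket
import time


def start_zombie_listener():
    """Listen on a local port but never accept() or reply.

    The kernel still completes the TCP handshake, so clients connect fine.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(5)
    return srv


def fetch_with_timeout(host, port, connect_timeout, read_timeout):
    """Return (outcome, elapsed_seconds) for one request attempt."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=connect_timeout) as s:
            s.settimeout(read_timeout)  # limit how long we wait for a reply
            s.sendall(b"GET / HTTP/1.0\r\n\r\n")
            s.recv(1024)  # blocks until data arrives or the timeout fires
            outcome = "response"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - started
```

Called against the zombie listener with a read timeout of a fraction of a second, the client gives up almost immediately and can retry elsewhere, instead of sitting there for whatever multi-minute default applies.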

The health check referenced above could be the one performed by the load balancer to verify that a member of the pool should continue receiving traffic, or the one performed by your hosting platform (Kubernetes?) to determine whether a worker virtual machine is healthy or should be killed and replaced with a new instance.
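Whichever layer runs it, the check only helps if it exercises the same path real requests use. A minimal Python sketch of the difference, with sqlite3 standing in for the real database and every function name here being hypothetical:

```python
import sqlite3


def shallow_check():
    """Only proves the process is up; a zombie node still passes."""
    return 200, "ok"


def deep_check(get_connection):
    """Run a trivial query through the same path real requests use."""
    try:
        conn = get_connection()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except Exception as exc:
        # Any failure to obtain or use a connection marks the node unhealthy.
        return 503, f"db unavailable: {exc}"
    return 200, "ok"


def healthy_db():
    # Stand-in for "take a connection from the pool".
    return sqlite3.connect(":memory:")


def exhausted_pool():
    # Stand-in for a pool with no free connections left.
    raise TimeoutError("no free DB connections")
```

A node whose pool is exhausted still answers the shallow check with 200, but the deep check returns 503, which is the signal a load balancer or orchestrator needs in order to pull it from rotation.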


WTH


Author: WatchingTheHerd (HONORARY SHREWD)

Number: of 230 
Subject: Re: 3 Minute Page Loads
Date: 05/17/2024 12:15 PM
No. of Recommendations: 3
Any luck in finding / eliminating the culprit here? I continue to see this pattern at different parts of the day. The system is exhibiting the problem right now as well. As of 12:13pm EDT 5/17,

* displaying this board index took 3 minutes 48 seconds to load
* displaying this prior post took 3 minutes 23 seconds to load
* displaying this reply form took 36 seconds to load


WTH


Author: bighairymike
Number: of 671 
Subject: Re: 3 Minute Page Loads
Date: 05/17/2024 1:43 PM
No. of Recommendations: 0
I was also bogged down for about an hour this morning. Seems OK now.

