All of a sudden, your website appears to have stopped working. Pages are taking forever to load. Your website is experiencing a hang!
Hangs are fairly common for production applications, and can be incredibly frustrating to troubleshoot. The main reasons for this are:
- They may be happening only sometimes and can be hard to catch.
- They can be caused by complex and interrelated factors that can be difficult to isolate.
In this article, I'll show you how you can systematically isolate and diagnose most hangs in production. You'll need basic knowledge of Microsoft troubleshooting tools, and time.
STEP 1: Is it really a hang?
First, let's define what a website hang really means.
An IIS website hangs whenever it appears to stop serving incoming requests, with requests either taking a very long time or timing out. It's generally caused by all available application threads becoming blocked, causing subsequent requests to get queued (or sometimes by the number of active requests exceeding configured concurrency limits).
It's important to differentiate the following kinds of hangs:
- Full hang. All requests to your application are very slow or time out. Symptoms include detectable request queueing, and sometimes 503 Service Unavailable errors when queue limits are reached.
NOTE: Most hangs do not involve high CPU, and are often called "low CPU hangs". Also, most of the time, high CPU does not itself cause a hang. In rare cases, you may also get a "high CPU hang", which we don't cover here.
- Rolling hang. Most requests are slow, but eventually load. This usually occurs before a full hang develops, but may also represent a stable state for an application that is overloaded.
- Slow requests. Only specific URLs in your application are slow. This is not generally a true hang, but rather just a performance problem with a specific part of your application.
Here are 3 "reasonable" early detection signs:
- The "Http Service Request Queues\MaxQueueItemAge" performance counter is increasing. This means IIS is falling behind in request processing, so incoming requests are waiting at least this long before they begin to be processed.
- The "Http Service Request Queues\ArrivalRate" counter exceeds the "W3SVC_W3WP\Requests / Sec" counter for the application pool's worker process over a period of time. This implies that more requests are coming into the system than are being processed, which always eventually results in queueing.
- The number of currently executing requests is growing. Snapshotting the currently executing requests is the best way to detect a hang: if the count keeps growing, requests are piling up, which always leads to higher latencies and request queueing. Most importantly, the snapshot also tells you which URLs are causing the hang, and which requests are queued.
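The ArrivalRate comparison is only meaningful "over a period of time", so a single sample should not trigger an alarm. Here is a minimal Python sketch of that idea, assuming you already collect paired samples of the two counters by some means of your own. The function name and the three-sample threshold are illustrative assumptions, not anything IIS defines.

```python
from typing import Iterable, Tuple


def sustained_backlog(samples: Iterable[Tuple[float, float]],
                      min_consecutive: int = 3) -> bool:
    """Return True if arrivals outpaced completions for several samples in a row.

    Each sample is (arrival_rate, requests_per_sec), both in requests/second.
    min_consecutive is an assumed threshold; tune it to your sampling interval.
    """
    streak = 0
    for arrival_rate, requests_per_sec in samples:
        if arrival_rate > requests_per_sec:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # one healthy sample resets the streak
    return False


# Hypothetical samples: arrivals pull ahead of completions and stay ahead.
samples = [(50, 50), (60, 40), (70, 35), (80, 30)]
print(sustained_backlog(samples))  # True
```

Requiring a run of consecutive samples keeps short traffic bursts from being reported as a developing hang.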
You can view all currently executing requests in InetMgr, by opening the server node, going to Worker Processes, and picking your application pool's worker process.
You can also automate this by using the AppCmd command line tool:
%windir%\system32\inetsrv\appcmd list requests /elapsed:10000
This will show you which requests are executing, optionally limited to those running longer than the elapsed filter you specified (in milliseconds). I recommend an elapsed filter of at least 5 seconds.
If you see multiple requests that are taking a long time to execute AND you are seeing more and more requests begin to accumulate, you likely have a hang. If you DO NOT see requests accumulating, it's likely that you have slow requests to some parts of your application, but you do not have a hang.
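To watch for accumulation without clicking around, you could take the AppCmd snapshot repeatedly and compare the counts. The sketch below is one way to do that in Python. It assumes appcmd.exe lives at the path used above, that each executing request is printed on a line starting with REQUEST, and that three strictly increasing snapshots are enough to call it accumulation. The function names and that threshold are my assumptions.

```python
import os
import subprocess
from typing import List

# Same location as the command above, resolved from the windir variable.
APPCMD = os.path.join(os.environ.get("windir", r"C:\Windows"),
                      "System32", "inetsrv", "appcmd.exe")


def count_request_lines(appcmd_output: str) -> int:
    """Count the REQUEST entries in the text printed by 'appcmd list requests'."""
    return sum(1 for line in appcmd_output.splitlines()
               if line.startswith("REQUEST "))


def count_long_requests(elapsed_ms: int = 10000) -> int:
    """Run appcmd (Windows only) and count requests older than elapsed_ms."""
    result = subprocess.run(
        [APPCMD, "list", "requests", f"/elapsed:{elapsed_ms}"],
        capture_output=True, text=True, check=False,
    )
    return count_request_lines(result.stdout)


def is_accumulating(counts: List[int], min_snapshots: int = 3) -> bool:
    """True if the last min_snapshots counts are strictly increasing."""
    recent = counts[-min_snapshots:]
    if len(recent) < min_snapshots:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

In practice you would call count_long_requests() every few seconds from an elevated prompt, append each result to a list, and pass that list to is_accumulating().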
Detecting a hang reliably is surprisingly difficult. While you can almost always tell that you have a hang by requesting your website externally, detecting it internally is much harder.
There are many possible places where a hang can happen, and many possible signs of hangs. Most of these signs are unreliable on their own (e.g. ASP.NET queueing counters), and the reliable ones (executing requests, thread snapshots) are prohibitively expensive to monitor all the time. With LeanSentry, we solved this problem by using progressive hang detection, which starts out with lightweight monitoring of more than a dozen different performance counters, and then confirms a likely hang with executing request snapshots and the debugger.
STEP 2: Diagnose the hang
Once you confirm the hang, the next step is to determine where it's taking place.
It's not IIS (but check it anyway).
IIS hangs happen when all available IIS threads are blocked, causing IIS to stop dequeueing additional requests. This is rare these days, because IIS request threads almost never block. Instead, IIS hands off request processing to an ASP.NET, Classic ASP, or FastCGI application, freeing up its threads to dequeue more requests.
To quickly eliminate IIS as the source of the hang, check:
- "Http Service Request Queues\CurrentQueueSize" counter. If it's 0, IIS is having no problems dequeueing requests.
- "W3SVC_W3WP\Active Threads Count" counter. This will almost always be 0 or 1, because IIS threads almost never block. If it's significantly higher, you likely have IIS thread blockage due to a custom module or because you explicitly configured ASP.NET to run on IIS threads. Consider increasing your MaxPoolThreads registry setting.
Diagnose the hang.
Snapshot the currently executing requests to identify where blockage is taking place.
REQUEST "7000000780000548" (url:GET /test.aspx, time:30465 msec, client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)
REQUEST "f200000280000777" (url:GET /test.aspx, time:29071 msec, client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)
...
REQUEST "6f00000780000567" (url:GET /, time:1279 msec, client:localhost, stage:AuthenticateRequest, module:WindowsAuthentication)
REQUEST "7500020080000648" (url:GET /login, time:764 msec, client:localhost, stage:AuthenticateRequest, module:WindowsAuthentication)
You can use the resulting list of executing requests to learn A LOT about what's happening, including which URL is causing the blockage, and which requests are queued.
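If you want to work with this list programmatically, the entries are regular enough to parse. The following Python sketch turns the text into dictionaries. It assumes every entry follows exactly the field layout shown in the sample above (url, time, client, stage, module); the regular expression and the key names are my own, so adjust them if your output differs.

```python
import re
from typing import Any, Dict, List

# Pattern derived from the sample output above; the layout is an assumption.
REQUEST_RE = re.compile(
    r'REQUEST "(?P<id>[^"]+)" \(url:(?P<verb>\S+) (?P<url>.*?), '
    r'time:(?P<time_ms>\d+) msec, client:(?P<client>[^,]*), '
    r'stage:(?P<stage>[^,]*), module:(?P<module>[^)]*)\)'
)


def parse_requests(appcmd_output: str) -> List[Dict[str, Any]]:
    """Parse 'appcmd list requests' text into a list of dictionaries."""
    requests = []
    for match in REQUEST_RE.finditer(appcmd_output):
        entry: Dict[str, Any] = match.groupdict()
        entry["time_ms"] = int(entry["time_ms"])  # elapsed time as a number
        requests.append(entry)
    return requests


sample = (
    'REQUEST "7000000780000548" (url:GET /test.aspx, time:30465 msec, '
    'client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)\n'
    'REQUEST "7500020080000648" (url:GET /login, time:764 msec, '
    'client:localhost, stage:AuthenticateRequest, module:WindowsAuthentication)\n'
)
for request in parse_requests(sample):
    print(request["time_ms"], request["stage"], request["url"])
```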
Expert tip #1: Identifying requests causing the hang. You can identify which requests are the ones causing the hang because they will be at the front of the list, taking the longest time to execute. They will generally all be stuck in the same module and stage, and often the same URL.
If the hang is being caused by a specific ASP.NET controller or page, the module will say "IsapiModule" (Classic mode) or "ManagedPipelineHandler" (Integrated mode), and the stage will say "ExecuteRequestHandler". The URL should then point to the page/controller responsible.
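With a long list, it can help to group the slowest requests by module, stage, and URL so the common culprit stands out. Here is a rough Python sketch of that grouping. It assumes each executing request has already been turned into a dictionary with url, time_ms, stage and module keys (hypothetical field names), and it uses an arbitrary 10 second cutoff for "long-running".

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def hang_suspects(requests: List[Dict[str, Any]],
                  min_time_ms: int = 10000) -> List[Tuple[Tuple[str, str, str], int]]:
    """Count long-running requests per (module, stage, url), most common first.

    min_time_ms is an assumed cutoff, not a value defined by IIS.
    """
    slow = [r for r in requests if r["time_ms"] >= min_time_ms]
    groups = Counter((r["module"], r["stage"], r["url"]) for r in slow)
    return groups.most_common()


# Values taken from the sample snapshot shown earlier in this article.
requests = [
    {"url": "/test.aspx", "time_ms": 30465,
     "stage": "ExecuteRequestHandler", "module": "ManagedPipelineHandler"},
    {"url": "/test.aspx", "time_ms": 29071,
     "stage": "ExecuteRequestHandler", "module": "ManagedPipelineHandler"},
    {"url": "/", "time_ms": 1279,
     "stage": "AuthenticateRequest", "module": "WindowsAuthentication"},
    {"url": "/login", "time_ms": 764,
     "stage": "AuthenticateRequest", "module": "WindowsAuthentication"},
]
print(hang_suspects(requests))
# [(('ManagedPipelineHandler', 'ExecuteRequestHandler', '/test.aspx'), 2)]
```

Here the single group points at /test.aspx running in the handler stage, which matches what you would conclude by reading the list by eye.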
Expert tip #2: Identifying queued requests. See the block of requests at the bottom of the list? These are the queued requests!
In Integrated mode, these will all have the module/stage corresponding to the first ASP.NET module in the pipeline. This will generally be "WindowsAuthentication" in "AuthenticateRequest" or sometimes "Session" in "AcquireRequestState".
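The same module/stage pairs can be used to separate queued requests from the ones doing the blocking. The Python sketch below filters on the two pairs mentioned above, spelled the way the module names appear in the AppCmd output. It assumes each request is a dictionary with url, stage and module keys (hypothetical field names), and the set of entry points is only a starting guess that depends on which modules your application has enabled.

```python
from typing import Any, Dict, FrozenSet, List, Tuple

# (module, stage) pairs where queued requests tend to wait in Integrated mode.
# Assumed defaults taken from the text; your pipeline may differ.
DEFAULT_ENTRY_POINTS: FrozenSet[Tuple[str, str]] = frozenset({
    ("WindowsAuthentication", "AuthenticateRequest"),
    ("Session", "AcquireRequestState"),
})


def queued_requests(requests: List[Dict[str, Any]],
                    entry_points: FrozenSet[Tuple[str, str]] = DEFAULT_ENTRY_POINTS
                    ) -> List[Dict[str, Any]]:
    """Return the requests waiting at the first ASP.NET module in the pipeline."""
    return [r for r in requests if (r["module"], r["stage"]) in entry_points]


# Values taken from the sample snapshot shown earlier in this article.
requests = [
    {"url": "/test.aspx", "stage": "ExecuteRequestHandler",
     "module": "ManagedPipelineHandler"},
    {"url": "/", "stage": "AuthenticateRequest",
     "module": "WindowsAuthentication"},
    {"url": "/login", "stage": "AuthenticateRequest",
     "module": "WindowsAuthentication"},
]
print([r["url"] for r in queued_requests(requests)])  # ['/', '/login']
```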