How-To: Crash diagnostics

LeanSentry's Crash diagnostics detect and help you fix website and program crashes in your environment.

A crash takes place when a process serving an application pool or another program abruptly terminates due to an unrecoverable error. The crash causes the process to immediately abort all processing without saving data or executing the expected shutdown logic.

Crashes are unacceptable in a production environment, because they often lead to:

  • Data loss/corruption. The process does not save any data being processed, and can leave data in a corrupted or unrecoverable state.

  • Aborted requests/transactions. Any requests being processed are never completed, and connections are terminated without returning an error to the client.

  • State loss. Any application, cache or session state is lost.

  • Orphaned resource locks. If the process is holding locks on external resources like files or databases, it can leave those locks in place, preventing the application on this and other servers from working.

  • Permanent downtime. If the crash is persistent, it can lead to total downtime for your application, either by repeatedly tearing down the process or by triggering the IIS Rapid Fail Protection feature.

IMPORTANT: It is critical that any production crashes are investigated and fixed immediately to prevent serious problems.

How it works

LeanSentry automatically detects crashes in IIS application pools and other programs.

Once a crash is detected, we analyze the crashing process to identify the precise root cause of the crash. We'll then provide this information to you so you can fix it.

Crash diagnostics are very lightweight: they only execute when a process has crashed, so they add effectively no overhead to your server during normal operation.

Automatic insights

When a crash is detected, we'll automatically notify you by email. The email summarizes what caused the crash, such as an unhandled exception or a stack overflow.

We'll also let you know if we detect that the crash has a new cause, or if we detect other interesting behavior.

Crash insights let you know when a crash is detected and what caused it.

Simply click the link in the email to view the complete crash information so you can fix the cause of the crash.

Troubleshooting crashes

You can access the crash report directly from the Crash insight email, by clicking the crash bubble in the main dashboard, or by going to the "Crash diagnostics" link in the diagnostic sidebar.

Each crash report identifies the root cause of the crash, e.g.:

  1. Unhandled exception.

  2. Stack overflow.

  3. Access violation or other error.

It then provides the relevant details, including the full stack trace of the error and, if available, source code information.

Your developer can then immediately pinpoint the crash location and deploy a fix to prevent future crashes.

You can also quickly view and search all crashes in your environment, and filter them by program, application pool, or crash cause.

How to fix crashes (for Developers)

While crashes are a serious problem, they are usually easy to fix once you know the location of the crash.

Use the stack trace LeanSentry provides to find where the crash occurs in your code. Then, apply the appropriate fix depending on the crash type.

For unhandled exceptions, you can stop a crash from happening by making sure the exception is handled instead of letting it bubble out of your code. This is necessary whenever you run code on new threads, in timer callbacks, in ThreadPool.QueueUserWorkItem callbacks, and so forth.
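
As a minimal sketch of handling exceptions in background callbacks, the C# helper below wraps a ThreadPool.QueueUserWorkItem callback in a try/catch. The names SafeBackgroundWork, Queue, and onError are hypothetical and chosen for illustration only; they are not part of LeanSentry or the .NET Framework.

```csharp
using System;
using System.Threading;

public static class SafeBackgroundWork
{
    // Hypothetical helper: runs 'work' on the thread pool and routes any
    // exception to 'onError' instead of letting it escape the callback.
    public static void Queue(Action work, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // An exception that escapes this callback is unhandled
                // and terminates the process under the default CLR policy.
                // Note: 'onError' must not throw, or the crash still occurs.
                onError(ex);
            }
        });
    }
}
```

A call might look like SafeBackgroundWork.Queue(() => DoWork(), ex => Log(ex)), where DoWork and Log stand in for your own methods. The same try/catch pattern applies to the entry point of any thread or timer callback you create yourself.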

You can also change the CLR's unhandled exception policy to avoid tearing down your process on unhandled exceptions, but we DO NOT recommend this.

For stack overflows, prevent excessive recursion in your code by correctly identifying a stop condition. Another approach is to limit recursion depth with a counter that is incremented on each recursive call. If your code requires very deep recursion, "unroll" it by turning the recursive algorithm into a loop- or stack-based algorithm.
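
The two approaches above can be sketched in C# as follows, using a tree sum as a stand-in for your own recursive algorithm. The Node and TreeSum types and the MaxDepth value of 1000 are assumptions made for this illustration; pick a limit that fits your data and thread stack size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node used only for this illustration.
public sealed class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
}

public static class TreeSum
{
    // Assumed limit for illustration; tune it for your application.
    private const int MaxDepth = 1000;

    // Recursive version with a depth counter: fails with a catchable
    // exception instead of overflowing the stack.
    public static long SumRecursive(Node node, int depth = 0)
    {
        if (node == null) return 0; // stop condition
        if (depth > MaxDepth)
            throw new InvalidOperationException("Tree is deeper than MaxDepth.");

        long sum = node.Value;
        foreach (Node child in node.Children)
            sum += SumRecursive(child, depth + 1);
        return sum;
    }

    // "Unrolled" version: an explicit Stack<Node> on the heap replaces
    // the call stack, so depth is limited by memory, not stack size.
    public static long SumIterative(Node root)
    {
        if (root == null) return 0;

        long sum = 0;
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            sum += current.Value;
            foreach (Node child in current.Children)
                pending.Push(child);
        }
        return sum;
    }
}
```

The depth counter turns a process-ending stack overflow into an ordinary exception that you can handle and log, while the iterative version removes the depth limit altogether at the cost of slightly more code.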

For other errors, note that a wide variety of system exceptions can also cause process termination, such as access violations (e.g. passing a null pointer to a native function) or the ERROR_IN_PAGE error when using memory-mapped files. To address these crashes, you may need to restructure your code to avoid the situations that lead to these errors. If you are having trouble with one of these errors, contact us for more ideas on what you can do.

Need help?

Have a question or feedback on Crash diagnostics? Email us at support@leansentry.com.
