Tuesday, February 22, 2011

CrashOnAuditFail – Windows Server 2003

A little background.  My coworker and I maintain a business-critical intranet web server for about 220 users in our company, along with other jobs and responsibilities from time to time.  We are also the help desk for the app and for most problems in general.

We had just finished eating our lunch when the phones lit up with what at first seemed to be a typical network issue.  We have a few trusted users who seem to have a bit more insight than others, so we went downstairs to their locations and checked the normal connectivity issues that are the most likely culprits.  This time was different: we (the admins) could still log in, but users got a 404 error.  After checking several different workstations, we went back to the server.  We checked the System log, and there wasn’t much info to be found, just that the server had unexpectedly restarted.

[Screenshot: System log, error event 6008]

That wasn’t much help, so we looked in the Security log and found this error.

[Screenshot: Security log, error event 521]

The very limited info provided here proved to be just enough.
It seems that when you have auditing turned on, have set the Security log to ‘Do Not Overwrite events’, and the log fills up, it will cause the server to restart.

[Screenshot: Security log properties]

That in itself isn’t that surprising.  What surprised us is that when it restarts because the Security log is full, it only allows administrators to log in.  I understand why this happens; after all, if it can’t perform auditing on the users' activities, why should it let them access the server?

We all know that Google knows everything, right?  The secret is asking the right question.  This is where a more verbose log error would have come in handy.  We eventually found the answer here, provided by Backlund.

The key to this is that when the server restarted, it changed a registry value without letting us know via a message of some sort.
So open up Regedit and navigate to the following key:
HKLM\SYSTEM\CurrentControlSet\Control\Lsa
Under that key, the CrashOnAuditFail value will be set to 2.
Change the value to either 1 (keep the shut-down-on-audit-failure behavior) or 0 (turn it off) and restart the machine.
Everything should return to normal.  We hope that this can help someone; I wish that I had known about this before this bit of excitement.  While we knew that this wasn’t a résumé-updating event, it was still stressful.
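
For anyone who would rather script the check than click through Regedit, here is a minimal sketch of the registry steps above in Python.  We did not use this ourselves; it assumes a Python install with the standard winreg module on the server and an administrator account, and the function names and the --reset flag are made up for illustration.  The meanings attached to 0, 1 and 2 follow Microsoft's description of this value.

```python
import sys

# Registry location and value name taken from the steps above.
LSA_SUBKEY = r"SYSTEM\CurrentControlSet\Control\Lsa"
VALUE_NAME = "CrashOnAuditFail"

# Meanings of the three values as Microsoft describes them.
MEANINGS = {
    0: "off: a full Security log does not halt the server",
    1: "on: the server halts if it cannot write an audit event",
    2: "tripped: the server halted and only administrators can log on",
}


def describe(value):
    """Return a short explanation for a CrashOnAuditFail value."""
    return MEANINGS.get(value, "unexpected value")


def read_crash_on_audit_fail():
    """Read the current value from HKLM, or return None if it is unavailable."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only module, imported lazily
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, LSA_SUBKEY) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except OSError:
        return None


def write_crash_on_audit_fail(new_value):
    """Write 0 or 1 to CrashOnAuditFail; needs administrator rights."""
    if new_value not in (0, 1):
        raise ValueError("only 0 or 1 may be written by hand")
    import winreg  # Windows-only module, imported lazily
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, LSA_SUBKEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, new_value)


if __name__ == "__main__":
    current = read_crash_on_audit_fail()
    if current is None:
        print("CrashOnAuditFail could not be read (not Windows, or value missing).")
    else:
        print("CrashOnAuditFail = {} ({})".format(current, describe(current)))
        # Hypothetical --reset flag: only touch the registry when asked to.
        if current == 2 and "--reset" in sys.argv:
            write_crash_on_audit_fail(1)
            print("Value set to 1; restart the machine.")
```

Run without arguments, it only reports the current value.  With --reset, it writes 1 when the value is 2, which leaves the protection switched on, so the Security log probably needs to be archived and cleared (or its retention setting changed) first, or the server may trip again.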
