Author Topic: Windows memory / handle leaks (Read 8659 times)

Graeme · « **on:** March 17, 2014, 07:23:26 AM »

We recently had a customer with a Windows 7 PC that repeatedly crashed every couple of weeks due to memory and handle leaks in the Windows audio driver as described here.
http://support.microsoft.com/kb/981013
http://support.microsoft.com/kb/2670667

Does anyone have any knowledge or experience of what can happen when Windows runs out of memory or handles? Can just about anything occur e.g. random corruption to the hard drive?

If an application is continually writing data to a log file which is in a database and the application and database engine correctly check for memory allocation failure and API call errors, is it still possible that the database can become corrupted when Windows runs out of memory or handles?

chrisant · « **Reply #1 on:** March 17, 2014, 10:49:01 PM »

Quote from: Graeme on March 17, 2014, 07:23:26 AM

We recently had a customer with a Windows 7 PC that repeatedly crashed every couple of weeks due to memory and handle leaks in the Windows audio driver as described here.
http://support.microsoft.com/kb/981013
http://support.microsoft.com/kb/2670667

How did the customer respond when it occurred? If the customer cycled the power, that's equivalent to a catastrophic hardware failure and (almost) all bets are off. NTFS journaling should keep the directory structure intact, and possibly (probably?) the file allocation tables intact. But pending file content may not have been committed: even if it was committed by the OS and the hard drive returned success, the hard drive firmware may have deferred the actual write for performance reasons, and it may still have gotten lost. Yes, hard drives do that these days to boost their throughput numbers.

Quote from: Graeme on March 17, 2014, 07:23:26 AM

Does anyone have any knowledge or experience of what can happen when Windows runs out of memory or handles? Can just about anything occur e.g. random corruption to the hard drive?

"Random" corruption seems very unlikely to happen at the OS level. I've certainly never encountered it on any of the machines I've used or serviced that ran out of system resources, which is 20+ times on various combinations of hardware. Several times I've seen running out of system resources expose app bugs, though they were sometimes not easy to track down. And I've seen many, many cases where an app hit out-of-disk or out-of-storage errors, didn't have proper transactional integrity, and got stuck in a corrupted state (when the OS hadn't run out of resources, e.g. on Win32 x86 where the OS still has more than 1.5GB shared address space available to it but an app blew out its private 2GB address space, or filled up the disk).

While it's possible for the OS to have a bug there, in my experience it's _always_ been mistakes in app code (or driver code!), not in core OS code.

Quote from: Graeme on March 17, 2014, 07:23:26 AM

If an application is continually writing data to a log file which is in a database and the application and database engine correctly check for memory allocation failure and API call errors, is it still possible that the database can become corrupted when Windows runs out of memory or handles?

"correctly check" seems like a key phrase here. Does any of the error handling path make any calls of any kind that could possibly ever fail? Probably yes. And if so, then the error handling code may not be fully robust in the presence of certain types of errors. And that could potentially lead to unpredictable things going wrong.

I'd recommend examining the nature of the corruption, cross-referencing it with the app code, and reverse engineering how it could occur. Pay extra attention to sequences of writes: if a sequence of writes gets interrupted by an error, the storage may be left in an inconsistent state if the error handling and transactional logic aren't just right.
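To make the "sequence of writes" point concrete, here is a minimal sketch of one common way to keep a multi-step update from being left half done: write the new content to a temporary file, flush it, then swap it into place in a single rename. This is Python rather than the Delphi of the app in question, and `atomic_write` is a hypothetical helper name, not anything from the database involved.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader sees the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python buffer -> OS cache
            os.fsync(tmp.fileno())  # ask the OS to push it to the device
        os.replace(tmp_path, path)  # single rename step; overwrites the target
    except BaseException:
        # Any failure before the rename leaves the original file untouched.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

If an error interrupts the write, the worst case is a leftover temporary file. Whether the rename itself survives a power cut still depends on the file system and the drive.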

That might not be the cause here, but it's by far the most common cause in my experience. I think my experience is greater than average, but this certainly isn't a core area of expertise for me.

Hope that helps!

Graeme · « **Reply #2 on:** March 18, 2014, 12:44:23 PM »

Thanks. I didn't really know that hard drives could defer writing. Now I see that they can also re-order disk writes. For NTFS journaling to work I would have thought there would need to be a way of forcing the disk to be flushed. Apparently the database involved has a transaction mechanism, including journaling.

I don't know if the customer cycled the power or not but I know that on one occasion, Windows Error Recovery fired up on restart. The software involved is from a third party who did most of the investigation. The application is a Delphi program and I suspect it would be quite hard to determine whether all the error handling is correct. From what you say I suspect it's not. I might ask them to investigate but not spend too much time on it as running out of memory / handles should be fairly rare.

Why are almost all bets off if the power is recycled (or there's a mains fail)? A couple of weeks ago we had a storm here and a brownout occurred resulting in one of the Windows 7 PCs here becoming unbootable which was a surprise. We had to re-install the O.S.

Thanks.

chrisant · « **Reply #3 on:** March 19, 2014, 06:11:34 AM »

Quote from: Graeme on March 18, 2014, 12:44:23 PM

Why are almost all bets off if the power is recycled (or there's a mains fail)? A couple of weeks ago we had a storm here and a brownout occurred resulting in one of the Windows 7 PCs here becoming unbootable which was a surprise. We had to re-install the O.S.

By "all bets are off" I just mean that if power is lost then even writes that the hard drive reported as having been written may not actually reach persistent storage, and if power is lost in the middle of a hardware write then the inconsistency is even greater. Losing power means it's unpredictable what writes will get completely or partially lost.
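Since a partially lost write can't be ruled out, a log format can at least make it detectable. The sketch below assumes each record is stored as a length plus a CRC32 of the payload, so a reader stops at the first torn or damaged record. The layout and the names `encode_record` and `read_valid_records` are made up for illustration and are not how any particular database does it.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (assumed layout)


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(blob: bytes) -> list:
    """Return the records up to the first truncated or corrupted one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        length, checksum = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        payload = blob[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn or damaged tail: keep only what was verified
        records.append(payload)
        offset = start + length
    return records
```

This only detects a damaged tail so the log can be truncated to a known-good point on restart. It does not recover the lost data.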

jporkkahtc · « **Reply #4 on:** April 16, 2014, 09:19:05 PM »

Windows can get to be completely unresponsive for very long periods of time when one application starts consuming all available memory. I accidentally did this recently with a script I wrote -- it started using many gigs of RAM. I tried being patient, but eventually I gave up and power-cycled the PC.
Unless your customer is unusually patient it is very likely they forced power off or reset.

WRT error handling: A simple technique that doesn't get used nearly enough: Make your application robust in the event of unexpected power-fail or termination. On out-of-memory errors, simply terminate immediately. Trying to handle OOM errors is nearly impossible to get right and bloats code with tons of error handling that is almost surely untested and buggy. The system is likely in a horrible condition if you actually got an OOM, and restarting the process will almost always fix the problem (assuming it is your app causing the problem).
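A rough sketch of the "terminate immediately on OOM" idea, with Python standing in for the Delphi app. `run_fail_fast` and the exit code are hypothetical. The `exit_fn` parameter exists only so the behaviour can be exercised without killing the calling process.

```python
import os

OOM_EXIT_CODE = 70  # hypothetical value; use whatever your supervisor recognises


def run_fail_fast(task, exit_fn=os._exit):
    """Run `task`; on MemoryError, stop at once instead of trying to recover."""
    try:
        return task()
    except MemoryError:
        # No cleanup and no logging that might itself allocate: just stop.
        # On-disk state must already be safe to reopen after an abrupt exit.
        exit_fn(OOM_EXIT_CODE)
```

This only makes sense if the on-disk state is already crash-safe, which is the first half of the advice. Something outside the process (a service manager or watchdog) is assumed to restart it.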

Way back in the days of Vista when I was working on a disk intensive database-like application we did a bunch of power-fail testing. Windows (and the DB we used) handled this pretty well.

See also: FlushFileBuffers - which is *slow* and will force data to disk - I think even on drives that lie to Windows about caching. http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=vs.85).aspx

See also: http://community.slickedit.com/index.php/topic,3758.15.html
Where Slick used to call FlushFileBuffers(), but it was causing real perf problems, for interactive use at least.
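For reference, a minimal sketch of a flushed append in Python. On Windows, `os.fsync` is documented as using the C runtime's `_commit()`, which as far as I know ends up in the same flush call discussed above. Treat that last step as an assumption. `append_durably` is a hypothetical helper name.

```python
import os


def append_durably(path: str, record: bytes) -> None:
    """Append `record` and request a flush to the device before returning."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # Python buffer -> OS cache
        os.fsync(f.fileno())  # OS cache -> device, as far as the drive honours it
```

Calling this for every record is exactly the slow path mentioned above. Batching several records per flush trades some durability for throughput.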

Graeme · « **Reply #5 on:** April 17, 2014, 12:21:06 AM »

Thanks. The database is using FlushFileBuffers, and I had looked into it, but with SATA drives there's no guarantee it will do anything, and it doesn't actually tell you when the operation is complete. It's interesting that NTFS on Windows 8 has switched from "forced unit access" to using FlushFileBuffers.

http://winntfs.com/2012/11/29/windows-write-caching-part-2-an-overview-for-application-developers/
http://workinghardinit.wordpress.com/tag/forced-unit-access/

So thanks to the variability of hardware, it seems to be quite difficult to achieve data integrity on SATA drives, even for Windows NTFS. As I mentioned above, we had to reinstall Windows 7 after a brownout recently, but who knows what might have caused that. I guess the big benefit of NTFS etc. is that it does guarantee file system integrity when an application crashes, but not so much when the power fails or the operating system itself is malfunctioning. I wonder if the power off button on a PC gives the software time to sort itself out if it's halfway through a disk write - I've done that a few times.

chrisant · « **Reply #6 on:** April 17, 2014, 04:59:35 PM »

Quote from: Graeme on April 17, 2014, 12:21:06 AM

I wonder if the power off button on a PC gives the software time to sort itself out if it's halfway through a disk write - I've done that a few times.

If the power button is handled in software and triggers Standby or Hibernate then yes, it will be ok. But if you hold the power button and override it to do a true electrical power cutoff then no, that's the same as unexpected power loss and things can go wrong. Some SSDs used to (still do?) have a high chance of being severely corrupted on power loss (incl. forced via holding the power button).
