I have a computer that I built myself at least three years ago. Since March it has suddenly developed a habit of freezing and becoming unresponsive. The only thing I’ve changed was to plug in a second monitor. The problem has occurred mostly if I am using an online conference tool (both Microsoft Teams and UberConference, but never had a problem in Zoom or Slack). I use Chrome for those but Teams has its own app and I also crashed using UberConference in Edge. It has also occurred at random times. Once I walked away from the computer with nothing actively running (no user applications were actively doing anything [I guess Chrome could have been busy with something] and Excel was the active window) and when I went back the next morning it had frozen.
This is almost always a freeze, where the screen image freezes and the keyboard and mouse become unresponsive. In two cases I got a blue screen that said it was collecting information and would restart, but it was frozen at that point and never restarted. I think the message was NMI_HARDWARE_FAILURE but I cannot find my record of it now.
I have inspected the Windows event log and there are no events logged at the time of the freeze that indicate any problem.
I have tried the following things:
Unplug the second monitor
Remove the video card and run on the on-board video
Unplug my webcam
Check drivers and update as needed for monitors, webcam, headset
Unplugged and reseated memory chips
Other USB peripherals that I have not unplugged because they were not active at the time of any of the crashes:
Multi card reader
Epson scanner
They are connected through a powered Belkin USB hub.
I also have a Microsoft Natural keyboard and Logitech wireless mouse plugged directly into the machine’s USB ports. The only other thing I have not unplugged is a Logitech headset because I need to use it for online calls. I have had that headset longer than I’ve had this computer.
The failures continue.
What should I check next?
I am on the verge of giving up and just rebuilding with a new motherboard but the prospect of a clean OS install and reinstalling all my programs is overwhelming.
Couple of thoughts: The blue screen and having a second monitor make me think this is a video driver issue. I know you said you checked the drivers, but that’s where I’d start. If you’re sure the driver for the video card (I assume your rig has a video card) is fully updated, maybe look into replacing it.
Next, and this is where my mind first went before I made it all the way through your post: make sure all the vents and heatsinks are clean. Freezing/crashing/rebooting can be a sign of overheating. The fact that this seems to sometimes happen in the middle of the night makes me unsure of this theory. But if you built this computer on your own, this’ll be a 10-minute job (less if it’s not dirty), and it probably needs to be done anyway.
It seems odd that there’s nothing in the event log, especially after a blue screen, but I’ve only worked with that on a handful of occasions, so I only have a very basic understanding of it.
The last thing I can think of, and this is a long shot…plug your USB devices into different ports. I’ve seen the most bizarre problems pop up that could be fixed by moving things to different ports. Something like having a USB 3.0 device in a USB 2.0 port causing intermittent issues, I get that. But, for example, I had a problem at work where a single program wasn’t correctly populating one specific thing. Some searching on the internet led me to move the mouse’s bluetooth dongle to a different port and everything started working again. IIRC, the working theory was that the USB port was using the same IRQ as the program when it requested that bit of info. Really strange and a really long shot. But an easy thing to check.
The last thing, and maybe one of the first things you should check (or check next), is that none of your devices are falling asleep. Anytime I have issues that pop up after I’ve been away from the computer for more than a few minutes, I go through the device manager and make sure things like the video card, NIC, mouse, keyboard etc. don’t have the ‘allow the computer to turn off this device’ box checked. Also, you can check your power management and make sure that it doesn’t sleep or hibernate. Keep it all wide awake, all the time. Maybe let your monitor shut off if you’d like. But at least while troubleshooting, kill as much power management as you can.
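For the sleep and hibernate part, here is a rough Python sketch that drives Windows’ built-in powercfg tool. The function names are made up for illustration, and it assumes you run it from an elevated (administrator) prompt. It does not touch the per-device checkboxes in Device Manager; those still have to be cleared by hand.

```python
import subprocess

# Timeout names accepted by "powercfg /change"; a value of 0 means "never".
NEVER_TIMEOUTS = ["standby-timeout-ac", "hibernate-timeout-ac", "disk-timeout-ac"]


def stay_awake_commands(keep_monitor_timeout=True):
    """Return the powercfg command lines that keep the machine awake on AC power."""
    cmds = [["powercfg", "/change", name, "0"] for name in NEVER_TIMEOUTS]
    if not keep_monitor_timeout:
        # Optionally stop the monitor from powering down as well.
        cmds.append(["powercfg", "/change", "monitor-timeout-ac", "0"])
    # Turn the hibernate feature off entirely.
    cmds.append(["powercfg", "/hibernate", "off"])
    return cmds


def run_all(cmds):
    """Run each command in turn; raises CalledProcessError if one fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Calling run_all(stay_awake_commands()) would apply the settings. By default the monitor timeout is left alone, so the screens can still shut off.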
I have an NVIDIA GeForce GTX 1650. I updated drivers, and still got crashes. I removed it from the box, and now I’m running on the Intel graphics set built into the motherboard, and I still get crashes.
I have done this. A couple of weeks ago I blew out the case with canned air and vacuumed. I also run HWInfo64 and log CPU temps and they occasionally run up to 100°C but not before a crash.
I am trying to leave them unplugged entirely, except when I need them, to see if that shows any pattern.
[quote]check (or check next) is that none of your devices are falling asleep.
[/quote]
Never occurred to me to check that. I will have a look.
I have Windows Defender and Malwarebytes checking things. The only thing that has come up is a false positive when Malwarebytes identified Excel.exe as a trojan.
I haven’t tried that yet. A while back I had a cheap USB hub that was leaking back voltage into the port but wasn’t causing crashes. I’ll try unplugging the one I have.
Sigh. Another crash overnight. I had the sensor monitor running and the last entry was about 3:30 AM.
The CPUs were running at 95% which makes no sense to me. I don’t know of any way to log which processes were using how much CPU.
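One way to log which processes are using how much CPU is a small Python script left running in the background. This is only a minimal sketch under some assumptions: it needs the third-party psutil package (pip install psutil), and the function names and the cpu_log.csv file name are invented here for illustration. It takes a snapshot of each process's cumulative CPU time once a minute and appends the busiest few to a CSV file, flushing to disk each time so the last rows survive a hard freeze.

```python
import csv
import os
import time
from datetime import datetime


def busiest(prev, curr, interval, top_n=5):
    """Rank processes by CPU seconds consumed between two snapshots.

    prev and curr map pid -> (name, cumulative_cpu_seconds).
    Returns (pid, name, percent_of_one_core) tuples, busiest first.
    """
    usage = []
    for pid, (name, cpu_now) in curr.items():
        if pid not in prev:
            continue  # process started mid-interval; no baseline yet
        delta = cpu_now - prev[pid][1]
        if delta > 0:
            usage.append((pid, name, round(100.0 * delta / interval, 1)))
    usage.sort(key=lambda row: row[2], reverse=True)
    return usage[:top_n]


def snapshot():
    """Collect cumulative CPU seconds per process (requires psutil)."""
    import psutil  # third-party, imported here so busiest() works without it

    snap = {}
    for proc in psutil.process_iter(["pid", "name", "cpu_times"]):
        times = proc.info["cpu_times"]
        if times is not None:  # None when access to the process is denied
            snap[proc.info["pid"]] = (proc.info["name"], times.user + times.system)
    return snap


def log_forever(path="cpu_log.csv", interval=60):
    """Append the busiest processes to a CSV file every `interval` seconds."""
    prev = snapshot()
    while True:
        time.sleep(interval)
        curr = snapshot()
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            for pid, name, pct in busiest(prev, curr, interval):
                writer.writerow([stamp, pid, name, pct])
            f.flush()
            os.fsync(f.fileno())  # push rows to disk before a possible freeze
        prev = curr
```

Calling log_forever() starts the logging. The percentage is relative to a single core, so a multi-threaded process can show more than 100. After the next freeze, the last few rows of the CSV should show what was busy just before it.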
Temps were about 90°C. No errors in the System logs, leading up to the crash, but there are tons of other logs in Event Viewer and I’m not sure how to wade through them all. WhoCrashed couldn’t find a dump file even though I had configured it as directed. Both monitors were off due to power saving settings.
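For wading through the other logs, one option is to query only the window just before the freeze instead of scrolling through Event Viewer. The sketch below is a hedged example, not a tested recipe for this machine: it builds an XPath filter for Critical, Error and Warning events (levels 1 to 3) and hands it to wevtutil, the command-line tool that ships with Windows. The function names are hypothetical, and the crash time must be passed as a timezone-aware datetime.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def build_query(crash_time, minutes_before=30):
    """Build an Event Log XPath filter for Critical/Error/Warning events
    in the window leading up to crash_time (a timezone-aware datetime)."""
    end = crash_time.astimezone(timezone.utc)  # the event log stores UTC
    start = end - timedelta(minutes=minutes_before)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return (
        "*[System[(Level=1 or Level=2 or Level=3) and "
        f"TimeCreated[@SystemTime>='{start.strftime(fmt)}' and "
        f"@SystemTime<='{end.strftime(fmt)}']]]"
    )


def query_log(log_name, query):
    """Run wevtutil and return the matching events as text, newest first."""
    cmd = ["wevtutil", "qe", log_name, f"/q:{query}", "/f:text", "/rd:true"]
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

Passing the timestamp of the last sensor-log entry to build_query, then calling query_log for "System" and "Application" in turn, would print whatever those two logs recorded in the half hour before the freeze. If both come back empty, that is at least consistent with the idea that the fault is happening below the OS.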
I think this “freezing” crash is kind of an involuntary crash, as opposed to a blue screen where the system halts in response to something.
My next suggestion is to go through the task manager and kill everything that isn’t required for it to just sit idle for a day or so, or maybe even boot in safe mode and let it sit overnight. Check your startup programs as well to make sure something isn’t starting on its own that you’re not aware of.
Let’s suppose that this is a hardware problem. Because it just freezes, I am wondering if there is something that happens completely outside of the OS and running processes. I don’t understand what can cause the system to become completely unresponsive at the hardware level (i.e., hitting the NumLock and Function Lock keys won’t toggle the LED lights on the keyboard) and yet leave a snapshot of the display up on the screen.
How would I diagnose memory failures, bus faults, or other fatal hardware faults that could cause that kind of failure mode?
Well, I used to work in a DC with about 4K machines, and a lot of them were generic desktops pushed into being servers. They’d run for a few years, but they’d develop weird problems as time went on.
In the case of a machine that wasn’t responsive to a keyboard (num lock lights would go on/off, but no monitor response), but was on (sometimes still responding to things like ping) it was usually the motherboard that was the issue. You could almost always find a leaking or swollen cap on the boards that were behaving that way. Since 80% of them ran Linux, it was simple to swap the board out and go about your life.
But! These were boxes that normally had a dedicated PS/2 bus for the keyboards. So, we weren’t waiting for the USB bus to enumerate the keyboard when we plugged it in. When USB keyboards became the norm, this sort of troubleshooting became more difficult, since you’re depending on the operating system to do its bit. If it’s shot due to a software problem, it’s going to look largely the same as a hardware issue.
So, since I suspect any machine built in the last few years is going to lack a PS/2 bus, you’re in the latter group. I’d open up the case and check for any capacitors that are swollen and leaking, but it’s going to be hard to diagnose a hardware/bus issue if the system isn’t writing anything to the logs.
As an alternative, could you try booting it off a Linux live CD for a day or so and see if the problem persists? That’d at least do a half assed job of eliminating OS issues causing it.
At some point troubleshooting becomes not worth it.
Consider a full re-install of Windows (blow the previous install away completely). Make sure you have your license key printed out (if there is a problem contact MS support…surprisingly I have found them to be really helpful and I have even gotten a new key when they probably should have told me to piss off…YMMV).
I know this is a pain getting all of your programs back on but, honestly, it is not horrible (go watch TV while stuff downloads). It’ll take several hours but it is akin to cooking chili…you only need to attend to it every 30 minutes or so.
Before you do that I would encourage you to download all drivers you need to a thumb-drive. Mainly all the motherboard stuff and your video card. Windows 10 is surprisingly good at getting most of those but this doesn’t hurt if you want to be safe.
Also, be sure you save any work you have on the PC to a thumb-drive or to the cloud (cloud is better but slower).
Frankly, the worst part for me when I re-build is downloading all of my PC games again.
When done, your PC should be in as good a shape as possible. It is a worthwhile thing to do.
This is also really easy these days. Just download a Windows 10 installer to a thumb drive, boot to that thumb drive and install from there. Easy-peasy. (IIRC the thumb-drive needs to be at least 4GB in size…maybe 8GB but I think 4GB will do it…make sure the drive is completely empty before you put Windows 10 install on it).
One last thing…on a new install Ninite is freaking awesome. It has all sorts of free programs that many find useful. Check the boxes for the things you want and it downloads and installs them all in one easy step.
I can’t express how great it is when you are re-building a PC: https://ninite.com/
Some years ago I irretrievably borked the touchpad on my Toshiba laptop during the Win7 to Win8 upgrade. I got every driver Toshiba’s website had for my model. But that didn’t include the one for the integral touchpad. Their live tech support was useless; they were simply human tour guides to whatever was on the website. Oh well.
I do PC/server support for a living and have done so since the early 90’s. I know your pain.
Honestly, Windows 10 is really, really good at catching all of these things these days on its own. Usually when it misses something it is some random motherboard bit. Kinda bummed because it has made me less necessary to hire (I am employed but 20 years ago you needed me more than you need me now).
Also, most component producers make all the stuff you need easily available on their websites. Early 2000’s not everyone did this. Now, positively everyone does this.
Of course, there is always an outlier that screws things up but 98% of the time your grandma could do this. I get paid for the other 2%.