How to optimize GPU overclocking

I’m trying to get the most out of my GPU. It’s a Zotac GTX 970 AMP Omega: ZOTAC GeForce GTX 970 4GB AMP! Omega Core Edition, ZT-90106-10P - Newegg.ca

So far, I’ve used the EVGA PrecisionX 16 overclocking tool and stress tested it using FurMark.

I’ve upped the voltage by 87 mV, to about 1262 mV. I’ve OCed the core +208 MHz and it hovers between 1574 and 1600 MHz when running FurMark. I haven’t OCed the GPU memory. At those settings, the temperature stabilized in the low 60s Celsius with the fan running at about 75-80% capacity.

The screen froze when my OC was at +250 MHz, so I dialed it back to get an even 1600 MHz.

  1. Can you see any spot where I’m doing something significantly dangerous or suboptimal?

  2. How do I know when to OC the memory?

  3. When running the FurMark stress test, its framerate is 4 times higher when I minimize the FurMark window than when I display it. The core clock also stays below 1100 MHz when it’s displayed but goes up to 1600 MHz when minimized. What’s that about? Should I run the stress test with the FurMark window minimized or not?

You need to use a test that renders the same image repeatedly and looks for changes in the output pixels. This will immediately pinpoint errors caused by overclocking.

The OCCT GPU error test is what I happened to use, but there are others.

If you use it, you will immediately see errors with even modest overclocks. I found I couldn’t overclock more than an insignificant amount before the GPU began to make mistakes. As I understand it, CUDA-using apps can crash from GPU errors, the graphics drivers can crash and freeze, and games can crash. They are bad. Sure, you may not see the erroneous pixels, but GPU driver control logic requires correct responses from the GPU or it can crash. (The reason it doesn’t crash right away is that it takes just the right kind of error.)
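Here’s a minimal sketch of the kind of render-and-compare loop such a test performs, assuming you can grab the rendered frame as a numpy array; capture_frame() is a hypothetical placeholder for whatever framebuffer readback or screenshot mechanism your harness uses:

```python
import numpy as np

def count_error_pixels(reference, frame, tolerance=0):
    """Count pixels that differ from the reference render of the same scene.

    reference, frame: HxWx3 uint8 arrays. Since the scene is identical every
    iteration, any difference beyond `tolerance` indicates a GPU error.
    """
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return int(np.count_nonzero(diff.max(axis=-1) > tolerance))

def run_error_test(capture_frame, iterations=10_000):
    """capture_frame() is a hypothetical callback that renders the fixed
    test scene once and returns its pixels."""
    reference = capture_frame()
    errors = 0
    for _ in range(iterations):
        errors += count_error_pixels(reference, capture_frame())
    return errors
```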

Anyways, TL;DR:

  1. Use an error checking program
  2. Don’t overclock past the point that you get errors
  3. You probably will find the results disappointing. Save your pennies and get a bigger GPU. I hear Nvidia will have something nice sometime next year. Sell that 970 for a couple hundred and upgrade.

Frankly, I don’t see a need. With that 970 at 1920x1080, you can run all but about 10 PC titles smooth as glass, 60 FPS with all the bells and whistles. Among the few titles that run slowly, in most cases the reason is that the title itself is poorly programmed and isn’t making effective use of your GPU, so more GPU power will only help a little. You’d be better served by either not playing those games or adjusting settings until you have a decent framerate. Also, you should get a G-Sync monitor, because G-Sync allows for a playable experience at reduced framerates, with the missing frames being far less noticeable. (I ain’t saying it will make 10 fps feel good, but if it dips to 48 from 60 you will not notice.)

If I don’t increase the voltage, that is. I started this thread in part because googling has not given me much solid info about overvolting. If all it does is cut the expected lifespan of my GPU from 10 years to, say, 3, that’s fine by me. If all it means is that it gets hotter, that’s fine by me too, as I have at least 20 C of headroom in my overclock.

Thanks for the software name, I will use it.

Even after you increased the voltage?

Aside from games, what apps are you thinking about?

I want to be able to use VR which means about 2160x1200 at 90fps.

Overvolting will cut the life of your GPU, and in a non-linear fashion, though 87 mV is a fairly modest difference and I suspect the difference in lifespan is <2x.

The temperature is a major factor and the fact that you’re only at 60 C is encouraging. GPUs are typically expected to survive for >5 years at 90+ C, so I think you won’t have any problems at all.

Habeed is right that even minor errors can cause crashes. It’s not just output pixels that are the problem: pixel shaders today are very complex, and are essentially general-purpose programs. Say for instance that one of them loops across 10 light sources in a scene. It might use a 32-bit integer to store the light count. But if an error causes the high bit in that number to flip, it might think the light count is more like 2 billion. The program iterates for a long time and the hang detection logic may kick in.
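To put a number on that: flipping only the top bit of a 32-bit unsigned count turns 10 into roughly 2.1 billion. A quick illustration (the light-count scenario is just the hypothetical from above):

```python
light_count = 10
corrupted = light_count | (1 << 31)   # a single-bit error in the high bit
print(corrupted)                      # 2147483658 -> ~2.1 billion loop iterations
```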

Run FurMark without it minimized. In all likelihood, it’s not actually rendering anything when minimized, and is just CPU limited. I’m not 100% sure what’s going on with the core clock, but FurMark is well known to be a “power virus” (not a good term, but commonly used), and if the core clock is being throttled, it’s likely because you’re hitting thermal or power limits.

Are there better tools than FurMark?

There are, but I’d get in trouble if I gave them to you :).

I haven’t tried the OCCT GPU test but it sounds like FurMark with error checking. That’s what you want. Unless your goal is to just post the highest 3DMark score, I would not overclock past the point where any errors occur. A single bit error may not be noticeable most of the time, but sometimes it will cause a hang. So make sure your test program can go for a few hours without an error.

That ain’t happening on a 970. You’re going to need dual GPUs, and they will need to be the biggest, baddest GPUs from the generation that isn’t out yet.

The problem is that VR requires extremely low latency rendering. There is a minimum per-frame latency with any GPU design, and it’s not low enough to do 180 fps if you are clearing buffers in between frames. So you need 2 GPUs. In existing demos, they had to rip out and turn off almost all the features to get it to work at all.

And 2160x1200 is a very steep rendering requirement. That’s more than the consumer version of the Rift. Not practical for a full-featured game. To even do it for 1 display in a recent game, you’ll need a really badass GPU. Not a 970.

Them’s the breaks. VR isn’t going to be for poor gamers for a long time. Most of my information comes from here.

That’s not true at all. A single GPU is always better than two of them at half speed.

In fact, the usual way that dual GPUs work doesn’t improve latency at all vs. single. They alternate frames, and each frame takes as long as it would have without a partner. You get twice the FPS but the latency is unaffected.

VR is a bit special in that each GPU can work simultaneously on its own eye. Dual GPUs therefore work a lot better, and latency scales down with framerate the way you’d expect. But it’s still no better than a single GPU at twice the speed.
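To make the alternate-frame point concrete, here’s a back-of-the-envelope model with an assumed 10 ms frame cost per GPU (illustrative numbers only):

```python
frame_ms = 10.0   # assumed time for one GPU to render one frame

# Single GPU: a frame finishes every 10 ms, and each frame took 10 ms.
single      = {"fps": 1000 / frame_ms,     "latency_ms": frame_ms}

# Dual GPU, alternate-frame rendering: frames are staggered, so one finishes
# every 5 ms (double throughput), but each frame still took 10 ms (same latency).
dual_afr    = {"fps": 2 * 1000 / frame_ms, "latency_ms": frame_ms}

# Single double-speed GPU: each frame takes 5 ms, improving both fps and latency.
double_fast = {"fps": 2 * 1000 / frame_ms, "latency_ms": frame_ms / 2}

for name, stats in [("single", single), ("dual AFR", dual_afr), ("2x single", double_fast)]:
    print(name, stats)
```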

You are right that 2160x1200@90 Hz requires a badass GPU, and sustaining that on modern games probably will require dual GPUs. But it’s nothing to do with inherent advantages of dual GPUs (there really aren’t any) and instead just based on raw perf.

For a long time, three frames of latency was considered acceptable, but that’s way too much on VR. The NV driver already caps the queued frames to 1 when VR goggles are attached; I dunno about AMD.

Nope. There are global buffers on GPUs for a specific frame, shared by everybody. You have to clear those buffers and start from a particular point to render the other frame.

This creates latency. Amdahl’s law is a bitch. And it’s so large that there is no way, period, to render a modern game in VR with 1 GPU. The only way is to either have 2 GPUs or redesign the GPU architecture for VR so that essentially 2 parallel GPUs are in the same chip. But the reticle limit makes such a GPU architecture less efficient in the ruthlessly competitive market for more FPS in the most common type of games, which do not use VR or 3D.

I’m not sure what this has to do with anything, or what you even mean by it.

Even if the single GPU treats each eye as a totally independent frame, it’s still never worse to have a single GPU with twice the horsepower. Say that your slow GPU renders an eye in 10 ms. Two of them running in parallel can render both eyes in 10 ms. The single fast GPU can render an eye in 5 ms. It still takes 10 ms to render both eyes, even if they’re totally serialized.

But in fact they’re not serialized; you can accrue benefits by rendering both eyes simultaneously. Depending on what you’re limited by, this may be a big advantage. For instance, suppose you spend a lot of time rendering shadow maps and doing GPU physics. These things cannot be made parallel easily: you just have to do them twice on a dual GPU setup. The single GPU only has to do them once and enjoys an advantage because of it.
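A toy model of that shared-work advantage, with made-up numbers: say a frame is S ms of once-per-frame work (shadow maps, GPU physics) plus E ms of per-eye rendering.

```python
S = 4.0   # once-per-frame work in ms (shadow maps, physics) -- assumed
E = 6.0   # per-eye rendering work in ms -- assumed

# Two slow GPUs, one eye each: both must redo the shared work for their eye.
dual_gpu_ms = S + E                      # 10.0 ms per stereo frame

# One double-speed GPU: shared work once (at 2x speed), then both eyes at 2x speed.
single_fast_ms = S / 2 + (2 * E) / 2     # 8.0 ms per stereo frame

print(dual_gpu_ms, single_fast_ms)
```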

The top GPUs are reticle limited and so if you need more perf than they offer, you have no choice but to go with dual GPUs. But efficiency wise, you are almost always worse off.

No. Suppose you have a GPU that is “twice as fast”. This actually means it runs at the same clockspeed and has twice the number of execution units (or the execution units it does have each handle more data and/or get more done in the same unit time).

It takes a minimum amount of time to load all the data into memory and then have the execution units perform each of the steps in series to render a frame. A GPU that is “twice as fast” will only produce double the framerate if the GPU you are comparing against is heavily loaded and its absolute framerates are low. If the GPU you are comparing against can already do 100 fps, a double-speed GPU will not do 200, because of this overhead.

Anyways, keep this thread in mind. I’m not a GPU engineer; I just can clearly visualize how the state machines work for one, and how the propagation and network delays mean you cannot simply double the speed with 2000 state machines if you had 1000 before. You can double *throughput* - render a frame with twice the stuff in it 60 times a second versus the machine with 1000 - but speed will not double. (It does increase, because actual GPU rendering involves reusing the same state machine array over and over for different regions of GPU memory.)
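The fixed-overhead argument being made here is essentially Amdahl’s law. A minimal sketch with purely illustrative numbers:

```python
def speedup(parallel_fraction, unit_factor):
    """Amdahl's law: overall speedup when only part of the frame time
    scales with the number of execution units."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / unit_factor)

# If, say, 10% of the frame time is fixed (data upload, serial setup steps),
# doubling the execution units gives ~1.82x, not 2x.
print(speedup(parallel_fraction=0.9, unit_factor=2))   # 1.818...
```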

This is a totally separate and orthogonal problem to the idea of rendering the *same* frame sequence with 2 GPUs, or elements of the same frame.

I’m 100% certain I’m correct. I’ve worked with various UE4 VR demo projects and seen the catastrophic framerate losses firsthand. I’m also aware that the basic idea here comes from fundamental theory of computation. Please don’t argue with me unless you have a proof that shows fundamental mathematics is wrong. You can be skeptical if you want, but real-world data will be here very soon to show I’m correct. (Basically, once the VR headset race kicks off in earnest this Christmas, a lot more people will know about these problems.)

That’s a really small part of the total time. In fact, in any given frame it’s liable to be close to zero. If you’re wandering around an open world, you’ll have to periodically load new textures, models, etc., but these will manifest as small hitches and not a measurable difference in average frame rate.

There are two basic conditions you can be in: CPU limited and GPU limited. If you’re GPU limited, then as you say you will go twice as fast.

If you’re CPU limited, then you are even worse off with multiple GPUs. There is inherent overhead associated with managing more than one GPU.

Under virtually every condition where you get perfect 2x scaling with dual GPUs, you will also get perfect scaling with a double-speed GPU.

Sure, I can think of a few exceptions. But they are very rare. Far more common is that the double-speed GPU gets perfect 2x scaling, while the dual GPUs get 1.6x scaling.

The properties of VR mean that the dual GPUs will hit that 2x a little more often than with non-VR situations. But they’ll still only just approach the performance of the single GPU case.

GPUs have very long and complex pipelines, but they’re still short compared to the overall frame time. The GPU drains in perhaps a few tens of microseconds, and in a typical frame it will already drain several times. It’s not a problem until it starts happening hundreds of times a frame.

So while it’s true that larger GPUs can take longer to drain (though not linearly), in the real world it has almost no relevance. Besides, there are plenty of situations where dual GPU support causes additional pipeline drains.

Again, you have it almost exactly backwards. It’s the multi-GPU case where you can increase throughput but with little effect on latency. A single uber-fast GPU gets you performance and lower latency.

Not to pull rank or anything, but I’ve worked at a well-known GPU manufacturer for over 15 years, and have been on the DirectX performance team for a substantial part of that. I’ve seen way more real-world data than you have.

It’s possible that we’re arguing past each other, since I can’t quite understand your claim. And it is possible that the behavior of competitor chips is different, though it would be a result of poor design on their part: forgetting to scale certain parts (like triangle setup, for instance) as the rest of the chip got faster. But for us, it is easily true 99% of the time that a double-speed GPU is better than two ordinary ones.

I will point out that the economic scaling factors do work out differently, and frequently tilt the balance in favor of multi-GPU. Because the very fastest GPU is a premium product, its performance/$ ratio is lower than if you go a step or two down. It may be cheaper to buy two upper-mid GPUs than a single high-end one. However, this is partially artificial and doesn’t contradict my other claims about single- vs. dual-GPU perf.

My claim is simple. Render the same cube in UE4. Single GPU, no VR mode - 200 FPS. VR mode - you might get 40 FPS.

If the workload is ‘merely’ double why is the performance loss 5x?

Amdahl’s law. I doubt your “real world data” tries to do this.

Ditto, go to page 16 of this presentation. Oh look, catastrophic latency issues. You can barely do it at all. 2 GPUs or go home. Basically, I dunno what you’ve been doing for 15 years, but the actual hardware gets hugely less efficient when you can’t pipeline because you need hard realtime response due to head movement.

It sounds like they’re doing it wrong, but that could be a reasonable slowdown.

It’s not double the workload. VR requires rendering to some offscreen surfaces and then using a compositing shader to perform the inverse lens correction: reversing the geometric distortion and the chromatic aberration.

The offscreen surfaces might be substantially higher resolution than the final output. You technically need this since the distortion shader magnifies part of the image, and to get a rough 1:1 pixel mapping in the end, you need a higher input resolution.

In fact, the Valve PDF you linked to above says exactly this. Even though the HTC Vive is 1080x1200 per eye, they render at 1512x1680. That’s almost 4x the workload as compared to a single eye. The distortion shader isn’t free, so 5x is pretty believable.
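The arithmetic behind that, using the numbers from the Valve slides:

```python
panel_per_eye  = 1080 * 1200      # HTC Vive panel resolution per eye
render_per_eye = 1512 * 1680      # recommended render-target size per eye
both_eyes      = 2 * render_per_eye

print(both_eyes / panel_per_eye)  # ~3.92 -> almost 4x the pixels of one eye at panel res
```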

There are ways of improving this factor, but they have nothing to do with multi-GPU setups.

I’m not sure what you mean. We test using a combination of actual game benchmarks, with stereo rendering and without, as well as more synthetic tests.

If anything, GPU work is subject to a kind of reverse Amdahl’s law. Although everything is parallelizable (if it wasn’t, it wouldn’t be on the GPU), not all of the workload scales with the screen size. Double the rendered pixels doesn’t necessarily require double the geometry or double the shadow map size. 2x the pixels might only be 1.5x the work.

Multi-GPU typically has a hard time taking advantage of this, though, because the fixed work has to be either done twice or transferred across GPUs. Single GPUs only have one memory pool and don’t have this problem.
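A sketch of that “reverse Amdahl” scaling, under the assumed (illustrative) split where half of the frame cost is resolution-independent work like geometry and shadow maps:

```python
def relative_frame_cost(pixel_scale, fixed_fraction=0.5):
    """Frame cost relative to the base resolution when only part of the
    work grows with pixel count."""
    return fixed_fraction + (1.0 - fixed_fraction) * pixel_scale

print(relative_frame_cost(2.0))   # 1.5 -> 2x the pixels is only ~1.5x the work
```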

You aren’t understanding the presentation. Only the green bar represents work done by the GPU. The rest of the time is spent in the app, the driver, or waiting for the image to be sent to the panel.

Traditionally, there is a lot of queuing and waiting around when passing the data from one stage to the next. This doesn’t affect frame rate, but it does affect latency. VR setups have to be very careful to avoid any queuing, and to arrange things such that one stage is finished just as the next one is ready to begin. In particular, you have to know when the VSync is going to happen, because that is on a fixed interval (every 1/90 sec).

In traditional dual-GPU setups, that green bar doesn’t get shorter. It’s just that it overlaps with the second GPU, and you get twice the throughput but no lower latency. With VR, though, each GPU can easily work on its own eye, and in principle it’s just as good as a single double-speed GPU. It’s not better, though.

Ok, so you got me on a technicality. You are correct, of course - I acknowledge your real-world expertise. So from the perspective of the application, you output what you want to render and forgetaboutit. Well, there’s some feedback, but in any case, there’s a big delay between when you send the data out and when you get a frame-complete signal of some sort from the GPU drivers.

You’re saying that delay isn’t the state machines in the GPU “draining”; it’s data transmission over buses, CPU time (for the graphics drivers and the DirectX/OpenGL stack), memory bandwidth, etc. You must be right, I guess.

In any case, from that presentation, from my own experiments, and from my understanding that latency is harder to fix than adding parallel execution units, I thought the problem was that the GPU just can’t chug through a frame any quicker if you double the execution units.

You’re saying there are actually 2 fixes, and the best one is 2 GPUs. Since both view perspectives come from an identical set of source data, it sounds like you could write an optimized driver path that causes both GPUs to receive the same mirrored data off the bus, cutting your transmission time to that of a single GPU. You could split the steps that differ between the two perspectives across 2 processing cores on the host CPU. Basically, a team in the driver division of a major GPU manufacturer could make sure this rocks.

You could also make a single GPU that processes both perspectives in parallel.

And you’ve got that reticle limit. You can barely do higher resolution in a modern game with a single GPU because you can’t fit any more transistors. So I think because of this, the economical solution that will be dominant will be 2 physical chips for VR rigs.

If VR turns out to be as awesome as predicted and gamers demand it on a large enough scale, the next console generation will need some type of “splittable GPU” solution where you can address a large, monolithic GPU and get it to render 2 perspectives in parallel or render a scene with more detail for a single output perspective, right?

  1. Guys, wanna either take it outside or get a room, whichever is more to your liking?

Not that I don’t think the intricacies of VR rendering are worth discussing, and I do appreciate my thread getting bumped, but if anybody else is reading this and wants to reply to the OP rather than the spin-off, please do.

  2. As for the 970 not being enough for 2160x1200 at 90 fps: 2160x1200 is 25% more pixels than 1080p (quick arithmetic after this list), and the 970 does quite well at 1080p: http://images.anandtech.com/graphs/graph8568/67894.png
    http://images.anandtech.com/graphs/graph8568/67897.png

Also, the 970 has the GeForce VR feature, which allows for more efficient rendering: http://www.geforce.com/hardware/technology/vr/technology
I will readily admit that I may have to get a second 970 to get a 90 fps minimum, and that VR games in the first few years will not be as visually detailed as AAA games.

  3. I used OCCT. It ran without problems until I stopped it and it tried to show me the results screen, whereupon the PC froze.

  4. What elements might hint that increasing memory clock will be more beneficial than increasing core clock?
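For reference, the pixel arithmetic behind point 2 above (90 fps is the VR target mentioned earlier in the thread; 60 fps is the usual baseline):

```python
pixels_1080p = 1920 * 1080          # 2,073,600
pixels_vr    = 2160 * 1200          # 2,592,000

print(pixels_vr / pixels_1080p)     # 1.25 -> 25% more pixels than 1080p

# Factoring in the frame-rate target (90 fps for VR vs. a 60 fps baseline):
print((pixels_vr * 90) / (pixels_1080p * 60))   # 1.875 -> ~1.9x the pixels per second
```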

Dr Strangelove,

When I asked for better utilities than FurMark, I meant the ones you could suggest rather than what you use professionally.

I did some tinkering with MSI Afterburner and testing with Kombustor. The 970 could run a stress test at +87 mV and 1566 MHz while flatlining at 49 C.

I suppose that means there is a good amount of room for further overvolting, but no overclocking software has allowed me to get past +88 mV. How can one get past that?

Michael, did you enable error counting in OCCT? Did you downclock back to stock to get a baseline and test at stock, first?

It would have been nice but when OCCT finished, the PC froze. I’ll try again after I’m done watching 2 videos in case it freezes again.

If there are errors, they can be corrected by increasing the voltage.