What exactly do Quadro cards do?

As far as I understand, Quadro cards are for rendering exclusively. However, if we look at this Blender 3D rendering test (ray tracing), http://blenchmark.com/gpu-benchmarks , we see that a Quadro P6000 (which costs several thousand dollars) performs worse than a basic GTX 1080, or than an SLI setup of two or three previous-generation 680s/780s, which cost a tenth or less as much.

The most likely reason is that Blender isn’t optimized for Quadros, but if Blender, a mainstream 3D modelling program with ray-traced rendering, isn’t optimized for them, then what actually is? How is it even possible that an insane 24 GB GDDR5X card is outperformed by a 1080? This is what is marketed on the Quadro page:

  • Build extremely large scenes with large numbers of 3D elements, all fitting in graphics memory.
  • 3D texture painting artists can enjoy creative flexibility in their workflow without being constrained by a maximum number of textures.
  • Visual effects artists can create the most complex special effects, maintaining all assets in graphics memory and streamlining the effects workflow.

How is it even possible that Blender isn’t using any of this? SolidWorks is one program that is marketed for Quadros, but it isn’t a program used much by 3D texture painting artists.

Beyond that, games somehow also don’t use any of this, despite games being real-time rendering engines with textures, lighting, shadows and so on.

My impression is that Quadro is built for high capacity, not speed: that it’s for people who want it to do a whole lot and who don’t care if they have to wait a second.

Quadros offer drawing accuracy and features more important to workstation users, and, more importantly, drivers certified for the application you’re running. E.g. if you’re having a problem with what you’re seeing on your CATIA screen, Dassault will likely only be able to help you if you’re running a driver version they’ve certified.

Gaming cards will always be faster than workstation cards because they prioritize clock speed and bandwidth to be faster than “the next guy.”

One of the large differences is in support. If your application is crashing because of the Nvidia driver, they’ll actually dedicate engineering resources to fixing your problem.

If your game is crashing on your 1080 Ti, Nvidia doesn’t really care.

There are a few differences.

The Quadro cards are made for industrial uses (3D movie makers, oil exploration and so on). As such they have a heavier workload than your standard gaming card. The GPUs on a Quadro are cherry picked by Nvidia for the best reliability and lowest heat output/lowest power use.

The frame buffer is larger and has error correction technology and as mentioned above they have special drivers made specifically to work with their software.

The other big difference, and an important one for the tasks these are meant for, is double precision floating point performance (GeForce cards can’t do that at a useful rate… except maybe the Titan).

There are six major advantages to the cards themselves:

[ul]
[li]Full support for 10-bit colour, but now the gaming GPUs support that for HDR, so it’s only relevant for older cards.[/li]
[li]Much better FP64 (compute) performance than the gaming GPUs.[/li]
[li]Quadro Sync 1 & 2 support - useful when running many displays.[/li]
[li]High-end cards often have more VRAM than their consumer counterparts.[/li]
[li]High-end cards often have ECC memory.[/li]
[li]SLI support. SLI support seems to be disappearing from gaming GPUs because developers aren’t coding for it.[/li]
[/ul]

Because I have one of these and just tested it, let me add some information here.

In the Pascal and later generations, the half and double precision floating point performance of the GeForce line is artificially limited in order to maintain market segmentation. The chips are limited to 1:32 of their native rate on float64 and 1:64 on float16, compared to float32 performance.
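As a rough sketch of what those ratios imply, here is a back-of-envelope estimate. The helper function is mine, and the ~11.3 TFLOPS FP32 figure for a stock 1080 Ti is an assumed published number, not a measurement:

```python
# Hypothetical helper: estimate peak throughput at another precision from
# the FP32 peak and the hardware rate ratio described above.
def peak_tflops(fp32_tflops, ratio):
    """Peak TFLOPS at a given precision, from FP32 peak and rate ratio."""
    return fp32_tflops * ratio

gtx_1080ti_fp32 = 11.3                        # TFLOPS, approximate stock figure
fp64 = peak_tflops(gtx_1080ti_fp32, 1 / 32)   # GeForce float64 penalty
print(round(fp64, 2))  # ~0.35 TFLOPS
```

That estimate lines up with the ~0.345 TFLOPS float64 figure quoted for the 1080 Ti further down.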

I actually just had to resort to buying a Titan V last week due to this issue, and for pure math the float32 performance ratio is within my error bars of unity, but you will see that the float64 and float16 GFLOPS are quite different:

titan_v['float16'] / 1080ti['float16'] = 111.70 (tensor core increase)
titan_v['float32'] / 1080ti['float32'] = 0.98
titan_v['float64'] / 1080ti['float64'] = 14.42 (no artificial penalty)
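For reference, ratios like these are just element-wise divisions of per-precision GFLOPS measurements. A minimal sketch; the dict values below are illustrative placeholders, not my real benchmark numbers:

```python
# Illustrative GFLOPS figures only; real values come from a benchmark run.
titan_v = {"float16": 110000.0, "float32": 13800.0, "float64": 6200.0}
gtx_1080ti = {"float16": 985.0, "float32": 14080.0, "float64": 430.0}

for prec in ("float16", "float32", "float64"):
    ratio = titan_v[prec] / gtx_1080ti[prec]
    print(f"titan_v[{prec!r}] / 1080ti[{prec!r}] = {ratio:.2f}")
```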

Note: these are artificial benchmark results, so do not take the numbers as absolute.
As for the Titan V, the main restriction will probably be that interconnection between cards is limited to p2p transfers over the PCIe bus.

There are also licensing considerations for datacenter use, and there are features that are limited, either due to yield issues or intentional changes through fused-off options.

As there aren’t many benchmarks out there yet that don’t relate to ML (which doesn’t care much about floating point accuracy) or gaming, let me set a rough ratio:

i9-7200x ~ 0.634 TFLOPS float64
1080 ti ~ 0.345 TFLOPS float64
Titan V ~ 6.2 TFLOPS float64
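Working out the relative FP64 throughput from those rough figures:

```python
# Ratios derived from the approximate TFLOPS figures quoted above.
cpu_fp64 = 0.634          # i9 CPU, TFLOPS float64
gtx_1080ti_fp64 = 0.345   # 1080 Ti, TFLOPS float64
titan_v_fp64 = 6.2        # Titan V, TFLOPS float64

print(round(titan_v_fp64 / gtx_1080ti_fp64, 1))  # ~18.0x the 1080 Ti
print(round(titan_v_fp64 / cpu_fp64, 1))         # ~9.8x the CPU
```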

Note that it does depend on the use case; here is another comparative mark that may not be captured by the above. On PyTorch’s examples repo, using resnet18 on the full ImageNet ILSVRC2012 data set, the per-batch time dropped by about 50%, but two 1080 Ti boards on the same PCIe switch were faster, and were faster than the Titan V plus one 1080 Ti, because different generations cannot perform p2p transfers with each other.

So unfortunately it really just depends on the use case.

On a side note, for my use case and expected term of use it was still significantly cheaper to buy than to use AWS resources, with an ROI of about 6 weeks of use. But I am working on a mixed-precision need: I send float64 math to the GPU and save the CPU for arbitrary-precision decimal needs.

I should also point out that SLI is useless for compute needs; it only really has enough bandwidth for synchronization, and its use is orthogonal to the compute use case.

Peer-to-peer data transfers happen over the PCIe bus, and the benchmarks on that page will not be attainable with multiple cards unless you pay attention to the PLX configuration.
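For scale, the p2p path tops out at the PCIe link rate. A back-of-envelope calculation for a PCIe 3.0 x16 slot, using the standard published link parameters:

```python
# Theoretical ceiling for p2p transfers over a PCIe 3.0 x16 link.
lanes = 16
gt_per_s = 8.0            # PCIe 3.0: 8 GT/s per lane
encoding = 128 / 130      # 128b/130b line-code efficiency
bits_per_byte = 8

peak_gb_s = lanes * gt_per_s * encoding / bits_per_byte
print(round(peak_gb_s, 2))  # ~15.75 GB/s, before protocol overhead
```

Real transfers land well below that once packet headers and switch hops are accounted for, which is why the PLX topology matters.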

You also don’t know how much people have overclocked those GPUs, and in many cases those configurations cannot be powered from a typical US power outlet.

As an example, with 3 × 1080 Ti GPUs and an i9-7200x CPU, I had to limit power to avoid hitting the current limiter on a 1200 W power supply. There is also an entire community that tries to get on the leaderboards of these benchmarks and will actually resort to methods like liquid nitrogen cooling in order to get high on the list.
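A rough power budget shows why. The TDP and overhead figures below are stock/approximate assumptions, not measurements from my system:

```python
# Back-of-envelope power budget for a 3-GPU box like the one described above.
gpu_tdp_w = 250     # stock 1080 Ti TDP; overclocks pull well past this
cpu_tdp_w = 165     # high-core-count Skylake-X, approximate
overhead_w = 100    # drives, fans, VRM/PSU losses, rough guess

total_w = 3 * gpu_tdp_w + cpu_tdp_w + overhead_w
print(total_w)  # 1015 W at stock; overclocked spikes can trip a 1200 W PSU
# A 15 A / 120 V US circuit supplies 1800 W peak, ~1440 W continuous.
```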

While I am not saying this is the entire reason for the results, it is important to realize that this is a hobby for some people.

https://www.youtube.com/watch?v=yv2Pz5mYW3g

I still don’t see any real-world examples, just various features, driver updates and prioritizing. Someone did mention 3D movie makers, but 3D movie making could be rendering in Blender, the example that I wrote about in the first post, and non-Quadro cards were better at it.

So, not speaking about random features, teraflops, the fact that Nvidia cares more about Quadro customers, etc., what are some real-world programs related to more artistic 3D modeling or video editing where Quadros actually make a difference?

It depends on the application and the accuracy involved. Viewport wireframe rendering and double-sided polygon rendering, which you use in CAD/3D but not in games, will be faster.

If you need more memory than the consumer models provide, you will need to move to the Quadro cards too, e.g. the P6000 has 24 GB of RAM.

A lot of it is market segmentation: included features and enabled features. But don’t base your beliefs on simple benchmarks like the one you linked; note that the fastest single card on that list is a 780 Ti, which was probably binned for liquid nitrogen cooling, as there is no way a stock card is faster than a 1080 Ti.

Sometimes you need lots of monitors stitched together, which requires Quadro; sometimes you need more than 256 shades of a color to prevent banding, which requires 10-bit color and Quadro; sometimes you need to edit 8K video, which needs more memory.
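The 256-shades figure is just 8 bits per channel; the jump to 10-bit is what kills the banding:

```python
# Shades per colour channel at a given bit depth.
def shades(bits):
    return 2 ** bits

print(shades(8), shades(10))  # 256 vs 1024 shades per channel
```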

The example you provided was at low resolution and low color depth, and didn’t use much RAM. But it is probably a good rule that if you don’t know why someone would need a feature that is only sold in the Quadro line, you probably don’t need one.

You will know, or at least your wallet will, when you run into those limitations.