CPU type performance

What is the relative performance of Intel’s Xeon and Core i7 processors? Are these their highest performance CPUs?

You’d have to be more specific. There are literally dozens of processors with the label Xeon or Core i7.

ETA: I haven’t been keeping up with Intel’s lineup lately, but if you are looking for the top end at the moment, it is probably this.

Depends what you want to do. The two products are meant to meet different needs.

On balance they actually perform very similarly on something like a PC game assuming the specs are roughly the same.

The Xeon costs a lot more because it is aimed at commercial applications. Xeons tend to be clocked more slowly but have more cores, support ECC memory (which is not important to most consumers), are designed for 24/7 operation, and so on.

In terms of overall multithreaded performance, a high-core-count Xeon is vastly faster.

The 18-core E7-8890 v3 does nearly 3,000 Linpack GFLOPS (i.e., billion floating-point operations per second). I just ran Intel’s optimized Linpack benchmark on my 2015 iMac 27, which has a 4GHz i7-6700K, and it produced 200 GFLOPS. So in this one test the Xeon is 15 times faster.
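For concreteness, the ratio works out like this (figures as quoted above, so approximate):

```python
# Rough speedup from the two Linpack results quoted in this thread.
xeon_gflops = 3000   # 18-core Xeon E7-8890 v3, optimized Linpack (approx.)
i7_gflops = 200      # 4GHz i7-6700K, same benchmark (approx.)

speedup = xeon_gflops / i7_gflops
print(f"Xeon is {speedup:.0f}x faster on Linpack")  # prints "Xeon is 15x faster on Linpack"
```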

However the E7-8890 v3 base clock is only 2.5GHz, so my i7 would be much faster for a single-thread task. Likewise Xeons above 4 cores don’t have Quick Sync video transcoding hardware, which can improve encode/decode performance by a factor of 4 or 5: http://www.intel.com/content/www/us/en/architecture-and-technology/quick-sync-video/quick-sync-video-general.html

It is entirely possible that a fast laptop with an i7 could beat an 18-core E7-8890 v3 on H264 video transcoding. H264, H265, VP9 and similar “long GOP” video formats cannot be meaningfully accelerated using GPU methods, since the core algorithm is sequential and not amenable to GPU-style parallelization.

For four threads and fewer, the fastest stock Intel CPU is the 4-core i7-7700K, which has a 4.2GHz base clock and a turbo speed of 4.5GHz.

Software is increasingly multi-threaded and can harness many cores, though this varies by application. There is a limit called Amdahl’s Law, which says that even a tiny fraction of non-parallel code (say, the threads have to momentarily sync on some object) will quickly cap the maximum multi-core speedup: Amdahl's law - Wikipedia
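Amdahl’s Law can be sketched in a few lines; the 90% parallel fraction below is just an illustrative assumption, not from any benchmark:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Maximum speedup on n_cores when only parallel_fraction
    of the work can be parallelized (Amdahl's Law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even with 90% parallel code, 18 cores give well under 18x:
print(amdahl_speedup(0.90, 18))      # ≈ 6.7x
# And the ceiling as cores -> infinity is only 1/0.1 = 10x:
print(amdahl_speedup(0.90, 10**9))   # just under 10x
```

This is why the serial fraction, not the core count, quickly becomes the bottleneck.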

But there are common tasks in Lightroom, Photoshop, Premiere Pro and Final Cut Pro X which could effectively use 16 or more cores.

From a historical standpoint, the original Cray-1 supercomputer did about 100 megaFLOPS, so the Xeon E7-8890 v3 is about 30,000 times faster, and that’s without using a GPU.

However today’s fastest supercomputers maintain a similar proportional lead over the fastest single desktop or server CPU. The current fastest supercomputer is the Chinese Sunway TaihuLight at 93 petaFLOPS, which is about 31,000 times faster than the Xeon E7-8890 v3. By next year the U.S. is supposed to have a 200 petaFLOP supercomputer, the IBM Summit: http://www.computerworld.com/article/3086178/high-performance-computing/u-s-to-have-200-petaflop-supercomputer-by-early-2018.html
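Both ratios above check out against the figures quoted in these posts (all approximate, converted to a common unit):

```python
# All figures quoted in the posts above, converted to megaFLOPS.
cray1_mflops = 100                  # Cray-1 (~100 megaFLOPS)
xeon_mflops = 3_000_000             # Xeon E7-8890 v3 Linpack (~3,000 GFLOPS)
taihulight_mflops = 93_000_000_000  # Sunway TaihuLight (93 petaFLOPS)

print(xeon_mflops // cray1_mflops)       # 30000: Xeon vs Cray-1
print(taihulight_mflops // xeon_mflops)  # 31000: TaihuLight vs Xeon
```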

All recent supercomputers are massively parallel designs, using vast numbers of CPUs or GPUs. By contrast, the original Cray-1 and most supercomputers until the 1990s were vector machines, which had a few powerful CPUs that could operate simultaneously on strings of numbers called vectors.

Intel’s modern CPUs have vector instructions similar to the Cray’s; when used, they can operate on strings of numbers simultaneously, an approach called SIMD (Single Instruction, Multiple Data). For tasks that are massively parallel with few dependencies, today we have GPUs that use hundreds or thousands of lightweight threads. However some algorithms cannot be meaningfully accelerated this way.
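As a rough sketch of the SIMD idea (pure Python just modeling the semantics; in real hardware, e.g. an AVX unit with 8 single-precision lanes, each chunk below would be a single instruction):

```python
LANES = 8  # assumed lane count, e.g. AVX handles 8 single-precision floats

def simd_style_add(a, b):
    """Add two equal-length lists in chunks of LANES, mimicking how a
    vector unit applies one instruction to several lanes at once."""
    out = []
    for i in range(0, len(a), LANES):
        # In hardware, this whole chunk would be one vector add.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
    return out

print(simd_style_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The payoff in real hardware is that the loop runs LANES times fewer iterations, each doing LANES operations at once.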

Improving performance on CPUs is now extremely difficult since the clock rate cannot be increased much due to the breakdown of Dennard Scaling: Dennard scaling - Wikipedia

Further improving the instructions per cycle (IPC) is also difficult, since nearly all architectural tricks have been exploited. The complexity of decoding and executing more than about 6-8 instructions from a stream in parallel rises steeply, as dependency checking, register renaming, and result forwarding all grow in cost with issue width.

This leaves only adding more cores, which rapidly becomes heat-limited no matter how much cooling is added. Previously, process shrinks reduced feature size, and thus power, while adding more transistors, but this too is approaching the limits of current technology. A key paper on this was Dark Silicon and the End of Multicore Scaling (Esmaeilzadeh et al., 2011): ftp://ftp.cs.utexas.edu/pub/dburger/papers/ISCA11.pdf