Mainstream processors can effectively use AVX-512 .. in about 5 years
And another:
The entire thing was born out of the Larrabee project, back when that project was about rendering. What Intel found was that no matter what they did, they could not feed that much data to the CPU without changing the cache architecture, and that such changes would negatively affect regular performance with crushing memory latency.
So we end up in a situation where Intel knew that they would not be able to process entire AVX-512 registers in one go on all threads, so they did not include the execution units necessary to do it even on a single core, let alone the bandwidth to do it on all of them.
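To put rough numbers on the "could not feed that much data" point, here is a back-of-envelope sketch. Every input is an illustrative assumption rather than a measurement: the 3 GHz clock, the two 64-byte loads per cycle, and the 8 cores are made up for the example. Only the DDR4-3200 dual-channel figure follows from the memory standard's transfer rate and 64-bit channel width.

```python
# Back-of-envelope: bytes/s the cores could consume vs. bytes/s DRAM can supply.
# All "assumed" values are hypothetical, chosen only for illustration.

CLOCK_HZ = 3_000_000_000        # assumed core clock (3 GHz)
LOADS_PER_CYCLE = 2             # assumed 64-byte vector loads issued per cycle
ZMM_BYTES = 64                  # one 512-bit AVX-512 register
CORES = 8                       # assumed core count

DDR_TRANSFERS_PER_S = 3_200_000_000   # DDR4-3200: 3200 mega-transfers/s
CHANNEL_BYTES = 8                     # 64-bit memory channel
CHANNELS = 2                          # assumed dual-channel desktop setup

core_demand = CLOCK_HZ * LOADS_PER_CYCLE * ZMM_BYTES      # bytes/s, one core
chip_demand = core_demand * CORES                         # bytes/s, all cores
dram_supply = DDR_TRANSFERS_PER_S * CHANNEL_BYTES * CHANNELS

print(f"one core could consume : {core_demand / 1e9:.1f} GB/s")
print(f"all cores could consume: {chip_demand / 1e9:.1f} GB/s")
print(f"DRAM can supply        : {dram_supply / 1e9:.1f} GB/s")
print(f"shortfall factor       : {chip_demand / dram_supply:.0f}x")
```

Under these assumptions the cores could ask for tens of times more data than main memory delivers, so full-width vector code only runs at speed while its working set stays in cache.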
So as Linus rightly notes, the shit is more or less useless right now, and it costs a lot of execution time because AVX-512 registers are enormous and, like all registers, need saving across context switches. That saving is slow because of the same lack of bandwidth. A single AVX-512 register is 64 bytes, half the size of all sixteen 64-bit general-purpose registers combined, and in 64-bit mode there are 32 of them.
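A quick sketch of the register-state arithmetic for x86-64, in case the sizes are not obvious. The register counts and widths are architectural. What is simplified is the cost model: this only adds up raw register bytes and ignores the other state an OS saves, as well as any optimizations that skip unused state.

```python
# Raw register state sizes on x86-64, in bytes.

GPR_COUNT, GPR_BYTES = 16, 8      # rax..r15, 64 bits each
ZMM_COUNT, ZMM_BYTES = 32, 64     # zmm0..zmm31, 512 bits each
MASK_COUNT, MASK_BYTES = 8, 8     # k0..k7 opmask registers, up to 64 bits each

gpr_total = GPR_COUNT * GPR_BYTES                  # all general-purpose registers
zmm_total = ZMM_COUNT * ZMM_BYTES                  # all vector registers
avx512_total = zmm_total + MASK_COUNT * MASK_BYTES # vectors plus opmasks

print(f"all GPRs           : {gpr_total} bytes")
print(f"one ZMM register   : {ZMM_BYTES} bytes")
print(f"all ZMM registers  : {zmm_total} bytes")
print(f"ZMM + opmask state : {avx512_total} bytes")
print(f"ZMM file vs GPRs   : {zmm_total // gpr_total}x")
```

So two ZMM registers match the whole general-purpose register file, and the full vector file is 16 times larger.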
The drawbacks are less obvious, but very real: graphics cards are rated for up to 300 watts. You're now trying to stuff a portion of that processing power into the CPU, and back in the early 2010s, benchmarking showed this to cause the CPUs to run VERY hot. Much hotter, much more quickly than the heat sink could cool them. (I worked at a computer manufacturer -- running Prime95 with the AVX instruction set would regularly cause problems.) Apparently, from other comments, the CPU also doesn't have the memory bandwidth to fetch the data quickly enough. Remember, graphics cards use High Bandwidth Memory now to supply up to 1500 shader cores. Really, with AVX, the memory bus can't keep up -- unless you're doing thousands of iterations over the same cached data, you can do one instruction and then you have to wait.
Just buy a GPU, dork