But on the other hand, Blender scores seem to be predicted very well by the compute capability of the card. If we disregard Nvidia OptiX for the moment (since it catapults Nvidia into an entirely new category), we get the following scores:

![]()

As we can see, the fit between FLOP throughput and the score is fairly good across manufacturers. There is some variation, of course: AMD is a bit faster per FLOP than Nvidia/CUDA, probably thanks to their RT acceleration (which, primitive as it is, does help a little bit), and Apple is the fastest of the bunch per FLOP (probably because Apple has optimised the hell out of it by now), but there is no huge disparity. I would assume that the CUDA backend is well maintained and well optimised, and all of this kind of puts the results where they are expected. Cinebench showing a different distribution of results can be due either to less optimal optimisation on some platforms or to some difference in the scene data.

*The RX 7900 XTX is obviously an outlier, but this is because of how AMD advertises its compute capability. They claim that Navi3 has doubled the execution units compared to Navi2, but this is a bit misleading. Navi3 can indeed execute two FLOPs per unit per clock (in a kind of SIMD-within-SIMD fashion), but this is subject to a lot of restrictions and won't work every time. My entirely uneducated guess is that only around 20% of real-world operations can be "fused" to benefit from this capability. No easy way to check this, unfortunately.
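To make the back-of-the-envelope reasoning concrete, here is a small sketch of what that guess implies for effective throughput. The function and the 20% figure are my assumptions (the fused fraction is exactly the uneducated guess above, and the ~61 TFLOPS input is just an illustrative dual-issue-style advertised number), not anything AMD publishes:

```python
def effective_tflops(advertised_tflops: float, fused_fraction: float) -> float:
    """Estimate usable FP32 throughput for a dual-issue GPU.

    The advertised figure assumes every instruction dual-issues
    (2 FLOPs per unit per clock). If only `fused_fraction` of
    real-world operations can actually be paired, the remainder
    runs at the single-issue (half) rate.
    """
    base = advertised_tflops / 2          # single-issue throughput
    return base * (1 + fused_fraction)    # fused ops count double

# Illustrative: ~61 TFLOPS advertised, 20% of ops actually fusable
print(effective_tflops(61.0, 0.2))       # roughly 36.6 "effective" TFLOPS
```

Under that assumption, the card behaves much closer to its single-issue rating than to the doubled headline number, which is consistent with it looking like an outlier on the chart.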