BTW, I took a proper look at those Tensor Cores:
These cores are essentially a mass collection of ALUs for performing 4x4 Matrix operations; specifically a fused multiply add (A*B+C), multiplying two 4x4 FP16 matrices together, and then adding that result to an FP16 or FP32 4x4 matrix to generate a final 4x4 FP32 matrix.
... so not only can they do nothing but matrices, they can only multiply FP16 matrices. Anyone doing something other than deep learning can forget about those 120 TFLOPS right away (because 99% of HPC that isn't deep learning uses FP32 or FP64). To me this already looks like a pretty extreme bet on AI...
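To make the precision point concrete, here is a rough sketch in plain Python of the operation quoted above (D = A*B + C on 4x4 matrices, FP16 inputs, FP32 accumulation). It is only an emulation under my own assumptions: the helper names `to_fp16`, `to_fp32` and `tensor_core_fma` are made up for illustration, and the rounding order inside the real hardware is not something the quote specifies.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def tensor_core_fma(a, b, c):
    """Emulate D = A*B + C for 4x4 matrices (hypothetical helper).

    A and B are rounded to FP16 first; the accumulator is rounded to FP32
    after every addition. This rounding order is an assumption.
    """
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d

# FP16 has an 11-bit significand, so 2049 cannot be stored exactly.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[2049.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
d = tensor_core_fma(identity, b, zeros)
print(d[0][0])  # 2048.0, the input was already rounded before the multiply
```

Even multiplying by the identity matrix loses the last digit, because the inputs get squeezed into FP16 before any arithmetic happens. That is fine for neural network weights, much less so for typical FP32/FP64 HPC codes.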