https://5dok.net/document/myj79vmy-flexible-matrix-multiplication-kernels-on-gpus.html