PythonHub

In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference

In-Kernel Broadcast Optimization (IKBO) eliminates redundant replication of user embeddings by fusing the broadcast logic directly into the interaction kernels, which reduces both memory-bandwidth and compute waste. According to the post, this co-design approach cuts latency by up to two-thirds across Meta's recommendation stack, with kernels optimized for high-performance hardware such as the NVIDIA H100 and Meta's MTIA.

https://pytorch.org/blog/in-kernel-broadcast-optimization-co-designing-kernels-for-recsys-inference/
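
To see the kind of redundancy IKBO targets, consider a single inference request that scores one user against many candidate items. A common pattern copies the user embedding once per candidate and concatenates it with each item embedding before a linear projection. The NumPy sketch below illustrates that assumed setup and is not Meta's kernel: the function names, shapes, and the concat-then-project interaction are hypothetical. It splits the weight matrix so the user-side product is computed once and broadcast across all candidates, which gives the same result without materializing the tiled copy.

```python
import numpy as np


def naive_interaction(user_emb, item_embs, weight):
    """Hypothetical baseline: replicate the user embedding per candidate, then project."""
    n = item_embs.shape[0]
    # Materializes an [N, D_u] array holding N copies of the same user vector.
    user_tiled = np.tile(user_emb, (n, 1))
    features = np.concatenate([user_tiled, item_embs], axis=1)  # [N, D_u + D_i]
    return features @ weight  # [N, H]


def broadcast_interaction(user_emb, item_embs, weight):
    """Hypothetical broadcast-aware version: compute the user term once."""
    d_u = user_emb.shape[0]
    # Rows of the weight that multiply user features vs. item features.
    w_user, w_item = weight[:d_u], weight[d_u:]
    user_term = user_emb @ w_user   # [H], computed once per request
    item_term = item_embs @ w_item  # [N, H]
    # NumPy broadcasts the [H] user term across all N candidate rows.
    return item_term + user_term


# Usage: both versions agree, but only the naive one builds the [N, D_u] copy.
rng = np.random.default_rng(0)
user = rng.standard_normal(16)          # one user embedding, D_u = 16
items = rng.standard_normal((1000, 8))  # 1000 candidates, D_i = 8
weight = rng.standard_normal((24, 32))  # projection to H = 32
assert np.allclose(naive_interaction(user, items, weight),
                   broadcast_interaction(user, items, weight))
```

The broadcast version skips the N-fold copy and shrinks the user-side matrix multiply from N rows to one. As the post describes it, IKBO applies this idea inside the interaction kernels themselves rather than as framework-level tensor algebra like this sketch.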
