PythonHub
News & links about Python programming.
https://pythonhub.dev/
In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference

In-Kernel Broadcast Optimization (IKBO) eliminates redundant user-embedding replication by fusing the broadcast logic directly into the interaction kernels, which cuts wasted memory bandwidth and compute. The co-designed kernels target accelerators such as NVIDIA H100 and Meta's MTIA and deliver up to a two-thirds reduction in latency across Meta's recommendation stack.
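To make the idea concrete, here is a minimal pure-Python sketch of the difference between replicating a user embedding per candidate and reading it once inside the interaction loop. It is only a toy illustration, not the kernels from the linked post: the function names `interact_replicated` and `interact_broadcast` are hypothetical, and a dot product is assumed as the interaction.

```python
from typing import List

Vector = List[float]


def interact_replicated(user_emb: Vector, item_embs: List[Vector]) -> List[float]:
    """Baseline: copy the user embedding once per candidate, then interact."""
    # One full copy of the user embedding per candidate item. This is the
    # redundant memory traffic that the post says IKBO removes.
    replicated = [list(user_emb) for _ in item_embs]
    return [
        sum(u * i for u, i in zip(user_row, item_row))
        for user_row, item_row in zip(replicated, item_embs)
    ]


def interact_broadcast(user_emb: Vector, item_embs: List[Vector]) -> List[float]:
    """Broadcast inside the 'kernel': every candidate reuses the same user row."""
    # No copies are made; the single user embedding is read in the inner loop.
    return [
        sum(u * i for u, i in zip(user_emb, item_row))
        for item_row in item_embs
    ]


# Hypothetical data: one user, three candidate items, embedding size 2.
user = [1.0, 2.0]
items = [[3.0, 4.0], [0.5, 0.5], [0.0, 1.0]]

scores_replicated = interact_replicated(user, items)
scores_broadcast = interact_broadcast(user, items)
print(scores_replicated == scores_broadcast)
```

Both functions return the same scores; the second one never materializes the per-candidate copies. In a real accelerator kernel the saving would show up as reduced memory reads and writes rather than fewer Python lists.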

https://pytorch.org/blog/in-kernel-broadcast-optimization-co-designing-kernels-for-recsys-inference/