HPC Guru (Twitter)
RT @ProjectPhysX: @ProfMatsuoka At this point, arithmetic performance is so far ahead of memory bandwidth that it makes sense - even for non-AI workloads - to use "spare" arithmetic cycles to do data compression, such that the algorithm uses less memory and bandwidth.
https://t.co/3i9YdeXnOr
Twitter
Moritz Lehmann
Make the lattice Boltzmann method (#LBM) use only ½ the memory and run 80% faster on #GPUs? #FP32/16-bit mixed precision makes it possible. Implementation takes 20 lines in existing code. Accuracy is often as good as #FP64.🖖🧐📃 Read my preprint: arxiv.or…
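A minimal CUDA sketch of the idea in the two tweets above: keep the bandwidth-bound array in 16-bit form in global memory and spend a few "spare" arithmetic cycles converting to FP32 in registers around the actual computation. The kernel and its names (f_in, f_out, omega, f_eq) are illustrative assumptions, not code from the preprint.

```cuda
// FP32 arithmetic, FP16 storage: convert on load, compute in registers,
// convert back on store. Halves the memory footprint and bandwidth of the
// array at the cost of a few conversion instructions per element.
#include <cuda_fp16.h>

__global__ void relax_fp32_compute_fp16_storage(const __half* f_in,
                                                __half* f_out,
                                                float omega, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // decompress: FP16 -> FP32 (costs arithmetic, saves half the bandwidth)
    float f = __half2float(f_in[i]);
    // do the actual work in full FP32 precision
    float f_eq = 0.0f;                  // placeholder equilibrium value
    f = f + omega * (f_eq - f);         // BGK-style relaxation step
    // compress: FP32 -> FP16 before the global-memory write
    f_out[i] = __float2half(f);
}
```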
HPC Guru (Twitter)
RT @HatemLtaief: @HPC_Guru @Arm @ECMWF @s_e_hatfield But #FP32 accumulation with #FP16 operands is usable, as long as you cherry-pick where to apply it. #climate #nvidia
https://ieeexplore.ieee.org/document/9442267/
This kernel is indeed missing from the #Fujitsu scientific library on #fugaku. The #ARM performance library does not have it either.
ieeexplore.ieee.org
Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Spatial data are assumed…
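One way to express the FP16-operand/FP32-accumulation pattern mentioned in the retweet is cuBLAS' mixed-precision GEMM on NVIDIA GPUs. A hedged sketch with illustrative matrix sizes and no error checking; this shows the general pattern, not the kernels used in the cited paper:

```cuda
// C (FP32) = A (FP16) * B (FP16), with products accumulated in FP32.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_fp16_operands_fp32_accum(cublasHandle_t handle,
                                   const __half* A, const __half* B, float* C,
                                   int m, int n, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle,
                 CUBLAS_OP_N, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 A, CUDA_R_16F, m,    // FP16 operand A, leading dimension m
                 B, CUDA_R_16F, k,    // FP16 operand B, leading dimension k
                 &beta,
                 C, CUDA_R_32F, m,    // FP32 result / accumulator C
                 CUBLAS_COMPUTE_32F,  // accumulate in FP32 (tensor cores where available)
                 CUBLAS_GEMM_DEFAULT);
}
```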
HPC Guru (Twitter)
RT @ProjectPhysX: How much floating-point precision do you need for lattice Boltzmann #CFD? #FP64 is overkill in most cases. #FP32/#FP16 mixed-precision works with almost equal accuracy at ¼ the memory demand and is 4x-10x faster on #GPU.
🧵1/7
Big @SFB1357 #PhD paper👉 https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats https://twitter.com/ProjectPhysX/status/1552225695044190212/photo/1
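As a rough illustration of the FP32/16-bit split behind the quoted memory and speed numbers: the distribution functions (DDFs) can live in 16 bits in global memory and be decompressed to FP32 only in registers. The helpers below are a generic sketch under that assumption; the weight-shift shown is one common precision trick (values cluster near zero, where FP16 resolves best) and the names are hypothetical, not the customized 16-bit formats actually benchmarked in the paper.

```cuda
// Load/store helpers for 16-bit DDF storage with FP32 compute.
#include <cuda_fp16.h>

__device__ __forceinline__ float load_ddf(const __half* f, int idx, float w) {
    // decompress and undo the shift: stored value is (f_i - w_i) in FP16
    return __half2float(f[idx]) + w;
}

__device__ __forceinline__ void store_ddf(__half* f, int idx, float fi, float w) {
    // shift toward zero, then compress to FP16 for the global-memory store
    f[idx] = __float2half(fi - w);
}
```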