PDP-11🚀
AI Hardware & Domain Specific Computing

#FPGA #ASIC #HPC #DNN

@vconst89
NVIDIA Ampere Architecture In-Depth
#notml
A bit of space news, but not about Crew Dragon as you may expect :)
It was 14 years ago that Xilinx released its previous generation of radiation-tolerant (RT) FPGAs suitable for space applications - the Virtex-5 series. And finally, a new successor is coming - the radiation-tolerant Kintex UltraScale (XQRKU060).
https://www.xilinx.com/support/documentation/white_papers/wp523-xqrku060.pdf
Stratix-10-NX-Tehnology-Brief.pdf
763.2 KB
AI-Optimized FPGA for High-Bandwidth, Low-Latency AI Acceleration
The Intel® Stratix® 10 NX FPGA delivers a unique combination of capabilities needed to implement customized hardware with integrated high-performance artificial intelligence (AI). These capabilities include:

High-Performance AI Tensor Blocks
- Up to 15X more INT8 throughput than Intel Stratix 10 FPGA digital signal processing (DSP) block for AI workloads
- Hardware programmable for AI with customized workloads

Abundant Near-Compute Memory
- Embedded memory hierarchy for model persistence
- Integrated high-bandwidth memory (HBM)

High-Bandwidth Networking
- Up to 57.8 Gbps PAM4 transceivers and hard Ethernet blocks for high efficiency
- Flexible and customizable interconnect to scale across multiple nodes
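To make the INT8 claim above concrete, here is a minimal, hypothetical Python sketch (not from the Intel brief) of symmetric INT8 quantization with wide-accumulator arithmetic, the pattern such AI tensor blocks accelerate:

```python
# Hypothetical sketch (not from the Intel brief): symmetric INT8 quantization.
def quantize_int8(xs):
    scale = max(abs(x) for x in xs) / 127.0  # largest magnitude maps to 127
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

a = [0.5, -1.2, 3.0, 0.25]
b = [1.0, 0.25, -0.5, 2.0]
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

# Cheap INT8 multiplies accumulate into a wide integer; one floating-point
# rescale at the end recovers the real-valued result.
acc = sum(x * y for x, y in zip(qa, qb))
approx = acc * sa * sb
exact = sum(x * y for x, y in zip(a, b))
```

The narrow INT8 multiplies lose little accuracy here because accumulation stays in a wide integer and only a single rescale happens at the end - which is why trading FP32 DSP multipliers for many small INT8 ones pays off for DNN inference.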
https://www.economist.com/technology-quarterly/2020/06/11/the-cost-of-training-machines-is-becoming-a-problem

The growing demand for computing power has fuelled a boom in chip design and specialised devices that can perform the calculations used in AI efficiently. The first wave of specialist chips were graphics processing units (GPUs), designed in the 1990s to boost video-game graphics. As luck would have it, GPUs are also fairly well-suited to the sort of mathematics found in AI.

Further specialisation is possible, and companies are piling in to provide it. In December, Intel, a giant chipmaker,
bought Habana Labs, an Israeli firm, for $2bn. Graphcore, a British firm founded in 2016, was valued at $2bn in 2019. Incumbents such as Nvidia, the biggest GPU-maker, have reworked their designs to accommodate AI. Google has designed its own “tensor-processing unit” (TPU) chips in-house. Baidu, a Chinese tech giant, has done the same with its own “Kunlun” chips. Alfonso Marone at KPMG reckons the market for specialised AI chips is already worth around $10bn, and could reach $80bn by 2025.

“Computer architectures need to follow the structure of the data they’re processing,” says Nigel Toon, one of Graphcore’s co-founders. The most basic feature of AI workloads is that they are “embarrassingly parallel”, which means they can be cut into thousands of chunks which can all be worked on at the same time. Graphcore’s chips, for instance, have more than 1,200 individual number-crunching “cores”, and can be linked together to provide still more power. Cerebras, a Californian startup, has taken an extreme approach. Chips are usually made in batches, with dozens or hundreds etched onto standard silicon wafers 300mm in diameter. Each of Cerebras’s chips takes up an entire wafer by itself. That lets the firm cram 400,000 cores onto each.
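The "embarrassingly parallel" idea above can be sketched in a few lines of Python (an illustrative toy, not any vendor's actual scheme): a dot product splits into chunks that need no communication with each other until one final sum:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy example of an "embarrassingly parallel" workload: the chunks are
# fully independent, so they can all be worked on at the same time.
def dot_chunk(chunk):
    return sum(x * y for x, y in chunk)

a = list(range(10_000))
b = list(range(10_000))
pairs = list(zip(a, b))
chunks = [pairs[i:i + 1000] for i in range(0, len(pairs), 1000)]  # 10 chunks

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(dot_chunk, chunks))  # each chunk runs independently

result = sum(partials)  # the only step that needs all chunks
```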

Other optimisations are important, too. Andrew Feldman, one of Cerebras’s founders, points out that AI models spend a lot of their time multiplying numbers by zero. Since those calculations always yield zero, each one is unnecessary, and Cerebras’s chips are designed to avoid performing them. Unlike many tasks, says Mr Toon at Graphcore, ultra-precise calculations are not needed in AI. That means chip designers can save energy by reducing the fidelity of the numbers their creations are juggling. (Exactly how fuzzy the calculations can get remains an open question.)
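A toy Python sketch of the zero-skipping idea (purely illustrative, not Cerebras's actual hardware mechanism):

```python
# Toy sketch (illustrative only): skip multiplies whose result must be zero,
# and count how much work the sparsity saves.
def sparse_dot(weights, activations):
    total = 0.0
    mults = 0
    for w, x in zip(weights, activations):
        if w == 0.0 or x == 0.0:
            continue  # multiplying by zero always yields zero: skip it
        total += w * x
        mults += 1
    return total, mults

# A weight vector that is mostly zeros, as pruned DNN layers often are
weights = [0.0, 0.5, 0.0, 0.0, -1.0, 0.0, 0.0, 2.0]
acts = [1.0] * 8
result, mults = sparse_dot(weights, acts)
# only 3 of the 8 multiplies are actually performed
```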

All that can add up to big gains. Mr Toon reckons that Graphcore’s current chips are anywhere between ten and 50 times more efficient than GPUs. They have already found their way into specialised computers sold by Dell, as well as into Azure, Microsoft’s cloud-computing service. Cerebras has delivered equipment to two big American government laboratories.
Apple has announced the biggest change heading to its Mac computers in 14 years: the dumping of Intel Inside.
The company is ditching Intel's traditional so-called x86 desktop chips for Apple's own processors based on ARM designs - those used in smartphones and tablets, including the iPhone and iPad.
The Guardian
The chapter from the upcoming Vivienne Sze book "Efficient Processing of Deep Neural Networks" http://eyeriss.mit.edu/2020_efficient_dnn_excerpt.pdf
* Processing Near Memory
* Processing in Memory
* Processing in the Optical Domain
* Processing in Sensor
efficient_proceeding_of_dnn.pdf
20.4 MB
The fantastic book is finally generally available now!

Efficient Processing of Deep Neural Networks
The book covers all aspects of model software and hardware design related to the topic. It explains the key concepts of weight/output/input/row stationarity and dataflow, power-budget trade-offs, and hardware-software co-design.
Efficient Processing of Deep Neural Networks, Contents
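As a rough illustration of one of those concepts, here is a hypothetical Python sketch (not code from the book) of a weight-stationary dataflow: the loop nest is ordered so each weight is fetched once and reused across the whole batch:

```python
# Hypothetical sketch of a weight-stationary dataflow for a tiny fully
# connected layer: each weight is fetched once and reused across the whole
# input batch, which is exactly the reuse pattern "weight stationary" names.
def fc_weight_stationary(weights, inputs):
    # weights: [out][in] matrix; inputs: list of input vectors (the batch)
    n_out, n_in = len(weights), len(weights[0])
    outputs = [[0.0] * n_out for _ in inputs]
    weight_fetches = 0
    for o in range(n_out):
        for i in range(n_in):
            w = weights[o][i]              # fetched once...
            weight_fetches += 1
            for b, x in enumerate(inputs):
                outputs[b][o] += w * x[i]  # ...reused for every batch item
    return outputs, weight_fetches

W = [[1.0, 2.0], [3.0, 4.0]]
batch = [[1.0, 1.0], [2.0, 0.0]]
outs, fetches = fc_weight_stationary(W, batch)
# fetches == 4: each of the 4 weights is loaded exactly once, regardless
# of batch size
```

Output-, input-, and row-stationary dataflows reorder the same loop nest to keep a different operand in the cheap local storage; which one wins depends on the layer shape and the memory hierarchy, which is the trade-off the book analyzes.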
Sorry guys, this channel is turning into a link-collection feed, but I promise to get back on track soon with brief summaries :)

https://www.electronicdesign.com/industrial-automation/article/21136402/smartnic-architectures-a-shift-to-accelerators-and-why-fpgas-are-poised-to-dominate
Bluespec Haskell is an open-source framework: yet another high-level hardware description language, but this one based on Haskell

Jonathan Ross, founder of the hardware AI startup Groq and an ex-Google TPU developer, claims that it was used in the initial stages of the TPU design. It looks like Groq is also actively using it
https://www.linkedin.com/in/jonathan-ross-12a95156/

Bluespec research note
https://arxiv.org/pdf/1905.03746.pdf

The latest version of the Bluespec compiler can be found here
https://github.com/B-Lang-org/bsc

And here's the tutorial
https://github.com/rsnikhil/Bluespec_BSV_Tutorial/tree/master/Reference