Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
If you are currently putting together hardware for neural networks, for yourself or your company, then besides the GPU Limbo article, this piece on Intel's response to AMD's new lineups will also be of interest:
- https://3dnews.ru/954174
- http://timdettmers.com/2017/12/21/deep-learning-hardware-limbo/#more-627

Obviously the CPU is not the bottleneck here, but it is still interesting to see how competition shapes the market.

#hardware
Poor man's computing cluster

So, when I last checked, Amazon's p3.8xlarge instances (4x Tesla V100) cost around US$12 per hour (unless you reserve them for a year). A tower "supercomputer" from Nvidia probably costs US$40-50k or more (it was announced at around US$69k).

It is not difficult to crunch the numbers and see that one month of renting such a machine would cost at least US$8-10k. There is also the additional cost / problem of actually storing your large datasets. When I last used Amazon, their cheap storage was sloooooow, and the fast storage was prohibitively expensive.
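A quick back-of-the-envelope sketch in Python, using the rough numbers from this post (not current prices):

# Renting vs. owning, using the rough quotes from this post.
hourly_rate = 12.0            # US$ per hour for the 4x V100 cloud instance
monthly_rent = hourly_rate * 24 * 30
print(f"1 month of non-stop renting: ~US${monthly_rent:,.0f}")   # ~US$8,640

diy_build = 10_000            # DIY cluster with used GPUs (see below)
dgx_price = 50_000            # rough price of the Nvidia tower
print(f"DIY pays for itself in ~{diy_build / monthly_rent:.1f} months of renting")
print(f"DIY is ~{dgx_price / diy_build:.0f}x cheaper than the Nvidia tower")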


So, why am I saying this?


Let's assume (based on my miner friends' experience) that consumer Nvidia GPUs can work 2-3 years non-stop given proper cooling and care (test before buying!). Also let's assume that 4x Tesla V100 is roughly equivalent to 7-8x 1080 Ti.

Yeah, I know that you will point out at least one reason why this does not hold, but for practical purposes it is fine (yes, I know that Teslas have some cool features like NVLink).

Now let me drop the bombshell - modern professional motherboards often boast 2-3 Ethernet ports. And sometimes you can even get 2x 10 Gbit/s ports (!!!).

It means that you can actually connect at least 2 machines (or maybe even daisy-chain more?) into a computing cluster.

Now let's crunch the numbers

According to quotes I have collected over the years, you can build a cluster roughly equivalent to Amazon's p3.8xlarge for US$10k (but with storage!) using used GPUs (miners are selling them like crazy now). If you also buy second-hand drives, motherboards and CPUs, you can lower the cost to US$5k or less.

So, a cluster that would serve you for at least one year (if you test everything properly and take care of it) and costs US$10k is roughly equivalent to:
- 20-25% of DGX desktop;
- 1 month of renting on Amazon;

Assuming that all the hardware will just break in a year:
- It is 4-5x cheaper than buying from Nvidia;
- It is 10x cheaper than renting;

If you buy everything used, then it is 10x and 20x cheaper, respectively!

I would buy that for a dollar!
Ofc you have to invest your free time.

See my calculations here:
http://bit.ly/spark00001

#deep_learning
#hardware
Monitoring your hardware, with logs, charts and alerts - in style

TLDR - we have been looking for THE software to do this easily, with charts / alerts / easy install.

We found Prometheus. Configuring alerts was a bit of a problem, but enjoy:
- https://github.com/snakers4/gpu-box-setup#prometheus
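If you also want a quick scripted check on top of Prometheus itself (not a replacement for proper alert rules), polling its HTTP API from Python works fine; the metric and label names below are assumptions, substitute whatever your exporter actually exposes:

# Minimal sketch: poll the Prometheus HTTP API for a GPU temperature metric.
# "nvidia_gpu_temperature_celsius" and the "gpu" label are assumptions.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"

def gpu_temps(query="nvidia_gpu_temperature_celsius"):
    resp = requests.get(PROM_URL, params={"query": query}, timeout=5)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return {r["metric"].get("gpu", "?"): float(r["value"][1]) for r in result}

for gpu, temp in gpu_temps().items():
    if temp > 80:
        print(f"GPU {gpu} is running hot: {temp:.0f} C")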

#deep_learning
#hardware
Assembling a NAS for less than US$50

So ... you want a NAS for emergency backups that only you know about.

You have spent money on GPUs, drives, devboxes and you would like to get your NAS for free.
Ofc, if you are a clever boi, you will have RAID arrays on your devbox, offsite backups, etc etc

If you feel particularly S&M, you might even use AWS Glacier or smth similar.
Or you may buy a NAS (decent devices start from US$500-1000 w/o drives! rip-off!)


But you see, all of the above variants cost money.
Or you cannot easily throw such a backup out of the window / encryption creates overhead.

So you can create a NAS on the cheap in style:
- Buy any raspberry pi (US$5 - US$20, you can find one used even cheaper);
- Buy a USB HDD enclosure (US$5 - US$40);
- Find some garbage drives for free;
- Copy your files, put HDD under your pillow;
- Profit;

Added bonuses:
- If you live in a police state - you can use RAID 0 (just hide the second drive) => in essence this is like having perfect one-time-pad encryption;
- Easily use RAID 1 or RAID 10 with 4 drives;
- Very high portability, if you use 2.5'' drives;
- Mdadm arrays are easily transferable (a tiny status-check sketch after this list);
- Cyber punk vibe;
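On the mdadm point - a tiny sketch you can drop into cron on the Pi to catch a degraded array early (it just parses /proc/mdstat):

# Minimal sketch: report mdadm arrays that look degraded in /proc/mdstat.
import re

def degraded_arrays(path="/proc/mdstat"):
    bad, current = [], None
    with open(path) as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)
                continue
            # Status lines look like "... [2/2] [UU]"; "_" marks a missing disk.
            m = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
            if m and current and "_" in m.group(1):
                bad.append(current)
    return bad

if __name__ == "__main__":
    for md in degraded_arrays():
        print(f"{md} looks degraded - check your drives!")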

#hardware
The current state of "DIY" ML hardware

(i.e. that you can actually assemble and maintain and use in a small team)

Wanted to write a large post, but decided to just do a TLDR.
In case you need a super-computer / cluster / devbox with 4 - 16 GPUs.

The bad
- Nvidia DGX and similar - 3-5x overpriced (sic!)
- Cloud providers (Amazon) - 2-3x overpriced

The ugly
- Supermicro GPU server solutions. This server hardware is a bit overpriced, but its biggest problem is old processor sockets
- Custom shop-built machines (with water cooling) - very nice, but (except for the water loop) you just pay US$5 - 10 - 15k for work you can do yourself in one day
- 2 CPU professional level motherboards - very cool, but powerful Intel Xeons are also very overpriced

The good
- Powerful AMD processor with 12-32 cores + top-tier motherboard. This will support 4 GPUs at x8 speed and have a 10 Gb/s Ethernet port
- Just add more servers with a 10 Gb/s connection and probably later connect them into a ring ... cheap / powerful / easy to maintain (see the quick bandwidth sketch below)
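A rough feel for where the bottleneck sits in such a setup (theoretical per-direction numbers, not measurements):

# Theoretical per-direction bandwidth, GB/s.
pcie3_per_lane = 0.985           # PCIe 3.0: 8 GT/s with 128b/130b encoding
pcie3_x8 = 8 * pcie3_per_lane    # per-GPU link inside one box, ~7.9 GB/s
ten_gbe = 10 / 8                 # 10 Gbit/s link between boxes, 1.25 GB/s

print(f"PCIe 3.0 x8 per GPU : ~{pcie3_x8:.1f} GB/s")
print(f"10 GbE between boxes: ~{ten_gbe:.2f} GB/s, "
      f"~{pcie3_x8 / ten_gbe:.0f}x slower than the in-box link")

So the inter-node link, not the CPU or PCIe, is likely what you will saturate first when syncing between boxes.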

More democratization soon?

Probably the following technologies will untie our hands

- Single slot GPUs - Zotac clearly thought about it, maybe it will become mainstream in the professional market
- PCIE 4.0 => enough speed for ML even on cheaper motherboards
- New motherboards for AMD processors => maybe more PCIE slots will become normal
- Intel optane persistent memory => slow and expensive now, maybe RAM / SSD will merge (imagine having 2 TB of cheap RAM on your box)

Good chat in ODS on same topic.

#hardware
Amazing hardware YouTube channel (RU)

Link.

Smart, in-depth, highly analytical, no bs / ads / cringe / sensationalism. Not your typical Russian channel, not Linus Tech Tips or similar.

Example videos:

- What hardware companies could do
- Choosing a PSU

#hardware
New embedded computing platform?

Looks like Intel has a new computing platform for ultra compact PCs. This may fit some of the semi-embedded ML applications!

Intel NUC 9 Extreme Compute Element
- https://www.theverge.com/2020/1/7/21051879/intel-pc-nuc-9-extreme-ghost-canyon-element-hands-on-teardown-ces-2020
- https://pc-01.tech/razer-tomahawk/

#hardware
Speculations about x86 ARM Future

https://pc-01.tech/arm/

I just love this channel. Nice speculations. I am pretty sure that MS's move towards WSL was calculated, but how does it fit into this picture? How will they port old Win32 apps to ARM? Is there some hidden power play we do not understand?

On which ecosystem to bet? My personal choice is Linux. I am too lazy to migrate my laptop to Ubuntu yet, but it looks like 2 things can happen (I am just an idiot, I have no proof):

- Every consumer device becomes ARM;
- There will be a large turmoil in the market, large players will start adopting Linux as a safe haven;

Interesting)

#hardware
RTX 3090 + Multi-Instance-GPU

So, ~2x faster than the 2080 Ti, which in turn is ~30% faster than the 1080 Ti.
And 2x the VRAM.

The only real question for me is, will it support Multi-Instance-GPU?

Let me explain why this is important. Usually when you train a network, you increase your batch size to fill the VRAM and monitor your IO and GPU load to ensure saturation.

But if a GPU has 2x the VRAM and is 2-3x faster than a 1080 Ti, then maybe you can have multiple instances of your model on your GPU (this matters only for models that do not scale to large batch sizes easily).

The only problem is that:

- You cannot use DDP in PyTorch (usually it is faster than DP for 4+ devices), because:

 DDP processes can be placed on the same machine or across machines, but GPU devices cannot be shared across processes.


- So you will have to invent something / change your code / or maybe even use their bleeding-edge RPC functions (a minimal sketch of the first option below);
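A minimal sketch of the "invent something" route - several independent processes sharing one physical GPU, no DDP involved (assumes a CUDA device is available; the model and sizes are placeholders):

# Two worker processes, each with its own model copy on the same physical GPU.
import torch
import torch.multiprocessing as mp

def worker(rank, device="cuda:0"):
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
    ).to(device)
    x = torch.randn(256, 512, device=device)
    with torch.no_grad():
        for _ in range(100):
            _ = model(x)
    print(f"worker {rank} done")

if __name__ == "__main__":
    mp.spawn(worker, nprocs=2)   # both instances land on cuda:0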

If this feature is available on the 3090 ... then you could turn your GPU into 2-3 virtual GPUs and use it accordingly. That would be truly epic, especially for production use cases (yeah, I know about their SLA)! It would also be great for working in a team.

#hardware
....

A solution? A set of OSS components and off-the-shelf tools:

- Any MB with at least one or two 10 Gbit/s ethernet ports (the most expensive part) and at least 8-10 SATA slots (PCIE risers can add 2-4 more ports per riser). There are several older mATX SuperMicro boards and a lot of newer overpriced "gaming" boards in this segment

- The cheapest 10 Gbit/s switch or some used "large" switch if you know where / how to buy one

- Any off-the-shelf Fractal Design computer case with 8-10 HDD bays (do not forget the 2 "free" 5.25'' bays that can be used for 3.5'' drives!) or similar (Fractal Design cases are a bit expensive, but the build quality is awesome)

- Any suitable RAM / CPU (for SuperMicro you will have to buy Xeon and ECC RAM)

- Linux as OS (any flavour you like), mdadm for RAID arrays, samba for local sharing

- Just mount the share on your client machines via fstab (just an example; a quick sanity check follows after this list):

# /etc/fstab on a client machine - mount the NAS samba share via cifs
//192.168.2.5/share /mnt/share/ cifs username=YOUR_USER,password=YOUR_PASS,iocharset=utf8,uid=YOUR_UID,gid=YOUR_GID,dir_mode=0775,file_mode=0775 0 0


- Use any other OSS Linux software, change it as you wish, use any drives you want!

- You can have mdadm, zfs, luks, lvm - whatever you want in any combination;
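And the promised sanity check for the cifs mount from the fstab example above - is it actually mounted, and how full is it (the mount point matches that example):

# Check that the NAS share is mounted and report free space.
import os

MOUNT_POINT = "/mnt/share"

def is_mounted(path):
    with open("/proc/mounts") as f:
        return any(line.split()[1] == path for line in f)

if not is_mounted(MOUNT_POINT):
    raise SystemExit(f"{MOUNT_POINT} is not mounted - run 'sudo mount -a' and re-check fstab")

st = os.statvfs(MOUNT_POINT)
print(f"{MOUNT_POINT}: {st.f_bavail * st.f_frsize / 1e9:.0f} GB free "
      f"of {st.f_blocks * st.f_frsize / 1e9:.0f} GB")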

If you just count the bare-minimum cost w/o drives, you can probably get away with US$600-700 for an 8 - 16 drive box

#hardware
MLPerf Inference v1.0

- Inference Edge v1.0 https://mlcommons.org/en/inference-edge-10/
- Inference Datacenter v1.0 https://mlcommons.org/en/inference-datacenter-10/

The immediate conclusion (as expected) - enterprise kinky party. The second conclusion - they mostly compare vastly different systems (mostly HPC), which is good.

Honestly I do not really care for A100 vs. A?0 vs. Quadro vs. T4, but edge benchmarks are always rare and nice.

The most interesting spreadsheet IMO is this one.

And here I see some quite interesting stuff:

- Firefly-RK3399 looks similar to RPI 4 in performance (has anyone used it?)
- NVIDIA Jetson AGX Xavier looks ~2x faster than both of them (and probably is much more expensive and unobtainable)
- TFLite / ArmNN - but no ONNX or PyTorch on ARM, I wonder why
- int8 is very much a must-have on these devices, I see performance boosts of up to 2x (a minimal sketch below)
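On the int8 point - the quickest way to try it in PyTorch is dynamic quantization (this is the lazy CPU-side variant, not what TFLite / ArmNN do on these boards; the model below is a placeholder):

# Minimal sketch: int8 dynamic quantization of Linear layers for CPU inference.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 128)
).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)   # same interface, int8 weights under the hood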

PS

Firefly-RK3399 has a PCIE M2 slot, so theoretically you can plug in PCIE accelerator sticks there? =)
It also runs on Ubuntu?

#hardware
#deep_learning
Is Ray Tracing Worth It?

So we had a spare deep learning Ampere GPU lying around, and I tested some games with ray tracing / RTX.

The short answer is ... NO.

The long answer - it depends.


Let me explain. I played the following list of titles with ray tracing:

"New"

- Metro Exodus (yeah, I know about GSC Game World controversy) - just the intro. Ray tracing glitched and did not add much, except for water reflections;

- Control. Some scenes were jaw-dropping with ray tracing, but it actually really influenced gameplay only during a handful of situations. The game looks fine without ray tracing;

"Old"

- Ultimate DOOM (yes, the og 1995 game, yes really), PrBOOM+ with ray tracing. The whole game, but skipped shit levels. The game changes completely, becomes more moody and dark. The game kind of EMERGES, becomes more atmospheric;

- Serious Sam TFE, ray traced. Played the first 4 levels, got bored. It is a cool tech demo, but there are lots of glitches and freezes (I have a weak CPU). It changes the mood of the game, but does not add much, except in very dark places;

- Quake II RTX (make sure to have a copy of Quake II when installing to play all levels), full game with RTX. The most impressive one. The only optimized game (it is very funny to hear an Ampere GPU fully loaded when playing ... DOOM lol). Almost all levels are changed, moody, lighting is gorgeous, some effects (like water refraction, thick glass, force fields) are jaw-dropping.

Overall - new games do not really benefit from ray tracing or build themselves around it. The old titles ... become visually jaw-dropping, moodier and grittier in terms of gameplay.

Remember when Half-Life 2 was released, everyone wanted to copy its physics, and almost no one made it a proper mechanic? Almost 20 years later, physics is ubiquitous in game engines (PhysX was acquired by Nvidia; it used to be a separate card lol), but it is kind of pointless and disposable in 95% of games (except for the HL franchise, Portal and a handful of titles).

Unlike cordless multiplayer VR (which cannot be played at home for obvious reasons) - that is on another level of experience - ray tracing will probably end up like gaming physics. Very cool, very atmospheric, very moody, but kind of pointless, because building games around it is impossible, risky or not profitable. Better to make another shitty COD.

Old games with ray tracing are a rare genuine marvel, though.

I cannot speak for visual artists. Probably for them rendering light on-the-fly is a godsend.

#hardware
The Dark Side Of The Semiconductor Design Renaissance – Fixed Costs Soaring Due To Photomask Sets, Verification, and Validation

- https://semianalysis.substack.com/p/the-dark-side-of-the-semiconductor

Interesting. I have heard some vague estimates of around US$100m per hardware startup or US$500m for a chip on a leading-edge process, but this article provides some more meat.

#hardware