Offshore
Photo
Ahmad
RT @TheAhmadOsman: the Tenstorrent QuietBox Blackhole

> is a 3.2 Tb/s Ethernet mesh
> that pools memory
> and scales almost linearly
> when you daisy‑chain more boxes

the TT-QuietBox Blackhole comes with

> ~80 lbs liquid-cooled chassis
> AMD EPYC 8124P, 16c/32t
> 512 GB DDR5 ECC
> 4 TB NVMe
> ASRock Rack SIENAD8‑2L2T w/ 2x 10 GbE + IPMI

> 4x Blackhole p150c cards, totalling:
> 560 Tensix Cores
> 64 “big” RISC-V cores
> 128 GB GDDR6
> 840 MB On‑Chip SRAM
> 3.2 Tb/s Ethernet mesh
> 16x QSFP‑DD 800G ports for card⇔card comms
> 8x passive direct‑attach copper (DAC) cables (0.6m)

> all of this is powered by a single
> 1650W Platinum PSU, passively cooled
> ready to daisy-chain to the next QuietBox
> also, open-source stack (TT‑Forge → TT‑NN → TT‑Metalium)

the interconnect is the star

> what does “4x QSFP‑DD 800G” actually mean?

> QSFP‑DD = Quad Small Form‑Factor Pluggable — Double Density
> 8 electrical lanes per port
> ~100 Gb/s per lane using PAM4 signalling
> total: 800 Gb/s full‑duplex per port → ~100 GB/s usable each way after Ethernet framing + FEC

each card talks directly to its siblings over QSFP‑DD 800G

> 4 ports per card x 800 Gb/s each =
> 3.2 Tb/s of aggregate bidirectional fabric per card

> 16 ports total per “quietbox” =
> 3.2 Tb/s internal mesh across all 4 cards
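To make the lane and port arithmetic concrete, a quick back-of-the-envelope in Python (figures are the ones quoted above; framing/FEC overhead is only noted, not modelled):

```python
# back-of-the-envelope QSFP-DD 800G math from the figures above
LANES_PER_PORT = 8        # QSFP-DD: 8 electrical lanes
GBPS_PER_LANE = 100       # ~100 Gb/s per lane with PAM4
PORTS_PER_CARD = 4
CARDS_PER_BOX = 4

port_gbps = LANES_PER_PORT * GBPS_PER_LANE     # 800 Gb/s full-duplex per port
card_tbps = PORTS_PER_CARD * port_gbps / 1000  # 3.2 Tb/s aggregate fabric per card
per_port_gbytes = port_gbps / 8                # ~100 GB/s each way (framing + FEC trims this slightly)

print(f"per port: {port_gbps} Gb/s (~{per_port_gbytes:.0f} GB/s each way)")
print(f"per card: {card_tbps:.1f} Tb/s across {PORTS_PER_CARD} ports")
print(f"per box:  {CARDS_PER_BOX * PORTS_PER_CARD} QSFP-DD ports total")
```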

> this is your NVLink replacement
> no PCIe bottlenecks, no host-side relays
> just a true east-west ethernet fabric

there’s a hard rule
> the QSFP‑DD 800G ports are passive
> they only connect to other Blackhole cards via direct‑attach copper (DAC)
> max length = 2 meters; no optics, no switches, no uplinks to your Ethernet fabric
> Blackhole fabric is its own world: card⇔card, box⇔box, nothing else

daisy‑chain the DACs and you're all set; add more boxes and enjoy the 3.2 Tb/s Ethernet mesh that pools memory and scales almost linearly

pretty sleek hardware UX, more soon
tweet
Offshore
Photo
Ahmad
RT @TheAhmadOsman: My house has 33 GPUs.

> 21x RTX 3090s
> 4x RTX 4090s
> 4x RTX 5090s
> 4x Tenstorrent Blackhole p150a

Before AGI arrives:

Acquire GPUs.

Go into debt if you must.

But whatever you do, secure the GPUs. https://t.co/8U89OStknt
tweet
Offshore
Video
Ahmad
RT @TheAhmadOsman: can’t write code because Cursor and Codex are both down thanks to the aws-us-east-1 outage?

tired of Anthropic’s weekly limits and nerfed models?

with one command and a few GPUs,
you can route Claude Code to your own local LLM with ZERO downtime

Buy a GPU https://t.co/aj8r201V83

i built a simple tool that makes Claude Code work with any local LLM

full demo:
> vLLM serving GLM-4.5 Air on 4x RTX 3090s
> Claude Code generating code + docs via my proxy
> 1 Python file + .env handles all requests
> nvtop showing live GPU load
> how it all works
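A minimal sketch of what such a proxy could look like, assuming a vLLM server exposing the OpenAI-compatible /v1/chat/completions endpoint; this is not the author's tool, the .env keys and model name are made up for illustration, and real use would also need SSE streaming:

```python
# proxy.py: a hypothetical minimal sketch, not the author's tool.
# Accepts Anthropic-style /v1/messages requests from Claude Code and forwards them
# to a vLLM OpenAI-compatible server; non-streaming, plain-string system prompts only.
import os

import httpx
from fastapi import FastAPI, Request

VLLM_URL = os.getenv("VLLM_URL", "http://localhost:8000/v1/chat/completions")  # assumed .env key
MODEL = os.getenv("MODEL", "glm-4.5-air")                                       # assumed .env key

app = FastAPI()

@app.post("/v1/messages")
async def messages(request: Request):
    body = await request.json()

    # Flatten the Anthropic-style payload into OpenAI chat-completions format.
    msgs = []
    if isinstance(body.get("system"), str):
        msgs.append({"role": "system", "content": body["system"]})
    for m in body.get("messages", []):
        content = m["content"]
        if isinstance(content, list):  # content blocks -> plain text
            content = "".join(block.get("text", "") for block in content)
        msgs.append({"role": m["role"], "content": content})

    async with httpx.AsyncClient(timeout=300) as client:
        resp = await client.post(VLLM_URL, json={
            "model": MODEL,
            "messages": msgs,
            "max_tokens": body.get("max_tokens", 1024),
        })
    text = resp.json()["choices"][0]["message"]["content"]

    # Reply in the Anthropic Messages response shape Claude Code expects.
    return {
        "id": "msg_local",
        "type": "message",
        "role": "assistant",
        "model": MODEL,
        "content": [{"type": "text", "text": text}],
        "stop_reason": "end_turn",
        "stop_sequence": None,
        "usage": {"input_tokens": 0, "output_tokens": 0},
    }
```

Run it with `uvicorn proxy:app --port 8080` and point Claude Code at it (e.g. by setting ANTHROPIC_BASE_URL=http://localhost:8080).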

Buy a GPU https://t.co/7nYsId4Uyu
- Ahmad
tweet
Dimitry Nakhla | Babylon Capital®
RT @DimitryNakhla: 5 High-Quality Stocks With Good CAGR Potential Assuming Reasonable Multiples 💵

📦 Amazon $AMZN
•2028E EPS: $11.01
•Multiple: 29x
•CAGR: +12%

💰 S&P Global $SPGI
•2028E EPS: $24.41
•Multiple: 28x
•CAGR: +13%

🏦 Fair Isaac $FICO
•2028E EPS: $63.60
•Multiple: 36x
•CAGR: +13%

✈️ Booking Holdings $BKNG
•2028E EPS: $344.04
•Multiple: 23x
•CAGR: +15%

🫱🏼‍🫲🏻 MercadoLibre $MELI
•2028E EPS: $123.53
•Multiple: 30x
•CAGR: +18%
_______

*Estimates can change
tweet
Offshore
Photo
Investing visuals
Presenting my first deep dive, covering $RBRK:

• Founder led
• Mission-critical
• Growing over 50%
• Named 6x data protection leader

This is the story of a business that evolved from simple backups to an industry-leading cyber resilience platform.

Let’s dive in! (~25 min. read) 🧵👇
tweet
Offshore
Photo
Dimitry Nakhla | Babylon Capital®
This week’s key scheduled reports 🗓️

Let’s dive into the earnings expectations, valuations, & business segments for 20 quality stocks reporting this week 🧵 https://t.co/7ea4rIBaDL
tweet
Offshore
Photo
Clark Square Capital
Just shared a new write-up on a US-listed stock with a very asymmetric return profile.

Be sure to check it out.

Thanks for reading! https://t.co/HjIZ6DeXcj
tweet
Clark Square Capital
Ok, guys. It's been about a month since the last idea thread. What's a good prompt for the next one? I will pick the best one and use that.
tweet
Offshore
Photo
Dimitry Nakhla | Babylon Capital®
RT @DimitryNakhla: 10 Quality Stocks Offering 33% Higher FCF Yield Than the S&P 500 (LTM) 💵

1. $MA 3.18%
2. $DHR 3.18%
3. $INTU 3.21%
4. $V 3.30%
5. $SPGI 3.61%
6. $ADP 4.19%
7. $CSU 4.21%
8. $ICE 4.94%
9. $ABNB 5.37%
10. $BKNG 5.54%
—-

$SPY FCF Yield 2.35% (LTM)

$SPY The S&P 500 free cash flow yield currently sits at 2.35%. https://t.co/MQYaZZGvNE
- Koyfin
tweet
Offshore
Photo
Ahmad
I usually let things slide, but this isn't the first time Ahmad has criticized me, and each time his words weigh heavier and get harder to let pass.

The problem is that there's no clear criticism in the first place. If my account bothers you, block me or don't follow. If your goal is to help and advise me, send a DM, whether it's advice or a request for clarification, and it wouldn't be the first time we've talked anyway. Instead we shout "clout" and mock, without even critiquing the content itself in any depth or listening to each other?... Brother, find seventy excuses for your brother.

My reply to him is clear, and God alone knows people's intentions, so let's go easy on each other and assume good faith.

Ahmad started out well, and you genuinely wanted to hear what he had to say; then the focus shifted to reach and clout (I know the word is "clout", but the mangled spelling suits it better).

So all the talk has become populist material for consumption, with no real benefit to whoever consumes the content.

I wish Ahmad the researcher had stuck around, or at least that there were some balance between the drum-beating and the real value.
- Ahmed
tweet
Ahmad
RT @TheAhmadOsman: - local llms 101

- running a model = inference (using model weights)
- inference = predicting the next token based on your input plus all tokens generated so far
- together, these make up the "sequence"

- tokens ≠ words
- they're the chunks representing the text a model sees
- they are represented by integers (token IDs) in the model
- "tokenizer" = the algorithm that splits text into tokens
- common types: BPE (byte pair encoding), SentencePiece
- token examples:
- "hello" = 1 token or maybe 2 or 3 tokens
- "internationalization" = 5–8 tokens
- context window = max tokens model can "see" at once (2K, 8K, 32K+)
- longer context = more VRAM for KV cache, slower decode
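A quick way to see tokenization in action, using the Hugging Face tokenizer API (gpt2 is just an example; token counts vary by tokenizer):

```python
# Tokenization sketch: exact counts depend on the tokenizer (gpt2 is just an example).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # a BPE tokenizer

for text in ["hello", "internationalization"]:
    ids = tok.encode(text)
    pieces = tok.convert_ids_to_tokens(ids)
    print(f"{text!r}: {len(ids)} token(s) -> ids={ids} pieces={pieces}")
# 'hello' comes out as a single token here; rarer/longer words split into several sub-word pieces
```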

- during inference, the model predicts next token
- by running lots of math on its "weights"
- model weights = billions of learned parameters (the knowledge and patterns from training)

- model parameters: usually billions of numbers (called weights) that the model learns during training
- these weights encode all the model's "knowledge" (patterns, language, facts, reasoning)
- think of them as the knobs and dials inside the model, tuned during training to recognize what could come next
- when you run inference, the model uses these parameters to compute its predictions, one token at a time

- every prediction is just: model weights + current sequence → probabilities for what comes next
- pick a token, append it, repeat, each new token becomes part of the sequence for the next prediction
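That loop (weights + current sequence → probabilities, pick, append, repeat) is only a few lines; here's a toy sketch where a made-up probability function stands in for the model:

```python
# Toy generation loop: a made-up probability function stands in for the model,
# just to show the predict -> pick -> append -> repeat shape of inference.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def fake_model(sequence):
    """Stands in for f_theta(seq) -> probabilities over the vocab."""
    random.seed(len(sequence))                       # deterministic toy distribution
    weights = [random.random() for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

sequence = ["the"]
while sequence[-1] != "<eos>" and len(sequence) < 10:
    probs = fake_model(sequence)                     # weights + current sequence -> probabilities
    next_token = VOCAB[probs.index(max(probs))]      # greedy: pick the most likely token
    sequence.append(next_token)                      # append it and repeat

print(" ".join(sequence))
```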

- models are more than weight files
- neural network architecture: transformer skeleton (layers, heads, RoPE, MQA/GQA, more below)
- weights: billions of learned numbers (parameters, not "tokens", though they're learned from training on tokens)
- tokenizer: how text gets chunked into tokens (BPE/SentencePiece)
- config: metadata, shapes, special tokens, license, intended use, etc
- sometimes: a chat template is required for chat/instruct models, or else you get gibberish
- you give a model a prompt (your text, converted into tokens)
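Chat templates are easy to see in action with the Hugging Face API (the model name below is just an example of an instruct model that ships a template):

```python
# Chat-template sketch: instruct models ship a template in their tokenizer config;
# skipping it is a classic source of gibberish output.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")   # example instruct model

messages = [{"role": "user", "content": "Explain tokens in one sentence."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)   # the raw string the model actually sees, special tokens included
```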

- models differ in parameter size:
- 7B means ~7 billion learned numbers
- common sizes: 7B, 13B, 70B
- bigger = stronger, but eats more VRAM/memory & compute (ballparked below)
- the model computes a probability for every possible next token (softmax over vocab)
- picks one: either the highest (greedy) or
- samples from the probability distribution (temperature, top-p, etc)
- then appends that token to the sequence, then repeats the whole process
- this is generation:
- generate: predict, sample, append
- over and over, one token at a time
- rinse and repeat
- each new token depends on everything before it; the model re-reads the sequence every step
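The "bigger = more VRAM" point from above, ballparked: the weights alone take roughly parameter count × bytes per parameter at a given precision (a rule of thumb, ignoring KV cache and runtime overhead):

```python
# Rough VRAM ballpark for the weights alone (ignores KV cache, activations,
# and framework overhead, which all add more on top).
BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}

for params_b in (7, 13, 70):                       # common model sizes, in billions of parameters
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = params_b * 1e9 * nbytes / 1e9
        print(f"{params_b}B @ {precision:9} ≈ {gb:6.1f} GB for weights")
```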

- generation is always stepwise: token by token, not all at once
- mathematically: model is a learned function, f_θ(seq) → p(next_token)
- all the "magic" is just repeating "what's likely next?" until you stop

- all conversation "tokens" live in the KV cache, or the "session memory"
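A rough sense of how big that session memory gets: KV cache per token ≈ 2 × layers × kv_heads × head_dim × bytes per value (the dimensions below are typical of a 7B-class model and are assumptions, not a specific config):

```python
# KV-cache size per token ≈ 2 (K and V) × layers × kv_heads × head_dim × bytes per value.
# The dimensions below are typical of a 7B-class model (assumed, not a specific config).
layers, kv_heads, head_dim, dtype_bytes = 32, 8, 128, 2   # GQA, fp16/bf16 cache

bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
for ctx in (2_048, 8_192, 32_768):
    print(f"{ctx:>6} tokens -> {ctx * bytes_per_token / 2**30:.2f} GiB of KV cache")
```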

- so what's actually inside the model?
- everything above (tokens, weights, config) is just setup for the real engine underneath

- the core of almost every modern llm is a transformer architecture
- this is the skeleton that moves all those numbers around
- it's what turns token sequences and weights into predictions
- designed for sequence data (like language),
- transformers can "look back" at previous tokens and
- decide which ones matter for the next prediction

- transformers work in layers, passing your sequence through the same recipe over and over
- each layer refines the representation, using attention to focus on the important parts of your input and context
- every time you generate a new token, it goes through this stack of layers, every single step
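A minimal numerical sketch of that stack, with causal self-attention followed by an MLP in each layer, using numpy and random weights purely to show the data flow:

```python
# Minimal transformer-layer data flow: causal self-attention followed by an MLP, repeated per layer.
# Random weights, single head, no layer norm or residuals: just the shape of the computation.
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, n_layers = 8, 5, 2
x = rng.normal(size=(seq_len, d))                  # token representations entering the stack

def layer(x):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                  # how strongly each token attends to every other
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores[mask] = -1e9                            # causal mask: a token can't look at future tokens
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    x = attn @ v                                   # self-attention output
    W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
    return np.maximum(x @ W1, 0) @ W2              # MLP adds non-linearity

for _ in range(n_layers):                          # every new token's sequence passes through all layers
    x = layer(x)
print(x.shape)                                     # (seq_len, d): refined representations
```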

- inside each transformer layer:
- self-attention: figures out which previous tokens are important to the current prediction
- MLPs (multi-layer perceptrons): further process token representations, adding non-linearity and expressiveness
- layer n[...]