duangsuse::Echo

#Rust
Rust compiler contributor Nicholas Nethercote is publicly looking for work ("I am a Rust compiler engineer looking for a new job"), and even he complains that "AI is sucking up a lot of money and attention in the tech world, leaving less for everything else."

Nicholas Nethercote

I am a Rust compiler engineer looking for a new job

UPDATE 2025-09-03: I have found a new job and will be starting next week. I will post more details soon. Many thanks to everyone who helped publicize this post and to everyone who contacted me about possible work. Rust is being used in many interesting places!

164 views, 03:14

duangsuse::Echo

Forwarded from Frost's Notes

#share First time I've seen a clear explanation of these two concepts
https://discuss.python.org/t/should-it-be-possible-to-use-none-as-a-type-instead-of-nonetype/101862/2

Discussions on Python.org

Should it be possible to use None as a type (instead of NoneType)?

The type of None is NoneType [not a built-in, it’s type(None)], but in typing we use None as a type: my_var: None|int = 3 Besides of typing, Python does not accept None as a type. Technically None is indeed not a type. OTOH None is quite special. I think…
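A minimal sketch of the distinction the preview describes, assuming only standard Python behaviour: `None` is a value, `type(None)` is its class, and annotations merely accept `None` as shorthand for that class. The helper names `is_missing` and `none_is_rejected_as_class` are hypothetical and exist only for illustration.

```python
# NoneType is not a built-in name; recover it from the value itself.
NoneType = type(None)


def is_missing(value: "None | int") -> bool:
    """In the annotation, None stands in for NoneType (typing shorthand)."""
    # At runtime the check has to use the real class, not the value None.
    return isinstance(value, NoneType)


def none_is_rejected_as_class() -> bool:
    """Outside of typing, None is not accepted where a class is expected."""
    try:
        isinstance(3, None)  # None is an instance, not a type
    except TypeError:
        return True
    return False


# None is an instance of NoneType; NoneType is an instance of type.
assert isinstance(None, NoneType)
assert isinstance(NoneType, type)
assert not isinstance(None, type)
```

The annotation is written as a string so the sketch does not depend on which Python version evaluates `None | int` at runtime.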

❤1

128 views, 01:53

duangsuse::Echo

Forwarded from QC 的小树林

Found out that perf trace is far faster than strace. I've already replaced strace -cf with sudo perf trace -s --summary-mode=total.

❤2

129 views, 01:54

duangsuse::Echo

Forwarded from QC 的小树林

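Modern CPUs are all superscalar: they have multiple functional units (say, two arithmetic logic units, one floating-point unit, and others) and can issue several instructions to different functional units at once so that they compute together. As a result, more than one instruction can actually execute within a single cycle, which is called instruction-level parallelism (ILP). Of course, high parallelism requires enough functional units and no dependencies between the instructions. For more on this, see 《现代处理器结构》 ("Modern Processor Architecture").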

I'm still on Zen 2, so TMA isn't available to me. But there are tools very well suited to this scenario: uiCA and llvm-mca.

As an example, compiling the naive algorithm here on godbolt.org with gcc 15.2 at -O3 gives the following assembly for the loop body:


.L3:
        movsd   xmm2, QWORD PTR [rax]
        add     rax, 8
        mulsd   xmm2, xmm0
        mulsd   xmm0, xmm3
        addsd   xmm1, xmm2
        cmp     rdx, rax
        jne     .L3

Feeding this code into uiCA's simulation gives a result of 4 cycles per iteration, whereas Horner's method comes out at 8 cycles per iteration. But why?

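Open the Trace Table in the HTML output. It contains a visual simulation of the pipeline, including which port each instruction used and in which cycle it was issued (I), dispatched (D), executed (E), retired (R), and so on. If dispatch waits a long time, there are probably not enough ports; if retirement waits a long time, there is probably a dependency between instructions. In the trace for Horner's method we can see a very long E->R distance, so the main cause of the lower IPC is dependencies between instructions. For more about the Trace Table, see "Visualizing Performance-Critical Dependency Chains".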
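The source code behind the assembly is not shown in this message, so the following is only a sketch reconstructed from the loop body above: `xmm1` looks like a running sum, `xmm0` a running power of x, and `xmm3` x itself. The function names and the coefficient order (lowest degree first) are assumptions, and `eval_horner` is the textbook form of Horner's method, not necessarily the exact code that was measured.

```python
def eval_naive(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with a separately maintained power."""
    total = 0.0   # plays the role of xmm1
    power = 1.0   # plays the role of xmm0
    for c in coeffs:
        # The multiply for this term only needs the current power, so it
        # does not wait on the running sum (mulsd xmm2, xmm0).
        total += c * power   # addsd xmm1, xmm2
        power *= x           # mulsd xmm0, xmm3, independent of total
    return total


def eval_horner(coeffs, x):
    """Evaluate the same polynomial as (...(c_n * x + c_{n-1}) * x + ...)."""
    total = 0.0
    for c in reversed(coeffs):
        # Each step needs the previous total for the multiply AND the add,
        # so the multiply and add of every iteration form one serial chain.
        total = total * x + c
    return total
```

Under the assumption that a scalar double multiply and add each take about 4 cycles of latency on the simulated core, this is consistent with the numbers quoted above: the naive loop carries two independent chains (one multiply on the power, one add on the sum), each about 4 cycles per iteration, while Horner's loop carries a single multiply-then-add chain of about 8 cycles.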

110 views, 07:41

duangsuse::Echo

Forwarded from yihong0618 和朋友们的频道 (伊)

https://bilibili.com/video/BV1rvbazSEiy

Bilibili

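One take! Taking on the same 16 packs of instant noodles as Liangzi. So many people want to see it, but can it be done? The hardest part turned out to be……_哔哩哔哩_bilibili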

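The 16 packs of instant noodles everyone asked for are here. As for the 37 youtiao, please don't make me take that on: I can't get even one down. I've disliked youtiao since I was little and eating them makes me feel sick; I don't like youtiao, youbing, or tofu pudding. Views: 3685034, danmaku: 12514, likes: 123883, coins: 22382, favorites: 7331, shares: 15214. Uploader: 橙飞一下. Uploader bio: 人二为仁，人仁忍韧。 Full marathon PB 3:43:56, half marathon PB 1:45:27.
The best runner in the food section and the best eater in the running section, taking you to buffets that are tasty and cheap. Formerly a primary school teacher.
Related videos: Speedrun of Liangzi's ten great battles, which one is number one in your heart…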

104 views, 07:43

duangsuse::Echo

#rust #js https://oxc.rs/docs/learn/performance

Oxc

Performance