^ If the host also supports LL/SC, one may think of using them for emulating the target’s LL/SC. This is dangerous, however, because most processors constrain the instructions that can appear between an LL/SC pair. If these restrictions are not respected, the store might fail spuriously. The extra overhead of dynamic translation, such as TLB lookups and register spills, may thus cause the store to fail forever.
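There are roughly four ways to emulate LL/SC across architectures:
1. Stop all other CPUs while the LL/SC sequence executes, then resume them (the article says this is what QEMU “currently” does)
2. Emulate with CAS; this has the ABA problem, but reportedly it “almost never matters for real programs”
3. Monitor the stores of all CPUs
4. If the host supports hardware transactional memory, LL/SC can be emulated exactly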
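A minimal sketch of approach 2 in C++, with `std::atomic` standing in for guest memory. The names `EmulatedMonitor`, `emulated_ll`, `emulated_sc` and `aba_goes_undetected` are hypothetical, not taken from QEMU or any other emulator. LL records the address and the value it saw; SC is a compare-and-swap from that value. The demo replays an A → B → A interleaving on one thread: a real SC would fail because the location was written in between, but the CAS only compares values and succeeds.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Per-CPU reservation state for the emulated LL/SC pair (hypothetical names).
struct EmulatedMonitor {
    std::atomic<std::uint64_t>* addr = nullptr;  // location reserved by LL, if any
    std::uint64_t value = 0;                     // value LL observed there
};

// LL: an ordinary load that also remembers the address and the loaded value.
std::uint64_t emulated_ll(EmulatedMonitor& m, std::atomic<std::uint64_t>& mem) {
    m.addr = &mem;
    m.value = mem.load();
    return m.value;
}

// SC: succeeds only if the reservation matches and a CAS from the remembered
// value succeeds. Returns true when the store happened.
bool emulated_sc(EmulatedMonitor& m, std::atomic<std::uint64_t>& mem,
                 std::uint64_t new_value) {
    if (m.addr != &mem) return false;  // no LL on this location: fail
    m.addr = nullptr;                  // the reservation is consumed either way
    std::uint64_t expected = m.value;
    return mem.compare_exchange_strong(expected, new_value);
}

// Replays A -> B -> A on one thread; returns whether the emulated SC succeeded.
bool aba_goes_undetected() {
    std::atomic<std::uint64_t> mem{1};
    EmulatedMonitor m;
    const std::uint64_t seen = emulated_ll(m, mem);  // LL observes A = 1
    assert(seen == 1);
    mem.store(2);                   // "another CPU" writes B...
    mem.store(1);                   // ...then writes A back
    return emulated_sc(m, mem, 5);  // CAS still sees 1, so SC succeeds
}
```

An SC whose location actually holds a different value, or that has no matching LL, still fails as expected; only the "same value written back" case slips through.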
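Same-architecture emulation is much the same. DynamoRIO on AArch64 offers two options:
1. Default: emulate with CAS
2. Optional: a super-instruction that packs the whole LL/SC block into a single instruction (precise and fast, but with plenty of drawbacks otherwise)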
ksco's work log
The plan is to follow AArch64: first emulate LL/SC on RISC-V with amoswap, then add the optional super-instruction mode later. I'll write it tomorrow.
A basic codegen that ignores the stolen register and the tp register; taking those two into account makes it a bit more complicated.
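# ---> lr.w/d.aq?.rl? rd, (rs1)
sd scratch1, [scratch_1_slot]
sd rs1, [tls_lrsc_addr]        # record the address before the load, in case rd == rs1
fence rl?
lw/ld rd, 0(rs1)               # lw for lr.w (sign-extends, as lr.w does), ld for lr.d
fence aq?
li scratch1, SIZE
sd scratch1, [tls_lrsc_size]
sd rd, [tls_lrsc_value]
ld scratch1, [scratch_1_slot]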
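# ---> sc.w/d.aq?.rl? rd, rs2, (rs1)
sd scratch1, [scratch_1_slot]
sd scratch2, [scratch_2_slot]
ld scratch1, [tls_lrsc_addr]
bne scratch1, rs1, fail
ld scratch1, [tls_lrsc_size]
li scratch2, SIZE
bne scratch1, scratch2, fail
ld scratch2, [tls_lrsc_value]  # value recorded by the matching lr
# note: amoswap stores rs2 unconditionally, even when the value check below fails
amoswap.w/d.aq?.rl? rd, rs2, (rs1)
xor rd, rd, scratch2
snez rd, rd                    # rd = 0 on success, 1 on failure, as sc expects
j finally
fail:
fence aq?rl?
li rd, 1
finally:
li scratch1, -1
sd scratch1, [tls_lrsc_addr]
ld scratch1, [scratch_1_slot]
ld scratch2, [scratch_2_slot]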
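To sanity-check what the two sequences above are meant to do, here is a small C++ model of them. It assumes 64-bit `lr.d`/`sc.d` only, ignores the aq/rl bits (the atomics default to `seq_cst`), and uses `thread_local` in place of the TLS slots; `LrscTls`, `tls_lrsc`, `lr_d`, `sc_d` and `atomic_add` are hypothetical names. One consequence of using `amoswap` shows up in the model: when the value check fails, `sc` reports failure, yet the swap has already written `rs2` to memory, which a CAS-based emulation would not do.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Per-thread reservation, mirroring tls_lrsc_addr / tls_lrsc_size / tls_lrsc_value.
struct LrscTls {
    std::uintptr_t addr = UINTPTR_MAX;  // -1: no reservation
    std::uint64_t size = 0;
    std::uint64_t value = 0;
};

thread_local LrscTls tls_lrsc;

// Model of the lr.d sequence: record the address, load, then record size and value.
std::uint64_t lr_d(std::atomic<std::uint64_t>* rs1) {
    assert(reinterpret_cast<std::uintptr_t>(rs1) % 8 == 0);  // lr.d needs natural alignment
    tls_lrsc.addr = reinterpret_cast<std::uintptr_t>(rs1);
    std::uint64_t rd = rs1->load();
    tls_lrsc.size = 8;
    tls_lrsc.value = rd;
    return rd;
}

// Model of the sc.d sequence: returns 0 on success and 1 on failure, as sc.d does.
std::uint64_t sc_d(std::atomic<std::uint64_t>* rs1, std::uint64_t rs2) {
    std::uint64_t rd = 1;  // result if the address/size check fails (the fail: path)
    if (tls_lrsc.addr == reinterpret_cast<std::uintptr_t>(rs1) && tls_lrsc.size == 8) {
        std::uint64_t old = rs1->exchange(rs2);  // amoswap: always stores rs2
        rd = (old != tls_lrsc.value) ? 1 : 0;    // compare with the value lr recorded
    }
    tls_lrsc.addr = UINTPTR_MAX;  // finally: drop the reservation
    return rd;
}

// Guest code the sequences are meant to support: an LR/SC fetch-and-add loop.
std::uint64_t atomic_add(std::atomic<std::uint64_t>* p, std::uint64_t inc) {
    std::uint64_t old;
    do {
        old = lr_d(p);
    } while (sc_d(p, old + inc) != 0);
    return old;
}
```

Without interference, `atomic_add` behaves like a fetch-and-add. If another thread writes between `lr_d` and `sc_d`, the model's `sc_d` overwrites that write before reporting failure, so the retry loop can lose updates.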
https://www.youtube.com/watch?v=vUwsfmVkKtY
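Uses [[clang::musttail]] to write an interpreter, trying to solve the problems Mike Pall described in his famous msg00742: register allocation, and slow paths polluting the I$. The video's conclusion is even more interesting 🤔️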
YouTube
A Deep Dive Into Dispatching Techniques in C++ - Jonathan Müller - CppNow 2023
https://www.cppnow.org
https://www.linkedin.com/company/cppnow
---
Dispatching Techniques in C++ - Jonathan Müller - CppNow 2023
Slides: https://github.com/boostcon
---
At the core of an interpreter is a loop that iterates over instructions and executes…
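A minimal sketch of the tail-call dispatch style the video discusses, assuming a made-up stack bytecode; `Op`, `PUSH`/`ADD`/`MUL`/`HALT`, `Handler`, `kDispatch`, `run` and the `MUSTTAIL`/`DISPATCH` macros are all hypothetical. Each opcode gets its own small function, and every handler ends by calling the next handler through the dispatch table. The idea is that each handler is compiled, and register-allocated, on its own, with `ip` and `sp` passed in argument registers, and `[[clang::musttail]]` guarantees the call reuses the current frame so the stack never grows. On compilers without the attribute, `MUSTTAIL` expands to nothing and the sketch falls back to an ordinary call.

```cpp
#include <cassert>
#include <cstdint>

// Use [[clang::musttail]] where supported; otherwise a plain (non-guaranteed) tail call.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Hypothetical toy bytecode: PUSH takes a one-byte immediate; the others take none.
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Every handler shares one signature, which musttail requires.
using Handler = std::int64_t (*)(const std::uint8_t* ip, std::int64_t* sp);

std::int64_t op_push(const std::uint8_t* ip, std::int64_t* sp);
std::int64_t op_add(const std::uint8_t* ip, std::int64_t* sp);
std::int64_t op_mul(const std::uint8_t* ip, std::int64_t* sp);
std::int64_t op_halt(const std::uint8_t* ip, std::int64_t* sp);

// Dispatch table indexed by opcode.
constexpr Handler kDispatch[] = {op_push, op_add, op_mul, op_halt};

// Jump straight to the handler for the opcode at ip.
#define DISPATCH() MUSTTAIL return kDispatch[*ip](ip, sp)

std::int64_t op_push(const std::uint8_t* ip, std::int64_t* sp) {
    *sp++ = ip[1];  // push the immediate operand
    ip += 2;
    DISPATCH();
}

std::int64_t op_add(const std::uint8_t* ip, std::int64_t* sp) {
    sp[-2] += sp[-1];
    --sp;
    ++ip;
    DISPATCH();
}

std::int64_t op_mul(const std::uint8_t* ip, std::int64_t* sp) {
    sp[-2] *= sp[-1];
    --sp;
    ++ip;
    DISPATCH();
}

std::int64_t op_halt(const std::uint8_t*, std::int64_t* sp) {
    return sp[-1];  // the result is the top of the stack
}

std::int64_t run(const std::uint8_t* code) {
    std::int64_t stack[64];
    // This sketch trusts the bytecode; only the entry opcode is range-checked.
    assert(code[0] < sizeof(kDispatch) / sizeof(kDispatch[0]));
    return kDispatch[code[0]](code, stack);
}
```

For example, `{PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}` evaluates `(2 + 3) * 4`.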