ksco's work log

^ If the host also supports LL/SC, one may think of using them for emulating the target’s LL/SC. This is dangerous, however, because most processors constrain the instructions that can appear between an LL/SC pair. If these restrictions are not respected, the store might fail spuriously. The extra overhead of dynamic translation, such as TLB lookups and register spills, may thus cause the store to fail forever.

Yang Liu, edited 05:04

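There are roughly four ways to emulate LL/SC across architectures:

1. Stop all other CPUs while the LL/SC sequence executes, then resume them afterwards (the article says this is what QEMU "currently" does)
2. Emulate it with CAS. This has the ABA problem, but reportedly that problem "almost never matters for real programs"
3. Monitor the stores of all CPUs
4. If the host hardware supports hardware transactional memory, LL/SC can be emulated precisely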
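
To make the ABA problem of option 2 concrete, here is a minimal single-threaded sketch in Python. It is only a model, not QEMU's or any other emulator's implementation: the `EmulatedCpu` class, its methods, and `run_aba_demo` are hypothetical names, and the "other CPU" is simulated by writing to the shared dictionary directly.

```python
class EmulatedCpu:
    """Hypothetical model of one guest CPU whose LL/SC is emulated with CAS."""

    def __init__(self, memory):
        self.memory = memory   # shared guest memory: address -> value
        self.ll_addr = None    # address of the outstanding LL, if any
        self.ll_value = None   # value observed by that LL

    def load_linked(self, addr):
        # Remember what was loaded so the later SC can compare against it.
        self.ll_addr = addr
        self.ll_value = self.memory[addr]
        return self.ll_value

    def store_conditional(self, addr, new_value):
        """Return True if the store was performed."""
        ok = self.ll_addr == addr and self.memory[addr] == self.ll_value
        if ok:
            # On a real host, this compare and store would be one atomic CAS.
            self.memory[addr] = new_value
        self.ll_addr = None    # the reservation is consumed either way
        return ok


def run_aba_demo():
    memory = {0x1000: "A"}
    cpu0 = EmulatedCpu(memory)
    cpu0.load_linked(0x1000)
    # Another CPU changes the value and then restores it.
    memory[0x1000] = "B"
    memory[0x1000] = "A"
    # A real LL/SC pair would fail here because the location was written.
    # The CAS-based emulation only compares values, so it succeeds.
    succeeded = cpu0.store_conditional(0x1000, "C")
    return succeeded, memory[0x1000]
```

In this model, `run_aba_demo()` reports a successful store even though the location was written twice between the LL and the SC, which is exactly the case a value comparison cannot detect.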

Yang Liu, 05:18

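Same-architecture emulation is largely similar. DynamoRIO AArch64 offers two approaches:

1. Default: emulate with CAS
2. Optional: a super-instruction that packs the whole LL/SC block into a single instruction (it is precise and fast, but otherwise has many drawbacks)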

Yang Liu, edited 05:27

The plan is to follow AArch64: first implement the emulation for RISC-V using amoswap, then add the optional super-instruction mode later. I'll write it tomorrow.

Yang Liu, 11:20

The basic codegen, not yet accounting for the stolen reg and the tp reg. Once those two are taken into account, things get a bit more complicated.


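# ---> lr.w/d.aq?.rl? rd, (rs1)
sd scratch1, [scratch_1_slot]    # spill scratch1
sd rs1, [tls_lrsc_addr]          # record the address before the load, since rd may alias rs1
fence rl?                        # emitted only if .rl is set
ld rd, 0(rs1)                    # the load itself (lw for lr.w)
fence aq?                        # emitted only if .aq is set
li scratch1, SIZE                # width of the access (.w or .d)
sd scratch1, [tls_lrsc_size]
sd rd, [tls_lrsc_value]          # remember the loaded value for the later sc
ld scratch1, [scratch_1_slot]    # restore scratch1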


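# ---> sc.w/d.aq?.rl? rd, rs2, (rs1)
sd scratch1, [scratch_1_slot]    # spill both scratch registers
sd scratch2, [scratch_2_slot]
ld scratch1, [tls_lrsc_addr]
bne scratch1, rs1, fail          # address differs from the last lr
ld scratch1, [tls_lrsc_size]
li scratch2, SIZE
bne scratch1, scratch2, fail     # width differs from the last lr
amoswap.aq?.rl? rd, rs2, (rs1)   # store rs2; the old memory value lands in rd
ld scratch1, [tls_lrsc_value]
xor rd, rd, scratch1
snez rd, rd                      # rd = 0 if the old value matches what lr saw, else 1
j finally
fail:
fence aq?rl?
li rd, 1                         # report failure
finally:
li scratch1, -1
sd scratch1, [tls_lrsc_addr]     # invalidate the recorded address
ld scratch1, [scratch_1_slot]    # restore both scratch registers
ld scratch2, [scratch_2_slot]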
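
To sanity-check the logic of the two sequences above, here is a rough Python model of what they compute. It is a sketch under assumptions, not the actual implementation: the class `LrScModel` and the constant `NO_RESERVATION` are hypothetical names, memory is a plain dictionary, fences and register spills are left out, and the three `tls_lrsc_*` fields mirror the TLS slots used in the assembly.

```python
NO_RESERVATION = -1  # mirrors the "li scratch1, -1" written back by sc


class LrScModel:
    """Hypothetical per-thread model of the lr/sc codegen above."""

    def __init__(self, memory):
        self.memory = memory                  # address -> value
        self.tls_lrsc_addr = NO_RESERVATION
        self.tls_lrsc_size = 0
        self.tls_lrsc_value = 0

    def lr(self, rs1, size):
        """Model of lr: load the value and record address, size and value."""
        rd = self.memory[rs1]
        self.tls_lrsc_addr = rs1
        self.tls_lrsc_size = size
        self.tls_lrsc_value = rd
        return rd

    def sc(self, rs1, rs2, size):
        """Model of sc: returns rd, 0 on success and 1 on failure."""
        if self.tls_lrsc_addr != rs1 or self.tls_lrsc_size != size:
            rd = 1                            # the "fail" label
        else:
            old = self.memory[rs1]
            self.memory[rs1] = rs2            # amoswap stores unconditionally
            rd = int(old != self.tls_lrsc_value)
        self.tls_lrsc_addr = NO_RESERVATION   # the "finally" label
        return rd
```

A short usage example: after `model = LrScModel({0x2000: 5})`, calling `model.lr(0x2000, 4)` returns 5, a following `model.sc(0x2000, 6, 4)` returns 0 and leaves 6 in memory, and a second `sc` without a new `lr` returns 1. Because the model follows the amoswap-based sequence, a write by another thread between `lr` and `sc` makes `sc` return 1 while the swap has still replaced the value in memory.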