https://ieeexplore.ieee.org/document/8543387
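Checked C is designed as a patch on top of C's existing design: it aims to rule out one class of errors (out-of-bounds accesses) with minimal changes. Given that buffer overflows may account for the vast majority of security issues in C libraries, tackling the problem this way, rather than designing a new language from scratch, is an interesting approach.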
https://queue.acm.org/detail.cfm?id=3415014
Using services rather than a single centralized database is like going from Newton's physics to Einstein's physics.
All data seen from a distant service is from the "past." By the time you see data from a distant service, it has been unlocked and may change. Each service has its own perspective. Its inside data provides its framework of "now." Its outside data provides its framework of the "past." My inside is not your inside, just as my outside is not your outside.
https://github.com/seL4/whitepaper
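seL4 whitepaper (2020-06-10). A brief overview of seL4's architecture, use cases, and design choices. It gives a good overall picture of seL4 and also works as a survey of where microkernel design is heading.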
...and Zircon is almost nine times slower than seL4
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
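What Every Programmer Should Know About Memory (2007). An all-in-one handbook on memory, over a hundred pages long, covering everything from how memory hardware works to how it interacts with other hardware. Best read selectively: pick the chapters that interest you.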
https://hack.org/mc/texts/gosling-wsd.pdf
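Window System Design: If I had it to do over again in 2002. Discusses which of X11's design assumptions no longer hold, making X11 a bottleneck for the Linux desktop. Misjudging how quickly the bottleneck hardware (for X11, rendering performance) would improve was a near-universal flaw of designs from that era. Still, almost 20 years later, in 2021, Wayland, which addresses these problems, has not fully replaced X11.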
http://erlang.org/download/armstrong_thesis_2003.pdf
At the highest level of abstraction an architecture is “a way of thinking about the world.”
If you are not particularly interested in the Erlang language itself, you can skim chapters 2, 5, and 10 and Appendix B.
A TL;DR of some of the content:
1. A component is considered faulty once its behaviour is no longer consistent with its specification.
2. We say a system is fault-tolerant if its programs can be properly executed despite the occurrence of logic faults.
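- Errors that cannot be anticipated in advance are unavoidable in any system
- Since errors are unavoidable, there needs to be a way to turn unanticipated errors into behavior that can be predicted; this is an error-isolation mechanism
- To keep serving requests in that situation, the system can degrade to simpler, less error-prone logic
Erlang's architecture shows up widely in other systems at larger scales, as processes / services / clusters, and its core idea is error isolation.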
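
The points above can be illustrated with a minimal Python sketch. It is not Erlang's supervisor mechanism, only the same idea in miniature: an unanticipated error in the full logic is contained at a boundary and turned into a predictable, degraded reply. The names `isolate`, `rich_reply`, and `simple_reply` are hypothetical and exist only for this illustration.

```python
from typing import Callable

Handler = Callable[[str], str]


def isolate(primary: Handler, fallback: Handler) -> Handler:
    """Wrap `primary` so any unanticipated error becomes a predictable fallback path."""
    def handler(request: str) -> str:
        try:
            return primary(request)
        except Exception:
            # The failure is contained here; the caller still gets a normal reply.
            return fallback(request)
    return handler


def rich_reply(request: str) -> str:
    # Hypothetical "full" logic with a latent bug: it raises on empty input.
    return f"hello, {request.split()[0]}"


def simple_reply(request: str) -> str:
    # Degraded logic: simpler, with fewer ways to fail.
    return "hello"


serve = isolate(rich_reply, simple_reply)
```

Calling `serve("")` hits the latent bug in `rich_reply`, yet the caller only sees the degraded answer from `simple_reply`.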
https://how.complexsystems.fail
> A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws.
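Written in 1998. Even though the systems it describes (e.g. transportation, healthcare, power generation) are entirely different from those run by internet companies, the nature of complex systems has not changed much.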
https://arxiv.org/pdf/1305.4924.pdf
The experiments reported in this paper indicate that even a very basic nonlinear model is generally more accurate than the state-of-the-art linear model in the computer performance literature.
Common sources of nonlinearity: L1/L2 caches, branch prediction
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/llama-vldb2013.pdf
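LLAMA: a general-purpose design for page cache + storage management. As a general component factored out of a storage system, LLAMA defines a generic, efficient interface and discusses many approaches to handling race conditions and crash recovery.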
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/acrobat-17.pdf
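A collection of common system design slogans (1983). Many of these design principles are hard to grasp from a few sentences and examples, but they still offer useful ways of thinking.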
The main reason interfaces are difficult to design is that each interface is a small programming language: it defines a set of objects and the operations that can be used to manipulate the objects.
https://docs.wixstatic.com/ugd/0c1418_d9878707bbb7427786b70c3c91d5fbd1.pdf
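Introduction to Compute Express Link. Consider how many devices in a single computer carry their own memory: SSD/HDD/RAID caches, GPU memory, SmartNICs. Because they use a DRAM cache, many devices also need their own battery or capacitors for power-loss protection. All of this significantly raises per-device cost and makes the hardware much more complex. CXL provides a protocol built on PCIe 5.0 and later that lets devices and main memory share resources. The protocol has three parts:
- CXL.io protocol is based on PCIe and is used for the functions such as device discovery, configuration, initialization, I/O virtualization, and direct memory access (DMA) using non-coherent load-store, producer-consumer semantics.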
- CXL.cache enables a device to cache data from the host memory, employing a simple request and response protocol.
- CXL.memory allows a host processor to access memory attached to a CXL device.
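CXL.io provides the basic discovery and configuration support for CXL devices; CXL.cache lets a device use main memory (or any byte-addressable, memory-like resource) as a cache, e.g. a DRAM-less SSD can use system main memory as its read/write cache; CXL.memory lets the CPU access a CXL device's memory directly (like a generalization of AMD's Smart Access Memory to all CXL devices).
A few real-world scenarios: an SSD that uses CXL.cache for its read/write cache no longer needs to implement its own power-loss protection, because the host can provide it uniformly; the host can offer 16 GB of Optane (persistent memory) as a cache for any CXL device, which, going by the size of Intel's current M10, is a cost most devices could not bear if each had to provide its own cache.
Devices supporting CXL 1.1 are expected to start shipping in 2022, and they are also compatible with PCIe 5.0.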
https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf
Ironies of automation (1983)
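- Part of the motivation for automating a system is to lower maintenance costs
- Maintaining a highly automated system requires operators with deep knowledge of the domain the automation covers, and training such highly skilled operators is extremely expensive
- Once automation passes a certain threshold, training operators costs more than the engineers who develop the automation; that is the point where ops stops making sense and DevOps and SRE become the system's maintainers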
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
https://people.eecs.berkeley.edu/~krste/papers/maas-isca18-hwgc.pdf
A Hardware Accelerator for Tracing Garbage Collection
Think of this hardware as a specialized CPU core dedicated to offloading GC overhead
> Most work on hardware-assisted GC was done in the 1990s and 2000s when Moore’s Law meant that next-generation general-purpose processors would typically outperform specialized chips for languages such as Java, even on the workloads they were designed for. This gave a substantial edge to non-specialized processors. However, with the end of Moore’s Law, there is now a renewed interest in accelerators for common workloads.
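As general-purpose hardware can no longer count on gains from process-node improvements, more specialized, purpose-built hardware will likely appear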
http://nischalshrestha.me/docs/cross_language_interference.pdf
Why Is It Difficult for Developers to Learn Another Programming Language?
Main learning scenarios:
1. Learning on their own: Programmers lacked formal training for the new language and its associated technology stack, leaving learning to themselves.
2. Just-in-time learning: Programmers focused on only learning features as needed.
3. Relating new language to previous languages: Programmers tried to map features of the new language to their previous languages.
Main difficulties:
1. Old habits die hard: Programmers had to constantly suppress old habits from previous languages.
2. Mindshifts when switching paradigms: Sometimes programmers wrestled with larger differences that required fundamental shifts in mindsets, or “mindshifts.”
3. Little to no mapping with previous languages: Programmers had a harder time learning the new language when there was little to no mapping of features to previous languages.
4. Searching for terms and documentation is hard: Programmers found it difficult to search for information about the language and its associated technologies.
5. Retooling is a challenging first step: Programmers faced difficulty retooling themselves in the environment of the new language.
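When learning a new language, your experience is being challenged, so it is easy to slip into a defensive, anxious mindset, especially when the new language's features do not match expectations (which often come from past experience). People are not purely rational, so these emotions cannot be avoided, but knowing some of the common causes can still help you adjust your own mindset and help others.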
https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s11-bronson.pdf
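Metastable Failures in Distributed Systems. This paper comes closest to my own understanding of this pattern of distributed-system incidents. The theory divides systems into two states:
- Stable: the system runs normally and returns to the stable state after external signals change
- Metastable: the system runs normally, but cannot return to the stable state once a particular external signal changes
The defining trait of a metastable system: it contains some positive feedback loop that, under specific conditions, keeps amplifying some part of the system until resources are exhausted. Any attempt to bring the system back to a stable state is defeated by the unterminated feedback loop exhausting resources again, so a metastable system cannot recover from failure on its own unless the amplifying component is removed from the system. The root cause of a metastable failure is the positive feedback loop, not the event that triggered it.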
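
The feedback loop can be made concrete with a toy simulation. This is a sketch under assumed numbers, not the paper's model: a server with fixed capacity, a steady base load below that capacity, and clients that send `retries_per_failure` retries for every failed request. The function `simulate` and all of its parameters are hypothetical.

```python
def simulate(ticks: int, capacity: int = 100, base_load: int = 80,
             retries_per_failure: int = 2, spike_at: int = 5,
             spike_extra: int = 60, retry_limit: int = 10_000) -> list[int]:
    """Return the number of failed requests at each tick."""
    failed = 0
    history = []
    for t in range(ticks):
        # One-tick trigger: extra load arrives only at tick `spike_at`.
        spike = spike_extra if t == spike_at else 0
        # Amplifier: every failure from the previous tick comes back as retries
        # (bounded by `retry_limit` so the numbers stay finite).
        retries = min(retry_limit, retries_per_failure * failed)
        offered = base_load + spike + retries
        # Anything beyond capacity fails and feeds the next tick's retries.
        failed = max(0, offered - capacity)
        history.append(failed)
    return history


big_spike = simulate(30)                           # trigger large enough to start the loop
small_spike = simulate(30, spike_extra=25)         # same system, smaller trigger
no_amplifier = simulate(30, retries_per_failure=1) # big trigger, amplifier weakened
```

With these assumed numbers the one-tick spike is long gone by the end of `big_spike`, yet failures never return to zero, while the same system absorbs the smaller spike. Weakening the amplifier lets the system recover from the large spike as well, which matches the point that the root cause is the loop and not the trigger.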
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8704965
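Evolution of the Unix System Architecture: An Exploratory Case Study. It includes a brief account of how the architecture evolved from the 1970 Research PDP-7 edition to present-day FreeBSD, which helps explain the context in which designs that look unreasonable today were decided. Modern operating systems have become so complex that it is often hard to know where to start; material like this, which begins with the earliest prototype and walks through the evolution step by step, may be a better way to learn how an operating system is put together.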
https://dl.acm.org/doi/10.1145/3465480.3467835
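Thinking in events: from databases to distributed collaboration software. A survey of event-sourcing architectures, focusing on scenarios where events need to be persisted. Ideally, any backend state can be reproduced by replaying external events; the source of truth shifts from the database to the event log, and the database is merely a snapshot of system state.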
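
A minimal sketch of the replay idea, assuming a hypothetical account-balance domain (the `Event` type, its `kind` values, and the functions `apply_event` and `replay` are made up for illustration and do not come from the paper):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "deposit" or "withdraw" (hypothetical event types)
    account: str
    amount: int


def apply_event(state: dict[str, int], event: Event) -> dict[str, int]:
    """Pure transition: old state + one event -> new state."""
    delta = event.amount if event.kind == "deposit" else -event.amount
    new_state = dict(state)
    new_state[event.account] = new_state.get(event.account, 0) + delta
    return new_state


def replay(events: list[Event], snapshot: dict[str, int] | None = None) -> dict[str, int]:
    """Rebuild state from the log, optionally starting from a snapshot."""
    state = dict(snapshot or {})
    for event in events:
        state = apply_event(state, event)
    return state


log = [
    Event("deposit", "alice", 100),
    Event("withdraw", "alice", 30),
    Event("deposit", "bob", 50),
]
```

Replaying the full `log` and replaying only its tail on top of a snapshot taken after the first two events produce the same state, which is why the database can be treated as a cache of the log.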
https://www.usenix.org/conference/nsdi21/presentation/ghigoff
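BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing. The architectural improvement is sound, but the approach is somewhat ad hoc: it relies heavily on UDP being simple and stateless and on the memcached workload consisting of many small, scattered queries. Only then can a KV cache layer be placed in front of the network stack to answer some queries early. Whether eBPF can extend to more complex cases is also doubtful. The more interesting part is the comparison with kernel bypass (userspace network stacks): even though eBPF carries a noticeable performance penalty, at this stage the polling overhead of kernel bypass, which lacks userspace interrupts, is still far greater than eBPF's.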
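
To show the shape of the idea only, here is a userspace Python sketch of "answer from a small cache before the request reaches the stack". It is not eBPF and not BMC's implementation; the text protocol (`get k`, `set k v`), the class `PreStackCache`, and `make_backend` are all simplifying assumptions for illustration.

```python
from typing import Callable


def make_backend() -> Callable[[str], str]:
    """Stand-in for the network stack plus memcached itself."""
    store: dict[str, str] = {}

    def backend(request: str) -> str:
        parts = request.split()
        if len(parts) == 3 and parts[0] == "set":
            store[parts[1]] = parts[2]
            return "STORED"
        if len(parts) == 2 and parts[0] == "get":
            return store.get(parts[1], "MISS")
        return "ERROR"

    return backend


class PreStackCache:
    """Small cache consulted before a request is handed to the backend."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend
        self.cache: dict[str, str] = {}
        self.hits = 0

    def handle(self, request: str) -> str:
        parts = request.split()
        if len(parts) == 2 and parts[0] == "get":
            key = parts[1]
            if key in self.cache:
                self.hits += 1
                return self.cache[key]      # answered without touching the backend
            reply = self.backend(request)
            self.cache[key] = reply         # remember the reply for later gets
            return reply
        if len(parts) == 3 and parts[0] == "set":
            self.cache.pop(parts[1], None)  # invalidate so stale data is never served
        return self.backend(request)
```

The sketch only works because each request is self-contained and the protocol is trivial to parse, which is the same dependence on simple, stateless queries noted above.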
https://doi.org/10.1145/3243176.3243195
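Biased Reference Counting: Minimizing Atomic Operations in Garbage Collection. In the Swift client workloads tested, RC operations take 42% of execution time on average, and the atomic operations in RC take 25% on average. Observing that most objects rarely cross threads, the paper splits the RC counter into an owner-thread counter (no atomics needed) and a shared counter for other threads, ultimately reducing average client execution time by 22.5%. My feeling is that if CPUs offered dedicated RC instructions dispatched to an accelerator instead of being handled in the current instruction pipeline, most of the performance impact of atomics would go away; these RC scenarios do not need memory to be freed with strict timeliness.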
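
A rough Python sketch of the split counter follows. It is a simplification, not the paper's algorithm: a `threading.Lock` stands in for atomic instructions, and it assumes every thread releases only references it retained itself (the real scheme also has to handle references handed from one thread to another). The class `BiasedRefCount` and its fields are hypothetical names.

```python
import threading


class BiasedRefCount:
    """Toy biased reference count: the owner thread counts without a lock."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()  # thread that created the object
        self.biased = 1                     # owner-only counter, plain int
        self.shared = 0                     # counter for all other threads
        self.owner_done = False             # True once the biased count hit zero
        self.freed = False
        self._lock = threading.Lock()       # stands in for atomic instructions

    def retain(self) -> None:
        if threading.get_ident() == self.owner:
            self.biased += 1                # fast path: no synchronization
        else:
            with self._lock:                # slow path: "atomic" update
                self.shared += 1

    def release(self) -> None:
        if threading.get_ident() == self.owner:
            self.biased -= 1
            if self.biased == 0:
                with self._lock:
                    self.owner_done = True
                    self.freed = self.shared == 0
        else:
            with self._lock:
                self.shared -= 1
                self.freed = self.owner_done and self.shared == 0
```

As long as an object stays on its owner thread, `retain` and `release` never take the lock, which is the case the paper observes to be the common one.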