Architecture Weekly
Architecture Weekly newsletter originated at https://blog.vvsevolodovich.dev. ~10 articles or videos on solution architecture and system design every week!
We exceeded 1024 subscribers! Thanks, everyone, for following 🙂 If you like the work I am doing, feel free to support here or here 🙂 Also, I have a couple of awesome people scheduled for interviews in the coming weeks, so stay tuned for updates!
Architecture Weekly #65 - Highlights

Tech Radar by Thoughtworks 👷‍♂️
The new issue of the Tech Radar by Thoughtworks. This time they highlight the rise of AI Tools for Software Development, marking GitHub Copilot for "Assess", highlighting the ease of adding accessibility to web and mobile applications and warning about the proper use of Lambda functions. See more details there!

#techradar

IaaS pricing patterns and Trends 2022 👷‍♂️
A major part of choosing a platform to host the solution is the cost. That's why it is useful to understand the cloud vendor landscape regarding price. Here you can find a report on pricing patterns and trends with data gathered at the end of 2022. Unfortunately, the vendors make it nearly impossible to compare VM offerings directly, but the report does its best. It looks like VMs are the most expensive at Amazon, but the disks there are the cheapest... though there are a lot of parameters inside.

#cloud #cost

The Fractal Geometry of Software Design 🍼
Vlad Khononov is one of my favourite authors on DDD. This time I am sharing his talk about energy transfer systems, which software design can relate to. And as such systems happen to be fractals, software design has the same patterns. Sounds intriguing? Watch the video!
Architecture Weekly #65 - Follow Up

Intro to AWS Well-Architected Framework 🍼
If an organization is new to the cloud or needs a substantial architecture change, AWS can support it with its Well-Architected Framework: a collection of articles and papers on how to do things right, covering Operational Excellence, Security, Performance and other important aspects called pillars. See more details in the article by "A Cloud Guru".

#aws #cloud #architecture

A brief history of high availability 👷‍♂️
What does high availability actually mean? What's the difference between fault tolerance and high availability? How did we come up with multi-active availability for databases? You will find the answers to those questions in the article on the Cockroach Labs blog.

#db #availability #replication

How RocksDB works 🤟
Previously we considered how some database engines work, for example how B-trees can be leveraged for reading and writing data. I stumbled upon an article by Artem Krylysov where he explains how RocksDB works - an embeddable database that powers Yugabyte, TitaniumDB and others - which happens to use an LSM tree - a log-structured merge tree. Follow the article for the details!

#db #lsm
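
To get a feel for the LSM idea before reading the article, here is a minimal sketch in Python. It is a toy under my own assumptions, not how RocksDB is implemented: the class name `TinyLSM`, the `memtable_limit` parameter and the in-memory "runs" are all hypothetical, and real engines add a write-ahead log, on-disk SSTables, bloom filters and leveled compaction.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs (hypothetical names)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}                 # newest writes live here
        self.runs = []                     # flushed sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run (a stand-in for an SSTable).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs newest-first so later writes shadow earlier ones.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in self.runs:              # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())]
```

Writes only ever touch the memtable, which is why LSM engines are write-friendly; reads may have to look through several runs until compaction merges them.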

Cross-shard transactions at 10 million requests per second
Two-phase commit is a long-known technique for distributed transactions. The Dropbox blog has an article telling the story of implementing this easy-on-paper protocol within a cluster of thousands of MySQL databases handling petabytes of metadata to support user-facing features. Follow the fascinating journey!

#databases #2pc
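
The "easy-on-paper" part of two-phase commit can be sketched in a few lines of Python. This is only the happy-path skeleton under my own assumptions, not Dropbox's implementation: `Participant` and `two_phase_commit` are hypothetical names, and the sketch leaves out everything that makes the protocol hard in production (timeouts, durable logs, coordinator failure).

```python
class Participant:
    """Hypothetical shard that can stage a change, then commit or abort it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.staged = None
        self.data = {}

    def prepare(self, change):
        # Phase 1: stage the change and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = change
        return True

    def commit(self):
        # Phase 2, success path: apply the staged change.
        if self.staged is not None:
            self.data.update(self.staged)
        self.staged = None

    def abort(self):
        # Phase 2, failure path: drop whatever was staged.
        self.staged = None


def two_phase_commit(changes):
    """changes is a list of (participant, change) pairs; returns True if committed."""
    votes = [p.prepare(change) for p, change in changes]
    if all(votes):
        for p, _ in changes:
            p.commit()
        return True
    for p, _ in changes:
        p.abort()
    return False
```

A single "no" vote in phase 1 aborts the whole transaction on every participant, which is what keeps the shards consistent with each other.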

Replacing RabbitMQ with the Postgres Queue 👷‍♂️
Usually, when you need some queue processing, you switch from a database implementation to something like Kafka. But Prequel did the opposite: having used RabbitMQ to manage task queues, they decided to switch to a PostgreSQL table because it is impossible to disable message prefetching in Rabbit. Find more details inside!
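
A table-as-a-queue can look roughly like the sketch below. To keep it runnable anywhere I use Python's built-in sqlite3 as a stand-in for PostgreSQL, and the `jobs` table, its columns and the function names are my own assumptions, not Prequel's schema. With several concurrent workers on PostgreSQL, the claiming query would typically also use `SELECT ... FOR UPDATE SKIP LOCKED`, which SQLite does not have.

```python
import sqlite3


def make_queue():
    # In-memory SQLite database standing in for a PostgreSQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, payload):
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim_next(conn):
    """Claim exactly one queued job, or return None. One row per call, no prefetching."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        return row


def complete(conn, job_id):
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

The point of the design is that a worker only takes a job at the moment it is ready to run it, so a long-running job never sits on top of prefetched messages that other workers could have picked up.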

Grokking Scalability for System Design 👷‍♂️
And another primer on system design! This time on scalability. Find out the 2 types of scalability, the tactics for achieving it and the overall approaches in the Grokking Scalability article.

Like the newsletter? Wanna receive new content earlier than everybody else? Consider helping to run it on Patreon or Boosty. The funds go to pay for the hosting and some software like a Camo Studio license. Patrons and Boosty subscribers of a certain level also get access to a private Architecture Community, and of course every supporter gets early access. Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman and Egor for already supporting the newsletter. Join them as well!
Architecture Weekly pinned «This Wednesday we are talking about automated mobile testing with Marathon test runner author! Book your calendars :) https://youtube.com/live/S-4XbJpwaaI?feature=share»
Imagine your entire production environment going down or losing the database disks. What about a datacenter going offline? What strategies should we plan to employ? How do we execute the recovery efficiently? Find out with Mikhail Druzhinin - an EM at DataDog with more than 500 hours of incident resolution time!

Broadcast

11.05.2023 18:00 GMT+3
This week I conducted an interview with Anton Malinskiy - an author of Marathon Test Runner and co-founder of Marathon Labs. We discussed Mobile Testing, the challenges behind it and how Marathon helps to resolve them. Watch the video!

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman and Evgeniy for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights
Reducing cost by 90% by rewriting microservices to a monolith 👷‍♂️
Yeah, you read it right! Prime Video - a video streaming product of Amazon with its own technical blog - dropped a piece which exploded across all the technical communities I participate in. Twitter, work Slacks, Telegram groups - all are referencing this article. So what's the hype? Prime Video has a service called Video Quality Analysis. It is supposed to identify any problems with video streaming and report them for further investigation and fixing. The initial architecture leveraged AWS Lambda and Step Functions, but most importantly it was distributed by nature, which required an S3 bucket for sharing data between the microservices. Apparently, that is very costly! So after some consideration, the team decided to move to a monolith. Find out the details of that story below, and remember that on our YouTube channel, we kinda told you.

#refactoring #microservice #distributedsystem

Real-time Messaging at Slack 👷‍♂️
Slack handles tens of millions of simultaneously connected clients and manages to deliver any message in under 500 ms all over the world. They built a pretty sophisticated system consisting of Channel Servers, Edge Proxies, Gateway Servers and Web Apps. They posted a good article explaining those in their technical blog, grab the read!

#highload #architecture #casestudy

Secure Search Over Encrypted Data 👷‍♂️
The common understanding is that once you have encrypted the data, the only way to do any operations over it, like modification or search, is to decrypt it first. That leads to a bunch of problems like key management, exposing the plain data to untrusted agents and others. But with the development of homomorphic encryption, you can at least search over encrypted data just fine. Our friends from Cossack Labs share an article explaining the details.

#encryption #security
Architecture Weekly #66 - Follow Up
Deterministic Simulation: A New Era of Distributed Testing 🤟
Ensuring the correctness of a distributed system is hard. Some people tend to use formal verification, while others seek to test all the possible cases. Both approaches are hard. However, deterministic simulation can be a combination of both tactics - and a very powerful one. Find an article on a deterministic simulation engine and what it takes to simulate the behavior of distributed systems.

#distributedsystem
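
The core trick of deterministic simulation fits in a small Python sketch: route every source of randomness (message ordering, drops, delays) through one seeded generator, so a failing run can be replayed exactly from its seed. The `simulate` function and its toy "replicated counter" are my own illustration, not the engine from the article.

```python
import random


def simulate(seed, steps=20):
    """Run a toy primary-to-replica counter under a seeded, simulated network."""
    rng = random.Random(seed)           # the only source of nondeterminism
    in_flight = []                      # messages currently "on the wire"
    replica = 0
    trace = []
    for n in range(1, steps + 1):
        in_flight.append(n)             # the primary sends increment number n
        # The simulated network delivers, drops or delays a random message.
        action = rng.choice(["deliver", "drop", "delay"])
        if action != "delay":
            msg = in_flight.pop(rng.randrange(len(in_flight)))
            if action == "deliver":
                replica += 1
            trace.append((action, msg))
    return replica, trace
```

Calling `simulate(42)` twice gives the same replica state and the same trace, so a bug found under one seed can be reproduced and debugged as many times as needed; running many different seeds explores many different interleavings.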

Make Architecture Reviews like Peer Reviews 🍼
Architecture reviews, or committees to be more precise, have a bad reputation for slowing down initiatives with useless templates and discussions. While taking decisions in a silo with a high degree of autonomy is satisfying, it has a high probability of missing critical information, which leads to costly rework afterwards. So the question here is how to ensure an appropriately aligned architecture while not compromising on quality. Find out in the article below. To my taste, it is a bit of an overkill, but it can work well even for a small org after some adaptation.

#architecture #adr #documentation

Kubernetes Security Part 1 - Security Context 👷‍♂️
Kubernetes runs a major part of workloads nowadays. And we need to run those securely. I am sharing a very deep, detailed guide on adding a security context to the containers we run there, along with scanning Docker images, configuring network policies, implementing an RBAC model and much more!

#security #kubernetes #k8s

The API. The Book 👷‍♂️
My colleague - Sergey Konstantinov - wrote an online book on API-first development principles covering a vast spectrum of topics from authentication and authorization to API Design, Backward Compatibility and API as a product. Start reading while the additional parts of the second edition are being written!

#api #apidesign

System Design Blue Print 🍼
I promise this is the last system design blueprint or ultimate guide or you name it. But the folks asking for a consultation always ask what they should at least be aware of to be ready for a system design interview... Such articles help. However, they don't give you much detail - rather an overview.

#systemdesign

Agility and Architecture 🍼
The debate about how to combine architecture work with the agile iterative approach is long and controversial. Some say to make a big upfront design, others insist on postponing all the decisions to the last possible moment. I am sharing a new article on InfoQ, which explores those takes and explains, for example, that there are no such "last possible moments" in software development and that you rather have some Minimal Viable Architecture, which you can iterate on.

#agile

Next Thursday, I am discussing Disaster Recovery with Misha Druzhinin. Join the live stream!
Architecture Weekly pinned «Modules in Build Systems https://youtube.com/shorts/l2gkhLUtfKI?feature=share»
Architecture Weekly #67

This week we held a discussion on Disaster Recovery with Misha Druzhinin. And you won't believe what happened in the middle of the broadcast.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy and Nadia for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights
Database Sharding Explained 🤟
Sharding is an important concept for ensuring the reliability and performance of the overall system. You can do it in a variety of ways, each of which can cause its own problems. The Architecture Notes blog has a free post explaining in detail what sharding is.

#db #sharding
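
One of the simplest sharding schemes, and one of the problems it causes, can be shown in a few lines of Python. This is a sketch of plain hash-modulo routing under my own assumptions (`shard_for` and `moved_fraction` are hypothetical helpers); the post covers more strategies than this one.

```python
import hashlib


def shard_for(key, shard_count):
    """Route a key to a shard with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count, new_count):
    """Share of keys that land on a different shard after changing the shard count."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)
```

For example, `moved_fraction([f"user-{i}" for i in range(1000)], 4, 5)` shows that going from 4 to 5 shards relocates most of the keys, which is why schemes like consistent hashing or directory-based sharding exist.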

How Tinder built its own API Gateway 👷‍♂️
Tinder tried multiple solutions for an API Gateway, including AWS API Gateway, Apigee, Kong and others. But in the end, they decided they really needed a bespoke solution to meet their requirements for scalability, reusability and configuration-based setup. So they took Spring Cloud Gateway and built their solution on top. Find out what they managed to achieve in the article on the Tinder Tech Blog.

#api #apigateway #architecture #casestudy

Migrating Critical Traffic at Scale with No Downtime - Part 1 👷‍♂️
Bringing new infrastructure under production load is always a little risky. For Netflix, which wants to ensure an uninterrupted watching experience, this is a critical technical capability. In the latest blog post, they explain that real traffic replay plays a crucial role in testing new services and that they built a special solution including a replay server. Follow the article for the details!

#sre #casestudy
Architecture Weekly #67 Follow-Up

Software Architecture Canvas 🍼
I am a big proponent of Solution Architecture Documents, RFCs and ADRs. But it's always good to take a fresh look. Patrick Roos shared a new format that enables a collaborative approach to architecture: the Canvas. I especially like the strong demand for the business case (top of green) and the risks and challenges (in blue). Give it a try!

#documentation

The Inner Workings of Distributed Databases 🤟
Alex Pelagenko begins the article with a nice analogy: he gets to the office by bike, but if it fails - should there be a replacement? The same happens with databases: if the first node fails, there should be a standby. But should the replication be sync or async? Should it be a master-master replication? Alex considers several databases and uses sequence diagrams to demonstrate how they handle disconnection issues.

#db #timeseries

Building a large scale unsupervised model anomaly detection system 🤟
Lyft leverages tons of ML models to define a wide range of parameters from ETAs to pricing. But they also need to understand whether those models perform well. The problem is that different models have different numbers of features and outputs. So they need to unify and process them efficiently. Find out how they do it in the blog post!

#ml

2023 State of Platform Engineering Report 🍼
The word DevOps is mentioned less frequently while people speak more and more about Platform Engineering. Perforce is publishing its report on Platform Engineering, and among many valuable insights, you will find the statement about companies underinvesting in product managers for their platforms - because a platform is still a product, even if it is for your internal developers. Find the report download below, and while you're going through it, turn on the discussion about developer relations with Baruch Sadogursky here.

#devops #platformengineering

Passwords are no more 🍼
Passwords have a long history of problems: they are easy to brute force, easy to phish and prone to social engineering attacks. With the zero trust world coming, the passwordless approach has finally become publicly available with support from Google and Apple. Read the news post!

#security
Architecture Weekly pinned «https://youtube.com/shorts/8hzTM0eB5bQ Please, leave a like and watch to the end - it helps to promote the content across YouTube!»
Architecture Weekly #68 Highlights

This week I held an interview with Vitaly Sharovatov. We discussed team dynamics and what managers can do to improve team performance, and backed it all up with scientific papers! Watch it here.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy and Nadia for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights

Datadog long-awaited postmortem 👷‍♂️
Datadog had a 24-hour-long outage on March 8th. Datadog, being an observability company, was kinda expected to publish the postmortem soon enough, but 2 months later there was nothing published. Some researchers even tried to write their own version, but luckily the company decided to publish the PM themselves. Read a fascinating story on how Linux upgrades can take you down even if you're deployed to 3 different cloud providers for reliability.

#pm #reliability #upgrade

What Happens When You Type an URL Into Your Browser? 👷‍♂️
I remember several years ago I was going through an interview at Amazon. After the questions about the cloud advantages, the interviewer asked the question in the title. And I think I managed to do pretty well: I described the 21h interrupt, the events in the operating system, the DNS stuff including local caches, the HTTP protocol... There was no second interview. So in case you get the same question - get the answer!

#systemdesign

How to run a Decision-Making Architecture Board 🍼
Autonomy of decisions in teams is a good thing; however, if the organization just allows everybody to do whatever they want, it will soon face a zoo of technologies and approaches. So at some point it makes sense to have a board where at least those decisions can be discussed. How to create and run one? Read a guest post in the blog of Olad Zommermann!

#adr #architectureboard
Architecture Weekly #68 Follow-up
Raft does not guarantee liveness in the face of Network Faults 🍼
Well, Raft, as one of the consensus algorithms, should guarantee leader election during network faults. This post (a rather old one) showcases 2 cases where a leader cannot be elected. Fixes are suggested in the article as well, so take a closer look.

#distributedsystem #raft #consensus

Core Solution Architecture Methods 👷‍♂️
I am sharing an article from the Solution Architecture training. In this chapter the shared vision is considered: what you actually need to do in order to share the understanding of the system including defining boundaries, external interfaces, internal components etc. Get more details inside!

#architecture #documentation

Hotspot performance engineering fails 🍼
Some companies believe that software can be made fast if you find some hotspots in the code and optimize those. But as an architect, you can easily guess that enormous performance problems come from inappropriate architecture. Daniel Lemire explains it in a little more detail.

#performance #pareto

Postgres Superpowers in Practice 👷‍♂️
Postgres, being a universal database for the majority of small and medium enterprises, gets a boost from the post by Oskar Dudycz, where he demonstrates how you can turn PostgreSQL into a multimodal database using extensions. Look how easy it is to convert it, for example, into a time-series db!

#db #postgres #timeseries

I built an AI Avatars Generator using Stable Diffusion 👷‍♂️
AI is all the hype right now. My colleague from Bolt wrote a blog post on how he made his own AI Avatars Generator. He describes the request ingestion, cron jobs, model deployment and training and provides the architecture he used. Follow the post!

#ai #ml

Connecting Block Business Units with AWS API Gateway 🤟
Company acquisitions and mergers can be tricky from a technical perspective. Different ecosystems, programming languages, deployment and runtime approaches are among the complexities. However, Block (the parent company of Square and Cash App) does it almost on a regular basis. Find a very thorough post on how AWS API Gateway and Fargate help them integrate new companies into their infrastructure with the minimum possible effort.

#integration #cloud #aws