Architecture Weekly
The Architecture Weekly newsletter originated at https://blog.vvsevolodovich.dev. ~10 articles or videos on solution architecture and system design every week!
Imagine your entire production environment going down or losing the database disks. What about a datacenter going offline? What strategies should we plan to employ? How do we execute the recovery efficiently? Find out with Mikhail Druzhinin - an EM at Datadog with more than 500 hours of incident resolution time!

Broadcast

11.05.2023 18:00 GMT+3
This week I conducted an interview with Anton Malinskiy - the author of Marathon Test Runner and co-founder of Marathon Labs. We discussed mobile testing, the challenges behind it and how Marathon helps to resolve them. Watch the video!

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman and Evgeniy for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights
Reducing cost by 90% by rewriting microservices to a monolith 👷‍♂️
Yeah, you read it right! Prime Video - a video streaming product of Amazon with its own technical blog - dropped a piece which blew up across all the technical communities I participate in. Twitter, work Slacks, Telegram groups - all are referencing this article. So what's the hype? Prime Video has a service called Video Quality Analysis. It is supposed to identify any problems with video streaming and report them for investigation and fixing. The initial architecture leveraged AWS Lambda and Step Functions, but most importantly it was distributed by nature, which forced the use of an S3 bucket for data sharing between the microservices. Apparently, it is very costly! So after some consideration, the team decided to move to a monolith. Find out the details of that story below, and remember that on our YouTube channel we kinda told you so.

#refactoring #microservice #distributedsystem

Real-time Messaging at Slack 👷‍♂️
Slack handles tens of millions of simultaneously connected clients and manages to deliver messages in under 500 ms all over the world. They built a pretty sophisticated system consisting of Channel Servers, Edge Proxies, Gateway Servers and Web Apps. They posted a good article explaining those on their technical blog; give it a read!

#highload #architecture #casestudy

Secure Search Over Encrypted Data 👷‍♂️
The common understanding is that once you have encrypted the data, the only way to perform any operation on it, like modification or search, is to decrypt it first. That leads to a bunch of problems such as key management and exposing the plain data to untrusted parties. But with the development of homomorphic encryption, you can at least search over encrypted data just fine. Our friends from Cossack Labs share an article explaining the details.

#encryption #security
Architecture Weekly #66 - Follow Up
Deterministic Simulation: A New Era of Distributed Testing 🤟
Ensuring the correctness of a distributed system is hard. Some people tend to use formal verification, while others seek to test all the possible cases. Both approaches are hard. However, deterministic simulation can be a combination of both tactics - and a very powerful one. Find an article on a deterministic simulation engine and what it takes to simulate the behavior of distributed systems.
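
To give a feel for the idea, here is a minimal Python sketch that is not taken from the article. It assumes a toy system where a single seeded random generator decides both the delivery order of messages and which of them get dropped; the function name, the message names and the 20% drop rate are all made up for illustration.

```python
import random


def simulate(seed, num_messages=5):
    """Run a toy message exchange driven entirely by one seeded RNG."""
    rng = random.Random(seed)  # the only source of nondeterminism in the run
    in_flight = [f"msg-{i}" for i in range(num_messages)]
    trace = []
    while in_flight:
        # The simulator, not the OS scheduler or the network, picks the next event.
        msg = in_flight.pop(rng.randrange(len(in_flight)))
        if rng.random() < 0.2:
            trace.append(("drop", msg))  # injected network fault
        else:
            trace.append(("deliver", msg))
    return trace


# The same seed reproduces the same interleaving and the same faults.
print(simulate(42) == simulate(42))
```

Because every decision flows from the seed, a failing run can be replayed exactly by reusing its seed, while iterating over many seeds explores different schedules and fault patterns.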

#distributedsystem

Make Architecture Reviews like Peer Reviews 🍼
Architecture reviews, or committees to be more precise, have a bad reputation for slowing down initiatives with useless templates and discussions. While making decisions in a silo with a high degree of autonomy is satisfying, it has a high probability of missing critical information, which leads to costly rework afterwards. So the question here is how to ensure an appropriately aligned architecture while not compromising on quality. Find out in the article below. To my taste, it is a bit of an overkill, but it can work well even for a small org after some adaptation.

#architecture #adr #documentation

Kubernetes Security Part 1 - Security Context 👷‍♂️
Kubernetes runs a major share of workloads nowadays, and we need to run those securely. I am sharing a very detailed guide on adding a security context to the containers we run there, along with scanning Docker images, configuring network policies, implementing an RBAC model and much more!
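
For readers who have not seen a security context yet, here is a rough sketch of what one looks like, built as a Python dictionary and dumped as JSON. The `securityContext` fields are regular Kubernetes API fields, but the pod name, the image and the chosen values are placeholders and not the guide's actual recommendations.

```python
import json

# Hypothetical pod manifest: name, image and user ID are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-app"},
    "spec": {
        # Pod-level settings apply to every container in the pod.
        "securityContext": {"runAsNonRoot": True, "runAsUser": 10001},
        "containers": [
            {
                "name": "app",
                "image": "registry.example.com/demo-app:1.0",
                # Container-level settings tighten what this container may do.
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file, a JSON manifest like this can be applied with `kubectl apply -f`, the same way as the more common YAML form.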

#security #kubernetes #k8s

The API. The Book 👷‍♂️
My colleague - Sergey Konstantinov - wrote an online book on API-first development principles, covering a vast spectrum of topics from authentication and authorization to API design, backward compatibility and API as a product. Start reading while the additional parts of the second edition are being written!

#api #apidesign

System Design Blue Print 🍼
I promise this is the last system design blueprint, or ultimate guide, or you name it. But the folks coming for a consultation always ask what they should at least be aware of to be ready for a system design interview... Such articles help. However, they don't give you much detail - rather an overview.

#systemdesign

Agility and Architecture 🍼
The debate about how to combine architecture work with the agile iterative approach is long and controversial. Some say to make a big upfront design, others insist on postponing all the decisions to the last possible moment. I am sharing a new article on InfoQ, which explores those takes and explains, for example, that there are no such "last possible moments" in software development and that you rather have a Minimum Viable Architecture, which you can iterate on.

#agile

Next Thursday, I am discussing Disaster Recovery with Misha Druzhinin. Join the live stream!
Architecture Weekly pinned «Modules in Build Systems https://youtube.com/shorts/l2gkhLUtfKI?feature=share»
Architecture Weekly pinned «Live in an hour!»
Architecture Weekly #67

This week we held a discussion on Disaster Recovery with Misha Druzhinin. And you won't believe what happened in the middle of the broadcast.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy and Nadia for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Highlights
Database Sharding Explained 🤟
Sharding is an important concept for ensuring the reliability and performance of the overall system. You can do it in a variety of ways, each of which can cause its own problems. The Architecture Notes blog has a free post explaining in detail what sharding is.
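
As a small illustration of one such way and one such problem, here is a Python sketch of plain hash-modulo sharding. It is my own toy example, not code from the post; the shard names and the `user-N` keys are invented.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Map a key to a shard using a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def moved_keys(keys, old_shards, new_shards):
    """Count the keys that land on a different shard after the shard list changes."""
    return sum(shard_for(k, old_shards) != shard_for(k, new_shards) for k in keys)


keys = [f"user-{i}" for i in range(1000)]
print(shard_for("user-42"))
print(moved_keys(keys, SHARDS, SHARDS + ["shard-4"]), "of", len(keys), "keys move")
```

Routing is trivial, but adding a fifth shard relocates most of the keys, which is the kind of resharding pain that approaches such as consistent hashing try to reduce.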

#db #sharding

How Tinder built its own API Gateway 👷‍♂️
Tinder tried multiple solutions for an API Gateway, including AWS API Gateway, Apigee, Kong and others. But in the end, they decided they really needed a bespoke solution to meet their demands: scalable, reusable and configuration-based. So they took Spring Cloud Gateway and built their solution on top. Find out what they managed to achieve in the article on the Tinder Tech Blog.

#api #apigateway #architecture #casestudy

Migrating Critical Traffic at Scale with No Downtime - Part 1 👷‍♂️
Bringing new infrastructure under production load is always a little risky. For Netflix, which wants to ensure an uninterrupted watching experience, doing it safely is a critical technical capability. In the latest blog post, they explain that real traffic replay plays a crucial role in testing new services, and that they built a special solution for it, including a replay server. Follow the article for the details!
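
The core of replay testing fits in a few lines. The sketch below is a simplified illustration and not Netflix's implementation: both services are hypothetical stand-in functions, the "recorded traffic" is faked, and the new service carries a deliberate regression so the comparison has something to find.

```python
def legacy_service(request):
    """Stand-in for the current production path (hypothetical)."""
    return {"title_id": request["title_id"], "bitrate": 4000}


def new_service(request):
    """Stand-in for the replacement being validated (hypothetical)."""
    bitrate = 3500 if request["title_id"] == 3 else 4000  # deliberate regression
    return {"title_id": request["title_id"], "bitrate": bitrate}


def replay(recorded_requests, old, new):
    """Send each recorded request to both paths and collect mismatching responses."""
    mismatches = []
    for request in recorded_requests:
        expected, actual = old(request), new(request)
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches


recorded = [{"title_id": i} for i in range(1, 6)]  # captured traffic, faked here
for request, expected, actual in replay(recorded, legacy_service, new_service):
    print("mismatch for", request, "->", expected, "vs", actual)
```

The users keep getting answers from the old path, while the differences point at what has to be fixed before any real traffic is switched over.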

#sre #casestudy
Architecture Weekly #67 Follow-Up

Software Architecture Canvas 🍼
I am a big proponent of Solution Architecture Documents, RFCs and ADRs. But it's always good to take a fresh look. Patrick Roos shared a new format that enables a collaborative effort on architecture: the Canvas. I especially like the strong demand for the business case (top of green) and the risks and challenges (in blue). Give it a try!

#documentation

The Inner Workings of Distributed Databases 🤟
Alex Pelagenko begins the article with a nice analogy: he gets to the office by bike, but if it fails - should there be a replacement? The same goes for databases: if the first node fails, there should be a standby. But should the replication be sync or async? Should it be master-master replication? Alex considers several databases and shows, with sequence diagrams, how they handle disconnection issues.
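
The sync-versus-async trade-off can be shown with a toy model. The following Python sketch is an assumption-heavy illustration, not how any particular database from the article works: the classes and the explicit `ship()` step are invented, and the "crash" simply happens right after the last write is acknowledged.

```python
class Node:
    def __init__(self):
        self.log = []


class Primary(Node):
    def __init__(self, standby, synchronous):
        super().__init__()
        self.standby = standby
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped to the standby

    def write(self, value):
        self.log.append(value)
        if self.synchronous:
            self.standby.log.append(value)  # acknowledge only after the standby has it
        else:
            self.pending.append(value)  # acknowledge now, ship later
        return "ack"

    def ship(self):
        """Background replication step used in asynchronous mode."""
        self.standby.log.extend(self.pending)
        self.pending.clear()


def lost_on_failover(synchronous):
    """Return acknowledged writes missing on the standby when the primary dies."""
    standby = Node()
    primary = Primary(standby, synchronous)
    for value in ("a", "b"):
        primary.write(value)
    primary.ship()
    primary.write("c")  # the primary crashes right after acknowledging "c"
    return [v for v in primary.log if v not in standby.log]


print("sync loses:", lost_on_failover(True))
print("async loses:", lost_on_failover(False))
```

Synchronous mode pays for its safety with write latency, since every write waits for the standby; asynchronous mode acknowledges faster but can lose the most recent writes on failover.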

#db #timeseries

Building a large scale unsupervised model anomaly detection system 🤟
Lyft leverages tons of ML models to define a wide range of parameters, from ETAs to pricing. But they also need to understand whether those models perform well. The problem is that different models have different numbers of features and outputs, so they need to unify and process them efficiently. Find out how they do it in the blog post!

#ml

2023 State of Platform Engineering Report 🍼
The word DevOps is mentioned less frequently, while people speak more and more about Platform Engineering. Perforce is publishing its report on Platform Engineering, and among many valuable insights, you will find the statement about companies underinvesting in product managers for their platforms - because a platform is still a product, even if it is for your internal developers. Find the report download below, and while you're going through it, turn on the discussion about developer relations with Baruch Sadogursky here.

#devops #platformengineering

Passwords are no more 🍼
Passwords have a long history of problems, like being easy to brute-force, easy to phish and prone to social engineering attacks. With the zero trust world coming, the passwordless approach has finally become publicly available with support from Google and Apple. Read the news post!

#security
Architecture Weekly pinned «https://youtube.com/shorts/8hzTM0eB5bQ Please, leave a like and watch to the end - it helps to promote the content across YouTube!»
Architecture Weekly #68 Highlights

This week I held an interview with Vitaly Sharovatov. We discussed team dynamics and what managers can do to improve team performance, and backed it all up with scientific papers! Watch it here.

Highlights

Datadog long-awaited postmortem 👷‍♂️
Datadog had a 24-hour outage on March 8th. Being an observability company, Datadog was kinda expected to publish the postmortem soon enough, but 2 months later nothing had been published. Some researchers even tried to write their own version, but luckily the company decided to publish the postmortem themselves. Read a fascinating story on how Linux upgrades can take you down even if you're deployed to 3 different cloud providers for reliability.

#pm #reliability #upgrade

What Happens When You Type a URL Into Your Browser? 👷‍♂️
I remember going through an interview at Amazon several years ago. After the questions about the cloud advantages, the interviewer asked the question in the title. And I think I managed to do pretty well: I described the 21h interrupt, the events in the operating system, the DNS stuff including local caches, the HTTP protocol... There was no second interview. So in case you get the same question - get the answer!

#systemdesign

How to run a Decision-Making Architecture Board 🍼
Team autonomy in decision-making is a good thing; however, if the organization just allows everybody to do whatever they want, it will soon face a zoo of technologies and approaches. So at some point it makes sense to have a board where at least those decisions can be discussed. How to create and run one? Read a guest post in the blog of Olad Zommermann!

#adr #architectureboard
Architecture Weekly #68 Follow-up
Raft does not guarantee liveness in the face of Network Faults 🍼
Well, Raft, as one of the consensus algorithms, should guarantee leader election during network faults. This post (a rather old one, though) showcases 2 cases where a leader cannot be elected. Fixes are suggested in the article as well, so take a closer look.

#distributedsystem #raft #consensus

Core Solution Architecture Methods 👷‍♂️
I am sharing an article from the Solution Architecture training. This chapter considers the shared vision: what you actually need to do in order to share the understanding of the system, including defining boundaries, external interfaces, internal components etc. Get more details inside!

#architecture #documentation

Hotspot performance engineering fails 🍼
Some companies believe that software can be made fast if you find some hotspots in the code and optimize those. But as an architect, you can easily guess that enormous performance problems come from inappropriate architecture. Daniel Lemire explains it in a little more detail.

#performance #pareto

Postgres Superpowers in Practice 👷‍♂️
Postgres, a universal database for the majority of small and medium enterprises, gets more support from the post by Oskar Dudycz, where he demonstrates how you can turn PostgreSQL into a multi-model database using extensions. Look how easy it is to turn it, for example, into a time-series DB!

#db #postgres #timeseries

I built an AI Avatars Generator using Stable Diffusion 👷‍♂️
AI is all the hype now. My colleague from Bolt wrote a blog post on how he made his own AI Avatars Generator. He describes the request ingestion, cron jobs, model deployment and training, and provides the architecture he used. Follow the post!

#ai #ml

Connecting Block Business Units with AWS API Gateway 🤟
A company acquisition or merger can be a tricky process from a technical perspective. Different ecosystems, programming languages, deployment and runtime approaches are among the complexities. However, Block (the parent company of Square and Cash App) does it almost on a regular basis. Find a very thorough post on how AWS API Gateway and Fargate help them integrate new companies into their infrastructure with minimal effort.

#integration #cloud #aws
Architecture Weekly #69

Highlights

Understanding database indexes in PostgreSQL 🤟
You have definitely heard about indexes in databases - they make our queries fast. But how exactly? What different types of indexes are out there? Which ones are supported by your database engine? Find a detailed piece on indexes in PostgreSQL with nice illustrations inside.
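
To see the effect of an index without installing anything, here is a small Python sketch. Note that it uses SQLite from the standard library instead of PostgreSQL, so the planner output differs from what the article shows (in Postgres you would run `EXPLAIN` on the query); the table, the data and the index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text


query = "SELECT id FROM users WHERE email = ?"
before = plan(query, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("user42@example.com",))

print("without index:", before)  # the planner scans the whole table
print("with index:   ", after)   # the planner searches via idx_users_email
```

The query text stays the same; only the access path chosen by the planner changes once the index exists.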

#db #index #performance

Container Loading in AWS Lambda 🤟
This week a new paper has been published on the long-awaited container support for AWS Lambda. In this article Marc Brooker dissects the paper and explains the biggest problem with supporting containers: performance. One of the tactics for performance is caching, but in the case of containers it can become tricky. Follow the article to find out how many unique bytes a typical container has 🙂

#paper #aws #serverless

The State of Frontend 2022 🍼
It's always interesting to see surveys on parts of the IT industry. This time I am happy to share the State of Frontend Report, which covers tons of topics from developers' working conditions to frameworks, static-site generators, hosting, micro-frontends and the future of Frontend itself. A nicely built report inside.

#frontend #report
Follow-Up

Apache Flink Architecture 🤟
Apache Flink is a robust stream processing framework used at companies like Airbnb and Uber. Aurimas Griciūnas posted a long read on the internal structure of Apache Flink, so that you can learn what a Flink Program is, what the difference between JobManager and TaskManager is, and how to get it all going in HA mode.

#bigdata

How to manage your technical backlog 👷‍♂️
Technical debt is probably the second thing a developer learns after "Hello, world!", as it starts at the very same moment. Despite there being millions of articles and books on refactoring, the architect's or, say, manager's perspective is rarely presented. Fixing it now with the article by Darío Rodrīguez on managing technical debt, including cost-benefit analysis and other tactics.

#quality #process

Scaling Salt for Remote Execution to support LinkedIn Infra Growth 👷‍♂️
LinkedIn makes changes to its infrastructure using the open-source Salt engine. But there are different ways to run it. At some point they realized their current setup didn't scale well, so they decided to change the architecture slightly to unleash the full power of the instrument. Grab a case study inside!

#casestudy #architecture

How to Architect Android Apps 👷‍♂️
I remember the times when there was no such thing as a Mobile Application Architecture - you would just put some code right into your UI Controller, swear a couple of times at network operations on the main thread, and you're done. Modern mobile applications are enormous, with millions of lines of code, and you need a good architecture for them. Grab an article with Android application architecture, principles and a comparison of the modern Google approach with good old Clean Architecture.

#architecture #mobile

ChatGPT already knows 🍼
Uwe Friedrichsen dropped another series on the much-discussed ChatGPT and whether the chatbot will replace professional programmers. We really do not want it to happen (and I even recorded an episode on the topic), but Uwe is not that optimistic. And the simple reason is that you learn at a very limited speed... and ChatGPT learns instantly. Think about it!

#philosophy

CNCF Platforms White Paper 🍼
Once again - about platform engineering. The Cloud Native Computing Foundation published a whitepaper on platforms, explaining why they are important in today's computing world, what exactly we should call a platform and what their biggest challenges are.

#platform #cloud
Architecture Weekly #70

Evolutionary Architecture from an Organizational Perspective 🍼
Evolutionary Architecture is frequently considered only from the technical point of view, forgetting the fact that without organizational support, evolving an architecture is a barely feasible task. The article on the InfoQ blog discusses the necessity of IT and Business working as one team, alignment on company-wide goals and the importance of contracts within the organization for achieving a truly evolutionary architecture.

#architecture

An educational side project 👷‍♂️
Have you tried building a product like Uber or Bolt on your own? Juraj Majerik decided to try! Over the span of 7 months he configured the infrastructure, built the backend, the frontend and the load emulator, and created a demo for the system, learning a lot along the way. Gergely Orosz describes his work and the main takeaways from such an experience, along with the system architecture.

#casestudy

Frugal Software Architecture 🍼
This week I wrote an article myself. I want to introduce you to the concept of Frugal Software Architecture. It's not a mere cost efficiency metric for software, but rather a set of principles that allow you to balance strategic investments and cost optimizations. Find them inside.

#architecture #frugality