Architecture Weekly
The Architecture Weekly newsletter originated at https://blog.vvsevolodovich.dev: about 10 articles or videos on solution architecture and system design every week!
Architecture Weekly #64 - Highlights

This week I conducted an interview with Baruch Sadogursky (Principal Developer Advocate at Gradle). We discussed Developer Advocacy, Developer Relations and why people confuse the DevOps and DevRel roles. Watch it here!

The end of a myth: Distributed transactions can scale 🤟
I recently learned about RDMA (remote direct memory access), which allows transferring data from one machine to another while bypassing the CPU. This development is what actually enables performant distributed transactions. Murat reviews a paper that describes the design of such a system. Grab a hard but insightful read!

#database #distributed #paper

Let's (not) break up the monolith 👷‍♂️
This week I published a short cut from the interview with Andrey Rebrov about microservices. It asks whether you really need to rewrite the monolith at all (damn, watch it already!). Following up on that thought, Uwe accidentally started a series of articles on what people actually want when they say they would like to break up the monolith into microservices. It turns out they don't necessarily want to rewrite it; they rather want more frequent deployments and more predictable development. Well, you can have that with a monolith as well!

#architecture #monolith #microservice

16 System Design Concepts You wish you knew before 👷‍♂️
Another short guide to help you prepare for the system design interview. It is not an exhaustive list, but it works as an introductory checklist.

#systemdesign #interview
Introduction to Data Modeling 🍼
We live in a time when it's essential to move fast, so everybody builds MVPs, and once they go live, engineers never get a chance to rewrite the product because, you know, the business needs revenue. The same happens with data: you never get a chance to model it properly. But if you do get that chance, you need to know why data modeling is crucial for your data engineering pipelines and what it involves. Read a long article by Airbyte.

#data

Zero-ETL, ChatGPT and the Future of Data Engineering 👷‍♂️
Everybody has heard about zero code, but the punchline of the joke is "zero code, zero jobs". Now we are hearing about Zero-ETL, a term originally introduced by Amazon. The idea is that you write only the code that does the actual job, and the pipeline is provided for you. But is it really feasible? What are the nuances of such a stack? Some considerations inside.

#data #etl

How to build next-gen serverless applications 🍼
Serverless does have servers, but you might not care about them because it's a managed service. However, you still need to think about how to structure your application, how the apps communicate with each other, and how to make them resilient. Grab a short article on some of the considerations for serverless apps.

How to write RFCs and ADRs 🍼
And back to our favorite topic: RFCs and ADRs. I have shared several articles in previous issues of the newsletter about why we need them in the first place. This time I found a good article on how exactly to write them so that they are clear, concise and meaningful. Take note!

#documentation #rfc #adr #howto

Architecting disaster recovery for cloud infrastructure outages 👷‍♂️
The cloud still runs on physical servers in data centers, and those face power outages, floods, connectivity losses and other disasters. Cloud systems still have to tackle such problems, which is why providers put disaster recovery plans in place. Read how Google does it.

#cloud #gcp #disasterrecovery

Lab Basics 👷‍♂️
And a little fun. Imagine you're experimenting with some IoT devices or have some running in your smart home. A good idea would be to set up a proper network that isolates one group of devices from another. Ken Moini published a nice first part of a series where he explains how he runs his own computer lab at home. Grab a read.

#network #casestudy #iot

Like the newsletter? Want to receive new content earlier than everybody else? Consider helping to run it on Patreon or Boosty. The funds pay for hosting and some software, like a Camo Studio license. Patrons and Boosty subscribers of a certain level also get access to a private Architecture Community, and of course every supporter gets early access. Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman and Egor for already supporting the newsletter. Join them as well!
Architecture Weekly History Digest

This time I am sharing not new links but some articles and videos from the most liked issues of the newsletter. Give them a thorough read if you missed them, or just refresh your memory ;)

How Discord Stores Billions of Messages 👷‍♂️
An amazing post from the CTO of Discord on storing messages. In the article you will find a business problem statement, the inherited situation, articulated requirements and a detailed solution. A true example of an architectural blog post. It deserves the top spot in any newsletter issue!


SQLite Internals: Pages & B-trees 🤟
SQLite is a small database famous for its reliability, small size, use in mobile applications, and the fact that its test code has more lines than the database code itself. In this article you will find how SQLite encodes rows, why it needs pages and how B-trees let it find data in O(log N) time.

#database
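To make the O(log N) claim concrete, here is a toy Python sketch of a page-based lookup: sorted rows are cut into fixed-size leaf pages, an interior level keeps the first key of each page, and a lookup binary-searches each level. The names `build_index` and `lookup` and the two-level layout are assumptions for illustration only; a real SQLite B-tree can have more levels and uses the byte-level page format the article describes.

```python
from bisect import bisect_right

PAGE_SIZE = 4  # rows per leaf page in this toy; SQLite pages are sized in bytes (4096 by default)

def build_index(rows):
    """Cut sorted (key, value) rows into leaf pages; the interior level keeps each page's first key."""
    rows = sorted(rows)  # assumes unique keys
    leaves = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]
    interior = [page[0][0] for page in leaves]
    return interior, leaves

def lookup(index, key):
    """Binary-search the interior level to pick a leaf, then binary-search inside that leaf."""
    interior, leaves = index
    page_no = bisect_right(interior, key) - 1
    if page_no < 0:
        return None  # key is smaller than every stored key
    leaf_keys = [k for k, _ in leaves[page_no]]  # page-local, so only O(PAGE_SIZE) work
    slot = bisect_right(leaf_keys, key) - 1
    if slot >= 0 and leaf_keys[slot] == key:
        return leaves[page_no][slot][1]
    return None
```

With more interior levels, each adding one binary search over a page, the number of pages touched equals the tree height, which grows logarithmically with the number of rows.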

Architecture for Flow with Wardley Mapping, DDD, and Team Topologies 👷‍♂️
Through this talk I learned about Wardley Maps, a tool for creating the strategic design of an application or a service. After introducing them, Susanne Kaiser shows how to apply them to the online learning domain and adds a bit of Domain-Driven Design and a grain of Team Topologies to come up with a well-thought-out solution.

#ddd #devops #strategy #video

Reducing Logging Cost by Two Orders of Magnitude using CLP 🤟
The more your business grows, the more you need to understand what happens within your system. At Uber, with tons of analytical data and therefore log data generated each day, this became an issue. They needed to retain enough logs to understand what happens within their Spark jobs without paying too much for it. So they used a Compressed Log Processor (CLP), which reduced the number of writes to SSD and stored the logs in a searchable form while taking two orders of magnitude less space. More details inside!

#logging #costoptimization #bigdata
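A toy Python sketch of the general idea behind template-based log compression such as CLP: each message is split into a static template, stored once in a dictionary, and its variable values, so repetitive logs shrink to a template id plus a few values. The rule that every standalone integer is a variable and the `TemplateEncoder` class are assumptions for this sketch; the real CLP uses richer variable rules and compresses the encoded data further.

```python
import re

VARIABLE = re.compile(r"\b\d+\b")  # assumption: standalone integers are the variables
PLACEHOLDER = "\x11"               # marks where a variable was removed from the template

class TemplateEncoder:
    """Toy encoder: each log message becomes (template id, list of variable values)."""

    def __init__(self):
        self.templates = []     # template id -> template text
        self.template_ids = {}  # template text -> template id

    def encode(self, message):
        variables = VARIABLE.findall(message)
        template = VARIABLE.sub(PLACEHOLDER, message)
        if template not in self.template_ids:  # each distinct template is stored only once
            self.template_ids[template] = len(self.templates)
            self.templates.append(template)
        return self.template_ids[template], variables

    def decode(self, template_id, variables):
        parts = self.templates[template_id].split(PLACEHOLDER)
        message = parts[0]
        for value, part in zip(variables, parts[1:]):
            message += value + part
        return message
```

Encoding "task 7 of job 42 took 350 ms" and "task 8 of job 42 took 120 ms" stores a single template; each message is then just its template id and three short values, and decoding restores the original text.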

Zero Downtime Migration from HBase to Bigtable 👷‍♂️
Box, a cloud file storage and management system, used to store more than 80 TB of user data in on-premises HBase. Last year they decided to migrate all of it to the fully managed, HBase-compatible Bigtable. The requirement was to do it with zero downtime, which is challenging. Read about the PoC they did to choose the storage type and how they ensured zero downtime in the post by Axatha Jalimarada.

#migrations #cloud #bigtable

How to choose a database?

Just a reminder that I posted a video in the "Architecture Reading" series, where I go through books and cover their content. Check it out and leave feedback!

https://youtu.be/M0u3btcLvZo

#video

Aligning organization and architecture with strategic DDD 🍼
Find a deck by Michael Plöd on applying DDD to organizing teams. According to Conway's law, your architecture will resemble your organization, so picking the proper team structure helps a lot. Pick up the DDD approach, identify contexts, draw good boundaries and remember: nothing is perfect. The slide deck is below.

#ddd

The security design of the AWS Nitro System 👷‍♂️
Software manages a lot of PII (personally identifiable information), PHI (protected health information) and PCI (payment card) data. Processing this data requires a high level of protection not only in the application but also at the VM and hardware levels. AWS published an extensive paper on how the Nitro System works and how it provides a secure virtualization platform.

#security #aws

Design Considerations for Platform Engineering Teams 🍼
Following up on the topic of platform teams, here is an article on the teams that together might make up the platform: Platform User Interface, Infrastructure, Services, Support and others.

#platform #bestpractices #teams
A System Design interview starts in 4 minutes :)
We exceeded 1024 subscribers! Thanks, everyone, for following 🙂 If you like the work I am doing, feel free to support it here or here 🙂 Also, I have a couple of awesome people scheduled for interviews in the coming weeks, so stay tuned for updates!
Architecture Weekly #65 - Highlights

Tech Radar by Thoughtworks 👷‍♂️
The new issue of the Tech Radar by Thoughtworks is out. This time they highlight the rise of AI tools for software development, placing GitHub Copilot in "Assess", point out how easy it has become to add accessibility to web and mobile applications, and warn about the proper use of Lambda functions. See the details inside!

#techradar

IaaS pricing patterns and Trends 2022 👷‍♂️
A major factor in choosing a platform to host your solution is cost. That's why it is useful to understand the cloud vendors' pricing landscape. Here you can find a report on pricing patterns and trends with data gathered at the end of 2022. Unfortunately, vendors make it impossible to compare VM offerings directly, but the report does its best. It looks like VMs are the most expensive at Amazon, while disks there are the cheapest... but there are a lot of parameters inside.

#cloud #cost

The Fractal Geometry of Software Design 🍼
Vlad Khononov is one of my favourite authors on DDD. This time I am sharing his talk about energy transfer systems, to which software design can be related. And since such systems happen to be fractals, software design exhibits the same patterns. Sounds intriguing? Watch the video!
Architecture Weekly #65 - Follow Up

Intro to AWS Well-Architected Framework 🍼
If your organization is new to the cloud or needs a substantial architecture change, AWS can support it with its Well-Architected Framework: a collection of articles and papers on how to do things right, covering Operational Excellence, Security, Performance and other important aspects called pillars. See more details in the article by "A Cloud Guru".

#aws #cloud #architecture

A brief history of high availability 👷‍♂️
What does high availability mean in particular? What's the difference between fault tolerance and high availability? How did we arrive at multi-active availability for databases? You will find the answers to these questions in an article on the Cockroach Labs blog.

#db #availability #replication

How RocksDB works 🤟
We have already looked at how some database engines work, for example how B-trees can be leveraged for reading and writing data. I stumbled upon an article by Artem Krylysov explaining how RocksDB works. RocksDB is an embeddable database that powers Yugabyte, TitaniumDB and others, and it is built on an LSM (log-structured merge) tree. Follow the article for the details!

#db #lsm
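A minimal Python sketch of the LSM idea, assuming a single in-memory process: writes go to a memtable, a full memtable is flushed as an immutable sorted run (standing in for an SSTable), reads check the memtable and then the runs from newest to oldest, and compaction merges runs. `TinyLSM` is a hypothetical name; RocksDB adds a write-ahead log, on-disk levels, Bloom filters and much more.

```python
from bisect import bisect_left

TOMBSTONE = object()  # marker for deleted keys

class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs (stand-ins for SSTables)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # deletes are writes too; old values are dropped at compaction

    def flush(self):
        # The whole memtable becomes one sorted, immutable run written in a single pass.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in reversed(self.runs):  # newest data wins
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest version of each key.
        merged = {}
        for run in self.runs:  # oldest first, so newer runs overwrite older values
            merged.update(run)
        # Dropping tombstones is safe here only because every run is merged at once.
        self.runs = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]
```

The trade-off is visible even in the toy: writes are cheap appends, while a read may have to probe several runs until compaction merges them.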

Cross-shard transactions at 10 million requests per second
Two-phase commit is a long-known technique for distributed transactions. The Dropbox blog has an article telling the story of implementing this easy-on-paper protocol within a cluster of thousands of MySQL databases that handle petabytes of metadata for user-facing features. Follow the fascinating journey!

#databases #2pc
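The protocol itself fits in a few lines, which is exactly why the Dropbox story is interesting: the hard part is everything around it. A minimal Python sketch, assuming in-memory participants (the `Participant` class and `two_phase_commit` are hypothetical) and leaving out what production needs, such as durably logging the decision, timeouts and coordinator recovery.

```python
class Participant:
    """One shard or database taking part in the transaction (in-memory stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: stage the changes and vote; a "yes" is a promise to commit if asked.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: phase 1 collects votes, phase 2 commits only if every vote was yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the transaction everywhere; what the sketch cannot show is a participant that voted "yes" and then has to wait for a coordinator that crashed.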

Replacing RabbitMQ with the Postgres Queue 👷‍♂️
Usually, when you need queue processing, you move from a database-backed implementation to something like Kafka. But Prequel did the opposite: they used RabbitMQ to manage task queues and decided to switch to a PostgreSQL table because message prefetching cannot be disabled in Rabbit. Find more details inside!

Grokking Scalability for System Design 👷‍♂️
And another primer on system design! This time on scalability. Find out about the two types of scalability, the tactics for achieving it and the overall approaches in the Grokking Scalability article.

This Wednesday we are talking about automated mobile testing with the author of the Marathon test runner! Mark your calendars :) https://youtube.com/live/S-4XbJpwaaI?feature=share
Imagine your entire production environment going down or losing the database disks. What about a data center going offline? What strategies should we plan to employ? How do we execute the recovery efficiently? Find out with Mikhail Druzhinin, an EM at Datadog with more than 500 hours of incident resolution time!

Broadcast

11.05.2023 18:00 GMT+3
This week I conducted an interview with Anton Malinskiy, the author of the Marathon test runner and co-founder of Marathon Labs. We discussed mobile testing, its challenges and how Marathon helps to resolve them. Watch the video!

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman and Evgeniy for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss architecture problems. They also see my daily updates on everything I am working on. Join them on Patreon or Boosty!

Highlights
Reducing cost by 90% by rewriting microservices to a monolith 👷‍♂️
Yeah, you read that right! Prime Video, Amazon's video streaming product with its own technical blog, dropped a piece that exploded across all the technical communities I participate in. Twitter, work Slacks, Telegram groups: all are referencing this article. So what's the hype? Prime Video has a service called Video Quality Analysis. It is supposed to identify any problems with video streams and report them for further investigation and fixing. The initial architecture leveraged AWS Lambda and Step Functions, but most importantly it was distributed by nature, which required an S3 bucket for sharing data between the microservices. Apparently, that is very costly! So after some consideration, the team decided to move to a monolith. Find out the details of the story below, and remember: on our YouTube channel, we kinda told you so.

#refactoring #microservice #distributedsystem

Real-time Messaging at Slack 👷‍♂️
Slack handles tens of millions of simultaneously connected clients and delivers messages all over the world in under 500 ms. They built a pretty sophisticated system consisting of Channel Servers, Edge Proxies, Gateway Servers and Web Apps. They explain these components in a good article on their technical blog, grab the read!

#highload #architecture #casestudy

Secure Search Over Encrypted Data 👷‍♂️
The common understanding is that once data is encrypted, the only way to perform any operation on it, like modification or search, is to decrypt it first. That leads to a bunch of problems, such as key management and exposing plaintext data to untrusted agents. But with the development of homomorphic encryption, you can at least search over encrypted data just fine. Our friends from Cossack Labs share an article explaining the hassle.

#encryption #security
Architecture Weekly #66 - Follow Up
Deterministic Simulation: A New Era of Distributed Testing 🤟
Ensuring the correctness of a distributed system is hard. Some people use formal verification, while others try to test all the possible cases. Both approaches are hard. However, deterministic simulation can combine both tactics, and the combination is very powerful. Find an article on a deterministic simulation engine and what it takes to simulate the behavior of distributed systems.

#distributedsystem
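A minimal Python sketch of the core trick, assuming a toy message-passing system: every source of nondeterminism (which node talks to which, network delay, dropped messages) is drawn from one seeded `random.Random`, and time is a simulated clock, so the same seed always reproduces the same run. The `simulate` function, node names and drop rate are illustrative assumptions; a real engine runs the actual system code inside such a loop and checks invariants along the way.

```python
import heapq
import random

NODES = ["A", "B", "C"]

def simulate(seed, num_messages=5, drop_rate=0.2):
    """Run a toy message-passing simulation whose every random choice comes from `seed`."""
    rng = random.Random(seed)  # the only source of randomness in the run
    queue = []  # events ordered by (simulated delivery time, sequence number)
    for seq in range(num_messages):
        sender, receiver = rng.sample(NODES, 2)
        delay = rng.randint(1, 100)  # simulated network latency, not wall-clock time
        heapq.heappush(queue, (delay, seq, sender, receiver, f"msg-{seq}"))

    trace = []
    while queue:
        now, _, sender, receiver, payload = heapq.heappop(queue)
        # Injected fault: the simulated network loses some messages.
        outcome = "drop" if rng.random() < drop_rate else "deliver"
        trace.append((now, outcome, sender, receiver, payload))
    return trace
```

Running `simulate(42)` twice yields identical traces, so a seed that exposes a bug becomes a reproducible bug report rather than a flaky test.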

Make Architecture Reviews like Peer Reviews 🍼
Architecture reviews, or rather committees, have a bad reputation for slowing down initiatives with useless templates and discussions. While making decisions in a silo with a high degree of autonomy is satisfying, it has a high probability of missing critical information, which leads to costly rework afterwards. So the question is how to ensure an appropriately aligned architecture without compromising on quality. Find out in the article below. To my taste, it is a bit of an overkill, but once adopted, it can work well even for a small org.

#architecture #adr #documentation

Kubernetes Security Part 1 - Security Context 👷‍♂️
Kubernetes runs a major part of workloads nowadays, and we need to run them securely. I am sharing a very deep, detailed guide on adding a security context to the containers we run there, along with scanning Docker images, configuring network policies, implementing an RBAC model and much more!

#security #kubernetes #k8s
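For a taste of what the guide covers, here is a Python sketch that builds a Pod manifest with restrictive security contexts and prints it as JSON, which `kubectl apply -f` accepts. The field names (`runAsNonRoot`, `allowPrivilegeEscalation`, `readOnlyRootFilesystem`, `capabilities.drop`, `seccompProfile`) are standard Kubernetes Pod API fields; the `hardened_pod` helper, Pod name, image and UID are placeholder assumptions, and the guide itself may recommend different values.

```python
import json

def hardened_pod(name, image, uid=10001):
    """Build a Pod manifest with restrictive pod- and container-level security contexts."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                "runAsNonRoot": True,  # kubelet refuses to start a container that would run as UID 0
                "runAsUser": uid,
                "fsGroup": uid,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }

if __name__ == "__main__":
    # e.g. python pod.py > pod.json && kubectl apply -f pod.json
    print(json.dumps(hardened_pod("demo", "example.com/demo:1.0"), indent=2))
```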

The API. The Book 👷‍♂️
My colleague Sergey Konstantinov wrote an online book on API-first development principles covering a vast spectrum of topics: authentication and authorization, API design, backward compatibility and the API as a product. Start reading now while additional parts of the second edition are being written!

#api #apidesign

System Design Blueprint 🍼
I promise this is the last system design blueprint, ultimate guide, or whatever you call it. But the folks asking me for a consultation always ask what they should at least be aware of to be ready for a system design interview... Such articles help. However, they don't give you much detail, rather an overview.

#systemdesign

Agility and Architecture 🍼
The debate about how to combine architecture work with an agile, iterative approach is long and controversial. Some say you should do a big upfront design; others insist on postponing all decisions to the last possible moment. I am sharing a new article on InfoQ that explores these takes and explains, for example, that there is no such thing as a "last possible moment" in software development; rather, you have a Minimum Viable Architecture that you can iterate on.

#agile

Next Thursday, I am discussing Disaster Recovery with Misha Druzhinin. Join the live stream!
Modules in Build Systems https://youtube.com/shorts/l2gkhLUtfKI?feature=share
Live in an hour!
Architecture Weekly #67

This week we had a discussion on Disaster Recovery with Misha Druzhinin. And you won't believe what happened in the middle of the broadcast.

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy and Nadia for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss architecture problems. They also see my daily updates on everything I am working on. Join them on Patreon or Boosty!

Highlights
Database Sharding Explained 🤟
Sharding is an important technique for ensuring the reliability and performance of the overall system. You can shard in a variety of ways, and each of them can cause its own problems. The Architecture Notes blog has a free post explaining in detail what sharding is.

#db #sharding
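One common strategy is hash-based sharding. The Python sketch below, with hypothetical function names, maps keys to shards with a stable SHA-256 hash and also shows one of the problems such schemes bring: with plain modulo, going from 4 to 5 shards keeps a key in place only if its hash modulo 20 is below 4, so roughly 80% of uniformly hashed keys move. Consistent hashing and directory-based sharding exist largely to soften this.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard using a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys that land on a different shard after changing the shard count."""
    moved = sum(shard_for(k, old_shards) != shard_for(k, new_shards) for k in keys)
    return moved / len(keys)

if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    print(f"{moved_fraction(users, 4, 5):.0%} of keys move when going from 4 to 5 shards")
```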

How Tinder built its own API Gateway 👷‍♂️
Tinder tried multiple solutions for an API Gateway, including AWS API Gateway, Apigee, Kong and others. But in the end, they decided they really needed a bespoke solution to meet their requirements: scalable, reusable and configuration-based. So they took Spring Cloud Gateway and built their solution on top of it. Find out what they achieved in the article on the Tinder Tech Blog.

#api #apigateway #architecture #casestudy

Migrating Critical Traffic at Scale with No Downtime - Part 1 👷‍♂️
Bringing new infrastructure under production load is always a little risky. For Netflix, which wants to ensure an uninterrupted viewing experience, doing this safely is a critical technical capability. In their latest blog post, they explain that replaying real traffic plays a crucial role in testing new services, and they describe the solution they built for it, including a replay server. Follow the article for the details!

#sre #casestudy
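The general technique of checking a new service against the current one on real traffic can be sketched in a few lines of Python. `replay`, the pricing functions and the captured requests are hypothetical; Netflix's actual setup, with its replay server and large-scale traffic capture, is described in the post.

```python
def replay(requests, old_service, new_service):
    """Send each captured request to both services and collect responses that differ.

    Users would only ever see the old service's answer; the new one is compared on the side.
    """
    mismatches = []
    for request in requests:
        expected = old_service(request)
        try:
            actual = new_service(request)
        except Exception as exc:  # a crash in the new service counts as a mismatch too
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches

# Hypothetical services: the rewrite accidentally applies a bulk discount.
def old_price(request):
    return request["qty"] * 10

def new_price(request):
    return request["qty"] * (9 if request["qty"] >= 100 else 10)

if __name__ == "__main__":
    captured = [{"qty": 1}, {"qty": 5}, {"qty": 150}]
    for request, expected, actual in replay(captured, old_price, new_price):
        print(f"{request}: expected {expected}, got {actual}")
```

Only the large order shows up as a mismatch, which is exactly the kind of behavioral difference that synthetic tests tend to miss and real traffic surfaces.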
Architecture Weekly #67 - Follow Up

Software Architecture Canvas 🍼
I am a big proponent of Solution Architecture Documents, RFCs and ADRs. But it's always good to take a fresh look. Patrick Roos shared a new format that enables a collaborative approach to architecture: the Canvas. I especially like the strong emphasis on the business case (at the top, in green) and on the risks and challenges (in blue). Give it a try!

#documentation

The Inner Workings of Distributed Databases 🤟
Alex Pelagenko begins an article with a nice analogy: he gets to the office by bike, but if the bike breaks, should there be a replacement? The same goes for databases: if the primary node fails, there should be a standby. But should replication be sync or async? Should it be master-master replication? Alex considers several databases and uses sequence diagrams to show how they handle disconnection issues.

#db #timeseries
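The sync vs async question boils down to when the primary acknowledges a write. A minimal Python sketch, assuming one primary, one standby and an in-memory log (all names hypothetical): synchronous replication acknowledges only after the standby has the record, paying latency on every write, while asynchronous replication acknowledges immediately and can lose acknowledged writes if the primary dies before shipping them.

```python
class Node:
    def __init__(self):
        self.log = []

class Primary(Node):
    """Primary that replicates writes to one standby, synchronously or asynchronously."""

    def __init__(self, standby, synchronous):
        super().__init__()
        self.standby = standby
        self.synchronous = synchronous
        self.pending = []  # acknowledged to the client but not yet on the standby

    def write(self, record):
        self.log.append(record)
        if self.synchronous:
            self.standby.log.append(record)  # ack only after the standby has the record
        else:
            self.pending.append(record)      # ack immediately, ship later
        return "ack"

    def ship_pending(self):
        self.standby.log.extend(self.pending)
        self.pending.clear()

def acknowledged_but_lost(synchronous):
    """Acknowledge three writes, ship once after the first, then lose the primary."""
    standby = Node()
    primary = Primary(standby, synchronous)
    primary.write("w1")
    primary.ship_pending()
    primary.write("w2")
    primary.write("w3")
    # The primary crashes here; the standby is promoted with whatever it has.
    return [r for r in primary.log if r not in standby.log]
```

`acknowledged_but_lost(True)` returns an empty list, while `acknowledged_but_lost(False)` returns `["w2", "w3"]`: two writes the client was told had succeeded.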

Building a large scale unsupervised model anomaly detection system 🤟
Lyft leverages tons of ML models to determine a wide range of parameters, from ETAs to pricing. But they also need to understand whether those models perform well. The problem is that different models have different numbers of features and outputs, so Lyft needs to unify them and process them efficiently. Find out how they do it in the blog post!

#ml

2023 State of Platform Engineering Report 🍼
The word DevOps is mentioned less and less frequently, while people speak more and more about Platform Engineering. Perforce has published its report on Platform Engineering, and among many valuable insights you will find the statement that companies underinvest in product managers for their platforms, because a platform is still a product, even if its users are your internal developers. Find the report download below, and while you're going through it, put on the discussion about developer relations with Baruch Sadogursky here.

#devops #platformengineering

Passwords are no more 🍼
Passwords have a long history of problems: they are easy to brute-force and to phish, and they are prone to social engineering attacks. With the zero-trust world coming, the passwordless approach has finally become publicly available with support from Google and Apple. Read the news post!

#security