Architecture Weekly
Architecture Weekly newsletter originated at https://blog.vvsevolodovich.dev. ~10 articles or videos on solution architecture and system design every week!
I am holding a public Mock System Design interview with Igor Bondarenko. The task is secret, but it's gonna be fun. Make sure to join the stream and ask tricky questions! At the end I will give my recommendations and decide at what level the interview was passed.

Friday, June 9th, 18.00 GMT+3

https://youtube.com/live/zUi_5eSv0wY?feature=share
1000 subscribers! Thank you very much ;)
Architecture Weekly #71 - Highlights

The State of Data Engineering 2023 🍼
Data Engineering is a huge domain. That's why state reports are so valuable! Find one here, covering ingestion tools, data lakes, metadata management, analytics management and much more!

#dataengineering

Intro to Data Engineering 🍼
Speaking about Data Engineering: if the words above do not tell you much, please follow the conversation I had with Pasha Finkelshteyn, a Developer Advocate at JetBrains. He explains what data engineering is all about, what the biggest problems there are and much more!

#dataengineering #video

Building Efficient Experimentation Environments for ML Projects 👷‍♂️
Looks like the 71st Highlights are all about Data Engineering 🙂 The SwirlAI newsletter gives us a post on organizing the Experimentation Environment that Data Scientists use to improve ML models. The article looks into the properties that an efficient Experimentation Environment should have. As an MLOps engineer, you should strive to provide these to your users, and as a Data Scientist, you should know what to demand.

#ml #dataengineering
Architecture Weekly #71 - Follow-Up

Contract Testing Case Study 👷‍♂️
We covered Contract Testing several times. This time I would like to share an article by eBay Engineering. The folks wanted to ensure the API provided by the Notification team was functioning well for all of its consumers. They considered BDD, but found some caveats and pivoted to Contract Testing. Find out what they managed to achieve!

#api #contracttesting

Six Ways to shoot yourself in the foot with health checks 👷‍♂️
Health checks seem to be pretty simple: check if the application responds in time and connects to a database and a Kafka topic... But that is only the first impression. You can easily break your application with small health check tweaks. Find out 6 ways to do that!

#devops #observability

Microsoft Azure Well-Architected Framework 🍼
We covered the AWS Well-Architected Framework previously. But obviously, Amazon is not the only cloud provider out there. It's time for the Azure Well-Architected Framework, which focuses on Reliability, Security, Cost Optimization, Operational Excellence and Performance Efficiency.

#cloud #architecture #bestpractice

The Full Circle on Developer Productivity 🍼
Steve Yegge used to work at Amazon, Google and Grab, and now works at Sourcegraph. He shares his long journey and discusses why developer tooling is so important and why it's so hard to get it right. I couldn't stop reading!

#philosophy

Securing the API access 👷‍♂️
When you expose an API you need to protect it from malicious access. The first idea is to use some type of token, but it comes with the problems of invalidation, limiting the scope and many more. OAuth tokens combined with JWT yield a more efficient solution without compromising security, and even solve the mentioned problems. Follow Zapier's blog for further info!

#api #security

Mock System Design Interview: Video Portal 👷‍♂️
Another bit of content from my YouTube Channel: a mock system design interview. Here we are architecting a simple video portal, which is capable of uploading videos, converting them to different qualities and streaming them to mobile and web clients. Follow the video for the design process!

#video #mockinterview #systemdesign
Frugal Software Architecture. Part 2: Strategic Investments

https://vvsevolodovich.dev/frugal-software-architecture-part-2-strategic-investments/
Architecture Weekly #72 - Highlights

Big thanks to Nikita, Anatoly, Oleksandr, Dima, Pavel B, Pavel, Robert, Roman, Iyri, Andrey, Lidia, Vladimir, August, Roman, Egor, Roman, Evgeniy, Nadia and Daria for supporting the newsletter. They receive early access to the articles, influence the content and participate in the closed group where we discuss the architecture problems. They also see my daily updates on all the things I am working on. Join them at Patreon or Boosty!

Relational Databases Explained 🤟
Recently I interviewed a pretty Senior guy who was doing very well but lacked the understanding of indexes and how to scale relational databases. Fortunately, Architecture Notes came up with a new long post about RDBMS, including the indexes and B-Trees explained. BTW, I have a video on B-Trees as well 🙂

#db

Creating a Multi-Region Application with AWS Services 👷‍♂️
This week the US-EAST-1 region of AWS was affected by an outage caused by AWS Lambda capacity management. There is no postmortem yet, but it is worth returning to the roots of multi-region application architecture. So grab an article from AWS themselves!

#cloud #aws #availability

Frugal Software Architecture. Part 2: Strategic Investments 🍼
The very first principle of frugal software architecture is strategic investments. While the name might imply a purely financial perspective, it extends further into time, technology, and talent. Grab the second article from the series!

#frugality #architecture
Architecture Weekly #72 - Follow-Up

Good availability measure 🤟
Following up on the availability topic. How do you measure availability? Surprisingly, every cloud provider uses its own approach. In this paper, you will find what makes a good availability metric, why the current ones do not fulfill those requirements, and a proposal for a true metric.

#availability #paper

Architecture Principles 🍼
Architecture Principles can guide the major decisions in your system, like picking a technology for Identity Management or picking a communication protocol. Still, the question is what makes a good architecture principle? How do you formulate one? Find those rules covered in the article along with good examples.

#architecture

Intro to Kubernetes with Carlos Sanchez 🍼
Another interview! I spoke with Carlos Sanchez, Principal Scientist at Adobe Experience Manager and the author of the Jenkins Kubernetes Plugin! We figured out what Kubernetes is, what its basic building blocks are and what benefits it brings to managing your workloads.

#kubernetes #k8s #interview #video

Database Partition Conversion with minimum downtime 🤟
Partitioning the database is a known practice for handling a growing amount of data. However, it is also important to know how to create a new partition under heavy load: after all, what else but heavy load made the database need partitioning? Find a great piece on migrating to a new partition in PostgreSQL.

#db #casestudy

PayPal's Key-Value Store open-sourced 👷‍♂️
Some news here! PayPal open-sourced JunoDB, a distributed key-value store that uses RocksDB as the underlying storage engine. PayPal is serving 350 billion requests, which requires a highly available and security-focused database. Find out how it can be achieved!

#db

Introduction to encryption 🤟
And the last one for today - the introduction to encryption. I decided that it is always good to get back to the fundamentals - and I stumbled upon a good article. So grab the read!
❀1πŸ”₯1
Architecture Weekly #73 - Highlights

Architecture is like Stock Market. Selling Options 🍼
In finance, an option is the right to purchase or sell a stock at a predefined price. Gregor Hohpe argues that postponing an architecture decision resembles an option to a certain degree: it has a price, but lets you defer the decision to buy into a particular thing (platform, pattern, etc.) until later. Grab a short note in the Architecture Elevator.

#architecture #philosophy

Basecamp moving out from the cloud 🍼
37signals is a pretty famous company with products like Basecamp and HEY. DHH (David Heinemeier Hansson) wrote a blog post announcing that they completed their migration from AWS to their own hardware in 2 datacenters. It took them just 6 months to migrate but will save 1.5 million dollars yearly. Check out the thoughts on the migration!

#cloud #migration

Partitioning and replication: benefits & challenges 🤟
In distributed systems, partitioning involves dividing data into smaller units assigned to specific machines, aiding in scalability, performance, and fault-tolerance. Replication, on the other hand, duplicates data across different machines for increased fault tolerance. Despite their benefits, challenges exist. Replication requires consistent updates across replicas, while partitioning involves deciding optimal data division and handling multi-item requests. Many systems combine both techniques to maximize benefits while managing the associated challenges.
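
To make the combination of the two techniques concrete, here is a minimal sketch, not taken from the article. It assumes hash partitioning and a simple "next nodes in the list" replica placement; the function names `partition_for` and `replicas_for` and the node names are hypothetical.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(partition: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Place a partition on `replication_factor` consecutive nodes.

    Assumes replication_factor <= len(nodes), otherwise a node repeats.
    """
    return [nodes[(partition + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
p = partition_for("user:42", num_partitions=3)
owners = replicas_for(p, nodes, replication_factor=2)
```

A write for `user:42` would go to every node in `owners`, which is exactly where the consistency challenge mentioned above comes from.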

#database #replication #distributed
Architecture Weekly #73 - Follow-Up

A Beginner's guide to database deadlock 🤟
This article explains how deadlocks occur in relational database systems and how these systems, such as Oracle, SQL Server, PostgreSQL, or MySQL, recover from such situations. Deadlocks happen when two concurrent transactions can't proceed because each is waiting for the other to release a lock. A separate process in the database engine detects such cycles and resolves the deadlock by aborting one transaction, thereby releasing its locks. The decision on which transaction to abort can vary based on the system, with some considering rollback cost or deadlock priority. The article underscores the importance of understanding and managing deadlocks to handle unexpected transaction rollbacks.
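
The detection step can be sketched as a cycle search in a wait-for graph. This is a simplified illustration, not how any particular engine implements it: it assumes each blocked transaction waits for exactly one other, and the names `find_deadlock_cycle`, `pick_victim` and the rollback costs are made up for the example.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the transactions that form a cycle, or None if there is none.

    `waits_for` maps a blocked transaction to the transaction holding
    the lock it needs.
    """
    for start in waits_for:
        path: List[str] = []
        seen = set()
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            return path[path.index(current):]
    return None


def pick_victim(cycle: List[str], rollback_cost: Dict[str, int]) -> str:
    """Abort the transaction that is cheapest to roll back (one possible policy)."""
    return min(cycle, key=lambda tx: rollback_cost[tx])


cycle = find_deadlock_cycle({"T1": "T2", "T2": "T1"})
victim = pick_victim(cycle, {"T1": 10, "T2": 3})
```

Aborting the victim releases its locks, so the other transaction can proceed; the application then has to be ready to retry the rolled-back one.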

#database #deadlock

SQS vs Kinesis vs EventBridge 🍼
This article discusses when to use AWS messaging services SQS, SNS, EventBridge, and Kinesis. SQS is ideal for 1:1 communication, acting as a buffer and, with FIFO queues, ensuring ordered message processing. SNS is used for broadcasting messages to multiple consumers, while EventBridge provides broadcasting, event scheduling, and SaaS integration. Kinesis excels in processing large volumes of real-time streaming data. But which service should you choose for real-time streaming with data persistence? Find the answer in the article!

#messaging #aws

Klarna BNPL usage of Amazon Kinesis 🍼
Klarna uses Amazon Kinesis Data Analytics for Apache Flink for real-time decision-making, providing faster and more reliable shopping experiences. Initially faced with high latency issues using Apache Kafka and AWS Lambda, Klarna's solution now leverages an API with DynamoDB for decision storage and Kinesis Data Analytics for processing. Find out how the fully managed nature of Kinesis Data Analytics has improved Klarna's workflow, allowing for quick onboarding of new cases and the auto-scaling feature facilitating growth.

#aws #streaming

How we learned to improve Kubernetes CronJobs at Scale. Part 1 👷‍♂️
Lyft migrated nearly 500 cron tasks to a Kubernetes infrastructure, aiming for efficiency and containerization. However, the transition presented challenges. Kubernetes CronJobs experienced significant startup delays and complex failure handling. Additionally, the repeated execution of CronJobs was sometimes interrupted by these delays, causing missed runs. Lyft plans to share how these issues were addressed in a future article to improve the reliability and usability of CronJobs. For my taste the scale of the workload is pretty low, but it is still a valuable article.

#kubernetes #k8s

Is it possible to run a huge number of Android UI tests on each PR? 👷‍♂️
Running Android UI tests is no joke: you need to design the tests properly, prepare the infrastructure, rerun failed tests to account for flakiness, etc. My friend Evgeny Matsuk wrote a series of 5 blog posts explaining all those mechanics and giving a solution to the problem of running a huge number of tests in limited time. Please read it carefully!

#qa #automation #mobile

People and Security Incentives 👷‍♂️
Understanding and managing incentives and biases in people, organizations, and AI is crucial for risk management and cybersecurity. Strategies include utilizing force field analysis to identify forces affecting change and managing hidden incentives and conflicting risks. The practice of 'Escalation as a Service' - highlighting risks to leadership for resolution - is key. Adopting the qualities of High Reliability Organizations (HROs), such as proactivity, critical thinking, flexibility, open communication, and valuing expertise, can enhance security. Aligning the incentive structure with security goals requires an understanding of these incentives and the ability to manipulate them.

#security
"Frugal Software Architecture Part 3. Cost Optimization" is live for patrons and boosty subscribers.
Architecture Weekly #74 - Highlights

Frugal Software Architecture. Part 3. Cost Optimization 🍼
Next chapter of Frugal Software Architecture inbound. This time we consider Cost Optimization and figure out cost drivers in software architecture, explore strategies for cost optimization, discuss the balance between cost optimization and performance, provide practical steps to implement cost optimization and illustrate these concepts with real-world case studies.

#frugality

MongoDB vs PostgreSQL vs ScyllaDB 👷‍♂️
This brief yet insightful piece shares three key lessons from Tractian's journey of comparing and migrating databases. Tractian engineers found that each database is tailored for specific use cases, and thorough testing reveals the best fit. Adaptability and an openness to change are vital in this process, often leading to major, but necessary, engineering shifts. The ultimate goal is to select a system that supports your product's future.

#casestudy #db #postgre

Hard Stuff Nobody Talks about when Building Products 👷‍♂️
LLM-based startups are blooming like cryptocurrency ones did several years ago. This time the technology is really useful, but that does not mean it comes without problems. On the Honeycomb.io blog they tell the story of using an LLM to provide a natural language interface for their observability platform and how hard it is from the performance, correctness and relevancy perspectives. Exciting read!

#db #vectordb #database
❀1
Architecture Weekly #74 - Follow Up

What is a Vector Database 🤟
In the context of AI and machine learning, embeddings generated by models like Large Language Models pose significant management challenges due to their numerous attributes. Specialized vector databases, such as Pinecone, cater to these needs with optimized storage and querying capabilities that traditional scalar-based databases and standalone vector indexes lack. Their specific design for handling complex, large-scale data allows for superior performance, scalability, and flexibility, enabling insights extraction and real-time analysis. A vector database enhances AI capabilities with advanced features, such as semantic information retrieval and long-term memory, underlining its critical role in data-intensive applications.
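
At its core, the query a vector database answers is "which stored embeddings are closest to this one?". Below is a brute-force sketch of that idea using cosine similarity. It is only an illustration: real vector databases such as Pinecone use approximate indexes rather than a full scan, and the tiny two-dimensional "embeddings" and the names `cosine` and `top_k` are invented for the example.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy embeddings, made up for illustration.
index = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
nearest = top_k([1.0, 0.0], index, k=2)
```

The full scan is linear in the number of stored vectors, which is the reason specialized indexes exist once the collection grows.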

#ai #database

EventStorming Tips 🍼
EventStorming is a powerful technique to identify domain boundaries and design the solution based on the business domain. However, it can bring less value if done incorrectly. Find an article with tips on how to make the most of your EventStorming sessions.

#microservices #eventstorming #event

Detecting AI-Generated Profile Photos 👷‍♂️
It has become increasingly hard for humans to distinguish an AI-generated profile picture from a real one. This problem is especially annoying for social networks like LinkedIn. They decided to conduct research and discovered a way to tell whether an image is generated or more likely a real one. Follow the research below.

#ai

Modular Architecture for Development Teams 🍼
Find another story of an overgrown monolith which slows down feature development. Once you have added a large number of features to your application, adding more takes more and more time. Not surprisingly, applying the platform approach and Domain-Driven Design can really help. Follow the story in Martin Fowler's blog.

#casestudy #ddd

Migrating Netflix to GraphQL Safely 👷‍♂️
Netflix migrated to GraphQL with their whole mobile fleet last year. Now they are telling the story of the successful migration: introducing the facade first, fighting the problems with security and performance, and then making the underlying changes. Check out the post in their technical blog!

#graphql #migration

Building distributed RocksDB with OmniPaxos in 8 minutes 🤟
The post will demonstrate how to use OmniPaxos to build a simple distributed database. It will bring the system from a single-server database service to a distributed setup with multiple servers replicating the database. OmniPaxos will act as the replicated changelog for the distributed database that provides a single execution order for all replicas so they remain consistent.

#db #distributedsystems
Architecture Weekly #75 Highlights

Static stability with Availability Zones 👷‍♂️
Dynamic stability means that when something bad happens to a resource your service relies on, another instance of this resource is spawned as mitigation. Static stability means being prepared in advance, so the service keeps working, possibly degraded, while a new resource is catching up. Many services in AWS are built with static stability. Find out how it works with the example of EC2!

#aws #cloud #resilience

Ensuring the Successful Launch of Ads on Netflix 👷‍♂️
Netflix introduced a new tier, "Basic with ads", in November 2022. To ensure a smooth launch, they simulated user traffic patterns to uncover potential issues and validate ad algorithms. The team started with a small traffic percentage, eventually ramping up to 100%. This strategy also tested the system's resilience to sudden traffic spikes. The successful simulation method is being integrated into their CHAP experimentation platform for wider use.

#casestudy

Zero-day attack prevention via enhanced mobile app security 🍼
New post on my own blog! Zero-day vulnerabilities are very hard to deal with for mobile developers. Still, there are a handful of strategies to employ to minimize the damage. In this article, we'll explore some enhanced mobile app security strategies that can aid in zero-day attack prevention. Armed with the insights below, you can protect your app and safeguard your end users' valuable data and privacy.

#security #mobile #zeroday
❀2πŸ”₯1
Architecture Weekly #75 Follow-Up

What's wrong with OpenAPI? 🍼
OpenAPI is a format for formally describing HTTP APIs: you can generate code from the description or produce the documentation from the code. It is also handy for the API First approach. However, OpenAPI is very verbose and not easily human-readable. This note suggests another way of describing an HTTP API, check it out!

#api

From Technical Debt to Technical Health with HealthCheck 🍼
Technical debt is easy for engineers to understand, but it is an obscure obstacle for managers of all kinds. Managers do, however, understand the monetary aspects well. Mikael Vesavuori proposes a HealthCheck approach that makes the direct financial impact of tech debt visible. Find it here.

#quality

Building a Startup from Scratch: My Mistakes as CTO 👷‍♂️
I would like you to read this not because there is a good system design here, but rather the opposite. The guy decided to go with the Microservices style backed up by Kubernetes, and guess what? Several months later the startup failed. I bet that's because they overcomplicated the design so much that they didn't have any resources to pivot. Don't make the same mistake: start small and frugal.

#casestudy

Security Certification Roadmap 🤟
Certifications do not guarantee success in any of the IT fields, security included, but they help to grasp the fundamentals and build a solid picture. Here I am sharing the roadmap of certifications in 8 different areas of security.

#security

8 Steps in the Event Storming Process 👷‍♂️
Event Storming is a workshop format used to quickly find and understand the domain-level events that drive a business process. It's a lightweight process modeling technique involving sticky notes and collaboration. It is highly flexible and can be adapted to various contexts, including setting a long-term vision. Find the 8 steps that help make the process smooth and efficient.

#ddd #eventstorming

The Three Types of Enterprise Architecture Framework 👷‍♂️
Enterprise Architecture, in short, is about simplifying the development of the enterprise from the perspective of technology, business and organization. Just as there are similarities between businesses, there are similarities between architecture frameworks. Find an article describing 3 types of Enterprise Architecture Frameworks.

#architecture #ea
Architecture Weekly #76 - Highlights

How to prevent digital wallet fraud 👷‍♂️
Mobile banks, payment apps and crypto-wallets are very popular, hitting 120 million users in 2021. Where the money is, malicious users follow. Phishing, social engineering and pure software bugs can lead to money loss. Find the case studies and recommendations on how to battle those problems in an awesome material by Cossack Labs.

#security

Gossip Protocol 🤟
The Gossip Protocol is valuable in distributed systems because of its efficiency, scalability, and robustness. It effectively disseminates information across a large number of nodes with minimal network traffic, making it highly scalable even for substantial networks. Moreover, due to its randomized communication pattern, it is highly robust, enabling the system to tolerate failures and still maintain the speed and accuracy of information propagation. Find the description and explanation of its workings below!
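
To get a feel for how fast randomized dissemination spreads, here is a toy push-gossip simulation. It is a sketch under simplifying assumptions (synchronous rounds, no failures, every informed node contacts `fanout` random peers per round); the function name `gossip_rounds` and its parameters are made up for the example.

```python
import random
from typing import Tuple


def gossip_rounds(num_nodes: int, fanout: int = 2, seed: int = 0,
                  max_rounds: int = 1000) -> Tuple[int, int]:
    """Simulate push gossip; return (rounds used, number of informed nodes).

    Assumes fanout <= num_nodes. Node 0 starts with the rumor.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes and rounds < max_rounds:
        newly_informed = set()
        for _ in informed:
            # Each informed node pushes the rumor to `fanout` random nodes.
            newly_informed.update(rng.sample(range(num_nodes), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds, len(informed)


rounds, informed_count = gossip_rounds(num_nodes=100)
```

Because the set of informed nodes roughly multiplies each round, the number of rounds grows slowly with the cluster size, which is the scalability property described above.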

#gossip #distributedsystems

Cracking the mobile system design interview 👷‍♂️
The system design interview is typically a tricky step, and while there are tons of materials on backend interviews, there are very few on the mobile system design interview. We are fixing this now! Find a good guide on preparing for a mobile interview. If you don't find that sufficient, you can always request a consultation or a mock interview here or by contacting me via Telegram.

#systemdesign #mobile #interview
Architecture Weekly #76 Follow-Up

How LinkedIn Serves Over 4.8 million member profiles per Second 👷‍♂️
Reads of LinkedIn profiles doubling year over year made the engineering team reconsider how they serve them. So they decided to introduce a Couchbase-based caching layer. Find more details inside!

#casestudy #performance

Introduction to eTOM 🍼
I used to include blueprints and reference designs for particular business domains, and it's time to resume that. Find eTOM, a framework for the telecommunications industry's business processes. The material explains the various parts of the standard along with its evolution.

#standard

How Gradle cut AWS storage costs by 75% using S3 🍼
Build Scan is an X-Ray tool to understand your Gradle or Maven build. Gradle stored the data about the build steps, timings and used dependencies in Amazon RDS. Eventually the price bit them, so they reconsidered and moved to S3, cutting costs by 75%. More details below.

#casestudy

Load Balancer explained from a Startup's perspective 🍼
How should a technology startup handle the ever-growing load on its systems? Follow this post to get a grasp of the different types of load balancing solutions and different balancing algorithms.
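
Two of the most common balancing algorithms are round robin and least connections. Here is a minimal in-memory sketch of both, as an illustration rather than anything taken from the post; the class names and backend names are hypothetical.

```python
import itertools
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each request to the backend with the fewest active connections."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Dict[str, int] = {backend: 0 for backend in backends}

    def acquire(self) -> str:
        # Ties go to the backend that was registered first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


rr = RoundRobinBalancer(["app-1", "app-2"])
lc = LeastConnectionsBalancer(["app-1", "app-2"])
```

Round robin is enough when requests cost roughly the same; least connections helps when some requests stay open much longer than others.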

#loadbalancing #performance

Improving Performance with HTTP Streaming 👷‍♂️
Serving HTML content as soon as possible is essential to the modern web. Airbnb tried to reduce the time to First Contentful Paint to less than 100ms. And they managed to achieve it by streaming chunks of the HTML page. Find the technical details in the Airbnb blog.

#performance #web

Lessons learned from Running Web Experiments at Square 👷‍♂️
The article provides detailed insights into best practices for conducting successful A/B testing, particularly from the perspective of Square's Ecosystem Discovery team. Key points include establishing a metric hierarchy, properly bucketing test subjects, gradually ramping up A/B test traffic in phases and several others.
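
Bucketing and phased ramp-up can be sketched with a stable hash. This is a generic illustration and not Square's implementation: the names `bucket` and `variant`, the 100-bucket scheme and the even/odd split are all assumptions made for the example.

```python
import hashlib


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Stable bucket in [0, buckets) for a user within one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def variant(user_id: str, experiment: str, ramp_percent: int) -> str:
    """Assign a variant; only buckets below ramp_percent are in the test."""
    b = bucket(user_id, experiment)
    if b >= ramp_percent:
        return "not_in_experiment"
    # Even buckets get the treatment, odd buckets the control.
    return "treatment" if b % 2 == 0 else "control"


early = variant("user-123", "new-checkout", ramp_percent=10)
later = variant("user-123", "new-checkout", ramp_percent=50)
```

Because the bucket depends only on the user and the experiment name, raising `ramp_percent` adds new users without moving anyone who is already in a variant.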

#casestudy #abtesting