For a while, my understanding of what WebAssembly, wasm, is good for fit roughly the following pattern:
• There is an isolated, mostly algorithmic, task.
• This task is best to be solved on the client side, in the browser.
• One could think of writing an `npm` module that solves this task.
• But an implementation in another language already exists, on the backend (BE).
• So just take this implementation, wrap it into wasm, "ship" the very function to the frontend (FE), and you're done.
One big benefit of this approach is that the FE itself becomes fully decoupled from the implementation of this magic function. There is no dependency on the FE developer, on the build process of the FE repo, or the like. Local iterations are fast and local, just as I like them.
Plus, the tests for the existing code effectively test the wasm-transpiled code as well. In my case, when the code is C++, tested with googletest, and the transpiler is Emscripten (emsdk), everything just works out of the box. I only needed to add a build target so that, among other things, my freshly-built BE also exposes the very transpiled wasm function.
On the team dynamics level, it is awesome that the FE people can focus on making the product look & feel great to the user, while the BE people can do the data heavylifting part. And this way to use wasm does the job indeed.
But the above is Level One of wasm. I'm enlightened now, and can fast track you through the full journey.
Level Two: Ship the "demo" of your SPA with 100% of its BE code served by browser-run wasm.
Chances are, your BE is already implemented in a language that can be transpiled into wasm (node.js, Java, C++, Go, all fit). So, it's a one-weekend hackathon for your BE team to make sure 100% of the necessary code is shipped as wasm, and then another one-weekend hackathon for the FE team to integrate it.
The win of this approach might look minor, but, trust me, it is oh so important for a great demo. It is that your web app becomes extremely, insanely responsive.
Chances are, especially if you are a startup, that you have taken quite a few shortcuts here and there. Extra network round-trips are being made, and/or some operations that could be cleverly combined into synchronous updates (ref. BFF) are now asynchronous. Well, with wasm, assuming the amount of data you are crunching is small, of course, consider these problems non-existent. If all you need to run to serve a response from the BE is a few quick for-loops, trust me, your wasm engine would likely be able to do it in single-digit milliseconds, which most web app users these days cannot tell apart from instant.
Level Three: Automagically add "offline mode" to any SPA ("single-page [web] app"), with no code change. It is surprisingly easy once this very SPA already contains an initializer for the wasm wrapper. Once it is there, literally all one needs to make an SPA offline-friendly is to:
• Build the whole BE, including the data it needs for the demo, into a single wasm bundle,
• Add the logic to intercept `XMLHttpRequest` calls during the very SPA initialization of the wasm part, and then
• Whenever the FE code is calling the API, don't make an outgoing HTTP call, but call the now-local wasm function instead.
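The interception step above could be sketched roughly as follows. This is a minimal sketch, not the author's implementation: `LocalBackend`, `WasmHandler`, `tryServe`, and the `/api/sum` route are hypothetical names, and `sumHandler` is a plain TypeScript stand-in for a function that a real wasm bundle would export. A wrapper around `XMLHttpRequest` would consult `tryServe` first and go to the network only when it returns `undefined`.

```typescript
// Hypothetical shape of a function exported by the wasm bundle:
// it takes a JSON request body and returns a JSON response body.
type WasmHandler = (requestJson: string) => string;

interface ApiCall {
  method: string;
  path: string;
  body: string;
}

interface ApiResult {
  status: number;
  body: string;
}

// Routes API calls to local handlers instead of the network.
class LocalBackend {
  private readonly handlers = new Map<string, WasmHandler>();

  register(method: string, path: string, handler: WasmHandler): void {
    this.handlers.set(LocalBackend.key(method, path), handler);
  }

  // Returns a result when the call can be served locally,
  // or undefined so that the caller falls back to a real HTTP request.
  tryServe(call: ApiCall): ApiResult | undefined {
    const handler = this.handlers.get(LocalBackend.key(call.method, call.path));
    if (handler === undefined) {
      return undefined;
    }
    try {
      return { status: 200, body: handler(call.body) };
    } catch (error) {
      return { status: 500, body: JSON.stringify({ error: String(error) }) };
    }
  }

  private static key(method: string, path: string): string {
    return method.toUpperCase() + " " + path;
  }
}

// Stand-in for a wasm export (assumption): sums a JSON array of numbers.
const sumHandler: WasmHandler = (requestJson) => {
  const values = JSON.parse(requestJson) as number[];
  return JSON.stringify(values.reduce((acc, v) => acc + v, 0));
};

const backend = new LocalBackend();
backend.register("POST", "/api/sum", sumHandler);
// Served locally, no HTTP call needed.
const local = backend.tryServe({ method: "POST", path: "/api/sum", body: "[1, 2, 3]" });
// Not registered, so the caller would make the real request.
const remote = backend.tryServe({ method: "GET", path: "/api/users", body: "" });
```

Keeping the routing table separate from the actual `XMLHttpRequest` patching makes it possible to test the "BE in the browser" part without a browser at all.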
I have not done this myself; and I would probably not go this far with automating the process. But the idea is just too brilliant for me to not share it.
Who knows, maybe soon we would learn to launch full Docker containers via wasm. There is a fully functioning wasm Linux distro, after all, so the path is paved. And then someone could ship an `npm` module that would just take a Docker container, wrap it into wasm, and then surgically intercept the calls to the BE to now be served within the browser, probably running this Docker container in a separate Web Worker.
This someone might be me, but I've got more things to build now. So, the idea is up for grabs: feel free to take it!
The whole story is some 5x longer, but today I had the most fascinating conversation with a road policeman in my life. After about fifteen minutes of communication via Google Translate, it ended the following way:
— You made a turn left where the left turn is prohibited.
— Sir, there were no prohibiting signs along the route I took.
— There was a sign. You made an illegal turn.
— There's no sign there! I'm always careful, Swiss driving school. Why would I violate the rules if there was a sign?
— There was a sign.
— Nah, there was no sign at all. Wanna go together and check?
— Hmm ... how about this: we go there, and IF THERE IS A SIGN YOU PAY DOUBLE THE AMOUNT?
— LET'S GO!
There was no sign. Also, I didn't like how this guy handled his machine gun; I had to keep it next to his body with my own knee while sitting behind on his ATV for it to not point at my calf.
My Mobile Podcast Studio
After a year of trial and error, here's the setup that works for me.
https://dimakorolev.substack.com/p/my-mobile-podcast-studio
I'm looking into setting up open source reproducible blueprints / performance tests. So that they can be run on the hardware paid by the company, but the code remains open & reusable. Is this a crazy idea?
Dima Korolev
Reproducible Performance Tests
Kept in a personal, open, repo, run on a company-hosted, cloud, K8S infra.
My post on Interviewing.io on whether various scores provided by the interviewer are created equal.
https://blog.interviewing.io/does-communication-matter-in-technical-interviewing-we-looked-at-100k-interviews-to-find-out/
interviewing.io
Does communication matter in technical interviewing? We looked at 100K interviews to find out.
We looked at the outcomes of over 100k interviews, and it turns out that talk is cheap.
New SysDesignMeetup episode, Leveraging the Cores: from CPU and the OS kernel to coroutines, green threads, and actor models.
YouTube
Coroutines :: SysDesignMeetup :: 2022-May-21
We get into coroutines and green threads from first principles: CPU internals, user/kernel space, schedulers, I/O blockers. Actors for dessert.
#NoWar is obligatory now, as this is recorded and released in May 2022. I am hoping for the madness that is unfolding…
My old friend Jimmy makes very good points in a short essay When Rest is Stress.
Linkedin
When Rest is Stress: Does Taking Time Off Give You Anxiety or Gloom?
Rest is hard. Vacation is really hard.
I was just interviewing a staff level candidate, and, during the reverse interview, they asked me about what levels are there in the company for the staff level to aspire to.
While clarifying the question, it became clear that among what they are looking to explore is the process of standardizing decisions across the company. About microservices, patterns, languages, storages, etc.
The current job title of this candidate is a senior engineer. I happened to know this before the interview because they looked me up on LinkedIn yesterday.
So I gave my most sincere answer:
• As a senior person, you are encouraged to clean up the mess that is affecting your and your team's performance.
• As a staff person, you are expected to prioritize the mess, and even, at times, embrace the mess.
In a way, I was brutally honest, as I like to be. Because the observation I shared holds true in all the companies I was part of. But it's somehow not what the interviewer is proud of, and hence it's not something interviewers would generally share when asked.
Instead, I turned the question around. I shared the big initiatives I am aware of and/or am part of, such as microservice interface standardization. I explained why grassroots, bottom-up initiatives rarely work when it comes to company-wide standards. And I reassured the candidate that, while we are working on reducing the mess where it affects our daily duties negatively, they had better be fully prepared to experience a certain degree of messiness should they join.
And that they will not be expected to clean up every single mess they encounter. On the contrary, from a staff level person, the company would expect them to exercise their judgement on which messes are best to be taken care of, and which are not worthy of it strategically.
The candidate was happy. So am I!
Frontend and full stack folks, how do you deal with those huge diffs in `package-lock.json` and with all the `*/__snapshots__/*` stuff?
I'm aware that `package-lock.json` is recommended to be put under source control. But at least in a separate commit, right?
Maybe there exists a GitHub setting to exclude certain files (or certain paths) from diffs? And from pull request line counters deltas? Or maybe put two pairs of numbers there, `+added, -removed, +added_boilerplate, -removed_boilerplate`?
Maybe I could put some `.gitignore-boilerplate` file and the problem would be gone?
Maybe there's a browser extension that would emulate this behavior? (Maybe I should build one or invest in a team that is building one?)
Maybe there's something a frontend developer should know that reduces or eliminates the bloat described above?
So I was doing a "sale call" interview today.
That's the first conversation the candidate has after speaking with the recruiter. That is, the first technical conversation. I am not even expected to provide feedback, and I share this with the candidate first thing, to keep them relaxed.
The objective of the sale call is to make sure the candidate is sufficiently interested in the role, and to gather some info on which teams and roles they could be a good fit for. It is not completely impossible to "fail" this "interview", but, unless the candidate truly wants to bail out, it's hard to imagine a conversation after which the company would want to pass on them.
After a routine past experience chit-chat, this candidate asks me up front: What do you do? And then: What would I do if I join?
Well, he likes designing systems. And his resume has all the relevant bits and pieces. And he has signed the NDA.
So, you're asking what do I do? "Hold my beer", I say figuratively, while closing all confidential tabs. I then zoomed in on that Miro board sufficiently to hide all the people and project names, and screen-shared our design diagram. "Look, here's what we do. That's about what you're looking for, right?"
Guess what happened next?
He criticizes our design!
The candidate, on a sale call that is not even an interview, is telling me that the design we've spent months and months on is suboptimal, and that he knows how to make it better.
Now pause reading for a moment and note your emotion. Those of you who know me well can already predict what would happen next. For the rest of you: enjoy responsibly.
Clearly, I bit the bullet. Agreed with all his comments. Provided a few extra real-life constraints. And one more time, and then one more time. Until he agreed it's not that clear, and that he would probably make the same decisions as we did should he have all the same context.
I thanked him sincerely for the conversation, and assured him I've gathered enough signal about what the best role for him in our company looks like. I've told him I'm putting my feedback straight in, and that the recruiter would be back in touch shortly.
Then I took a deep breath and wrote my feedback. Plus recruiters notes. Along the lines of:
• This is a great candidate.
• We need more people like him.
• I did not interview him technically, but would bet he's strong.
• And we need to do our best to make sure that, provided he passes the tech screen, he receives a great offer and joins us.
In the industry, we need more people who ignore all the rules & conventions, and just open up with what they believe is the point of maximum impact. I love working with them. I hope to be one of them. And, in my experience, this is paying off greatly, both professionally and personally.
Forwarded from SysDesign Meetup
And the Distributed Task Queue episode is now released!
YouTube
Distributed Task Queue :: SysDesignMeetup :: 2022-July-02
Distributed Task Queue. We talk about different ways to postulate and reason about the problem, effectively from how would a junior engineer approach the Task Queue problem to how would a Cloud Architect view it.
Thank you for 1K YouTube subscribers!
#NoWar…
Our OPA C++ Transpilation blog post, thanks to Max.
Dima Korolev
High-Performance OPA
Transpiling Rego policies into C++.
Incredible that we still live in such a world.
There's data of major importance. It is clearly affecting many people's short-term life choices. This data is about high-level decisions made inside public companies. A large number of people within each respective company know exactly what those decisions are.
Moreover, those decisions can mostly be communicated via a yes/no answer to a trivial question: Is there a hiring freeze?
To add to this, we have an enormous amount of information shared between us all. Take Blind, for example, where everything leaks regardless.
And yet, we have to rely on error-prone and noisy mediums, such as — surprise! — good old polls, to have a glimpse into what's really going on. Seriously, "Don't look up", we have this covered.
To this day, I don't understand how public companies are under no obligation to disclose what roles they are hiring for, and the state of their funnel, on any given day. This reporting would do no harm to a company if done by everyone. It's like financial reports, even less transparent. (My second question, by the way, is what do recruiters do during hiring freezes.)
Kudos to Aline for being a ray of light in this conversation: https://blog.interviewing.io/whats-actually-going-on-with-google-and-facebook-hiring-freezes-we-surveyed-1000-engineers-to-find-out/
interviewing.io
What’s actually going on with Google and Facebook hiring freezes? We surveyed 1000 engineers to find out.
To make sense of all the contradictory info on Google & Facebook hiring freezes, we surveyed hundreds of engineers who are interviewing there right now.
I'm wholly unqualified to be a game developer. I mean, this many glitches to use and abuse? Even a mere possibility for one of them to emerge would keep me up late at night trying to find a clean solution that makes the glitch impossible, or, at least, unattainable through regular gameplay mechanics.
(Inspired by jetlag, 5am, and watching the "Brood Lords range 20" video, where Korean pros finally found a way to abuse attacking own broodlings to keep their Brood lords unreachable for Thors. However, a friend points out, this trick would only be handy within a narrow opportunity window while your opponent does not have a chance to transition into air; or, much like in the very replay, if it's already lategame with Zerg's opponent staying mostly on land.)
YouTube
How Speedrunners Beat Portal Without the Portal Gun
Yeah, I love " ", definitely one of my favorite games.
►Watch The Zero Portals Run: https://www.youtube.com/watch?v=WsOBG9JICt0
►Support me on Patreon: https://www.patreon.com/Msushi
►Twitch: https://www.twitch.tv/msushi
►Twitter: https://twi…
I keep reading more and more about GraphQL [1]. It's useful both for my meetup and for my work, so keeping up is more of a professional duty than intellectual curiosity these days.
Originally — say, ~three years ago — I was quite skeptical about the prospects of this "language". Back then, it didn't even have mutations, after all, and "JOIN"-ing data between, say, PostgreSQL and Redis was remarkably inefficient. Besides, the query language is not standard and not mapped 1:1 to JavaScript or anything else popular, and type checking was semi-strict. So it did look like an interesting pet project, but I was not convinced it would have a future.
Today, after hearing more and more about GraphQL's power, and after doing more research, I'm ready to concede I was largely wrong. With `@defer`, `@stream` and `@live` GraphQL may well be the future Data API language. Yes, some of these features are still deeply experimental, and I've got quite a few concerns about their theoretical performance.
In a way, the role of a "GraphQL DBA" should be emerging as we speak.
And then it clicked. I blogged about such a future back in 2014. My prediction back then [2] was that SQL would:
• Exist in 5 years,
• Exist as legacy in 10 years, and
• Go extinct in 15 years.
Well, I have to be rooting for GraphQL now. Because it may well be the force that proves my vision to be right 😊
[1]: New features in GraphQL: Batch, defer, stream, live, and subscribe. 2021.
[2]: The Future of Data Engineering, What would happen after SQL is no more. 2014.
Data scientists are taking over the world. Read till the end: https://towardsdatascience.com/python-3-14-will-be-faster-than-c-a97edd01d65d
Medium
Python 3.14 will be faster than C++
Benchmarking the new and impressive Python 3.11
Both from first principles and from experience I know that load testing is not trivial.
One of the fundamental lessons is that testing for maximum throughput and testing for minimum latency are two very different things. Simply put, it's almost always possible to extract a few more QPS at massive cost to latency at high percentiles, and it's almost always a bad idea to do so with a production service. Thus, the "maximum QPS" measurements are generally worthless, as they are not far away from "spherical cows in the vacuum".
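One way to see this tradeoff is a textbook single-server queue (M/M/1), used here purely as an illustrative assumption and not as a model of any particular service. With Poisson arrivals at rate λ and a service rate μ, the mean time in the system is 1 / (μ − λ), so the last few percent of throughput are the most expensive ones in latency. The function name and the numbers below are made up for illustration.

```typescript
// Mean time a request spends in an M/M/1 queue (waiting plus service), in milliseconds.
function meanLatencyMs(arrivalsPerSec: number, serviceRatePerSec: number): number {
  if (arrivalsPerSec >= serviceRatePerSec) {
    return Infinity; // the queue grows without bound
  }
  return 1000 / (serviceRatePerSec - arrivalsPerSec);
}

// A hypothetical server able to process 1000 requests per second.
const capacityPerSec = 1000;
const latencyAtHalfLoad = meanLatencyMs(500, capacityPerSec); // 2 ms
const latencyAt90Percent = meanLatencyMs(900, capacityPerSec); // 10 ms
const latencyAt99Percent = meanLatencyMs(990, capacityPerSec); // 100 ms
```

Going from 900 to 990 QPS is a 10% gain in throughput and, in this model, a tenfold increase in mean latency; the high percentiles suffer even more.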
A solid, holistic, approach is:
1) To agree on the SLA / SLO / SLI of the service. This is a product / user capacity planning exercise. In a way, this part is about postulating the problem.
2) To agree on what we consider the acceptable range of operational parameters for the service and its environment. This is the exercise in software architecture and in site reliability engineering.
We answer the questions about the expected usage of our service in (1). Then we plan how to best build and ship this service in (2).
It is (2) where we answer questions such as "local caching vs. Dynamo", "lambda or EC2", "how to leverage elasticity", or "to service mesh or not to service mesh".
Ideally, the service itself (its individual instances integrated into some environment) would always remain within its operating mode defined by (2). The service accomplishes this goal by simply rejecting the excess requests that would take it out of this mode.
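A minimal sketch of such rejection, assuming the operating mode from (2) is approximated by a cap on in-flight requests. `AdmissionController`, `serveOrReject`, and the limit are hypothetical; a real service might also watch CPU, memory, or queueing time.

```typescript
class AdmissionController {
  private inFlight = 0;
  private readonly maxInFlight: number;

  constructor(maxInFlight: number) {
    this.maxInFlight = maxInFlight;
  }

  // Returns true if the request may proceed; false means "reject right away".
  tryAcquire(): boolean {
    if (this.inFlight >= this.maxInFlight) {
      return false;
    }
    this.inFlight += 1;
    return true;
  }

  // Must be called once for every successful tryAcquire().
  release(): void {
    if (this.inFlight > 0) {
      this.inFlight -= 1;
    }
  }
}

// Runs the work only if the service is within its operating mode;
// returns undefined when the request is rejected.
function serveOrReject<T>(gate: AdmissionController, work: () => T): T | undefined {
  if (!gate.tryAcquire()) {
    return undefined;
  }
  try {
    return work();
  } finally {
    gate.release();
  }
}
```

Rejecting early keeps the latency of the accepted requests within the agreed range, at the price of an explicit error for the excess ones.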
For load testing, it is important to understand that only after (2) is established, and only after we have the means to spin the service up in some test environment, can we confirm that it conforms with (1).
An example of (1) might be: hold 1K QPS, with a certain number of nines, median latency under 5ms, p99 latency under 10ms, p99.9 latency under 25ms. Because we believe 1K is our peak traffic during the busiest hours, and we postulate that latency numbers above these figures would deteriorate user experience and result in lost business.
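The latency part of such an example can be checked mechanically against measured samples. A minimal sketch, assuming nearest-rank percentiles and latencies collected in milliseconds; `LatencyTarget`, `percentileOf`, and `sloViolations` are hypothetical names.

```typescript
interface LatencyTarget {
  percentile: number; // e.g. 50, 99, 99.9
  maxMs: number; // the latency at this percentile must stay under this value
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentileOf(sortedMs: number[], percentile: number): number {
  const rank = Math.ceil((percentile * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

// Returns the targets that the measured latencies violate (empty means "SLO met").
function sloViolations(latenciesMs: number[], targets: LatencyTarget[]): LatencyTarget[] {
  if (latenciesMs.length === 0) {
    throw new Error("no latency samples");
  }
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return targets.filter((t) => percentileOf(sorted, t.percentile) >= t.maxMs);
}

// Median under 5 ms, p99 under 10 ms, p99.9 under 25 ms.
const exampleTargets: LatencyTarget[] = [
  { percentile: 50, maxMs: 5 },
  { percentile: 99, maxMs: 10 },
  { percentile: 99.9, maxMs: 25 },
];
```

A run with 995 requests at 3 ms and 5 requests at 30 ms would pass the median and p99 targets and still fail p99.9, which is exactly the kind of tail a "maximum QPS" number hides.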
An example of (2) might be: use up to four nodes/servers/pods of certain parameters, max. 90% CPU load, max. 70% RAM utilization on each node, run on Kubernetes, within a certain service mesh.
This is not all it takes to properly load test the system. In order to confirm (1), we need a well-defined and well-specified understanding of the expected user traffic. Such as: we expect a Poisson distribution of requests averaging 1K per second during our peak hour. We assume we can model such a load using N=100 "virtual users", even though a perfect load test would be sending all these requests from different IP addresses.
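A minimal sketch of generating such traffic, assuming a homogeneous Poisson process (independent, exponentially distributed gaps between requests) and a simple round-robin split across virtual users. Both function names and the round-robin policy are assumptions made for illustration, not features of any particular load testing tool.

```typescript
// Generates arrival timestamps (in seconds) of a Poisson process:
// gaps between arrivals are exponentially distributed with mean 1 / ratePerSec.
function poissonArrivals(
  ratePerSec: number,
  durationSec: number,
  random: () => number = Math.random,
): number[] {
  const arrivals: number[] = [];
  let t = 0;
  while (true) {
    // random() is in [0, 1), so 1 - random() is in (0, 1] and the logarithm is finite.
    t += -Math.log(1 - random()) / ratePerSec;
    if (t >= durationSec) {
      return arrivals;
    }
    arrivals.push(t);
  }
}

// Spreads the schedule over N "virtual users" in round-robin order.
function assignToVirtualUsers(arrivals: number[], users: number): number[][] {
  const perUser: number[][] = [];
  for (let u = 0; u < users; u++) {
    perUser.push([]);
  }
  arrivals.forEach((t, i) => perUser[i % users].push(t));
  return perUser;
}

// Ten seconds of traffic averaging 1K requests per second, split across 100 virtual users.
const schedule = poissonArrivals(1000, 10);
const perUserSchedule = assignToVirtualUsers(schedule, 100);
```

The important property is that the schedule is fixed up front ("open loop"): a slow response does not delay the next request, unlike a virtual user that waits for each reply before sending again.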
~ ~ ~
Here's a trick question. How do you communicate the above to the people who think along the lines of:
• "Based on our load testing so far we believe we are network-bound"?
And:
• "This is what the documentation to this load testing tool says, it has tons of flags to play around with, and it can simulate any load?"
And:
• "Dima, what you are saying makes sense, but it's too much and hard to follow. Is there an article I can read and understand all this?"
~ ~ ~
Lazy web, I have two questions from my side:
1) Am I making things too complicated, or does it sound reasonable so far?
2) Any good articles out there which I could use as references?
3) Bonus question: Would the above be worthy of a SysDesign meetup episode?
Very proud of this episode we’ve recorded with Interviewing.io.
May well be the best SysDesign interview prep video out there today.
Enjoy responsibly!
https://www.youtube.com/watch?v=IJSVmyhq2i0
YouTube
System Architect Breaks Down System Design Interview [ex Google/Microsoft]
In this video, Dima Korolev, a system architect at Miro and senior engineer (previously at Google and Microsoft), uses his extensive experience to break down a mock Systems Design interview.
This is the second installment of Inspect Element, an interviewing.io…
Discovery of the day: docker-compose ... and docker compose ..., while seemingly identical, are two entirely different things under the hood!
TL;DR: better upgrade docker and use docker compose ... (with a space).
(Edit: Or so is my conclusion so far. Do feel free to correct me if it's the wrong one.)
One major difference I noted shows up when health checks are used, so that containers take a while to start and should be started in the right order. In that case, docker compose (with a space) nicely prints the output of a container while it is getting healthy, while docker-compose stashes the output until the container is healthy, and then dumps it out in one piece.
Not that it's life-changing; docker logs -f ... does the job.
But, intuitively, I would much rather be reading the logs of a container that is somehow taking a while to become healthy. Or is there some alternate logic that I'm missing here?
PS: Here's what a productive Sunday looks like to me.
GitHub
GitHub - dkorolev/learning-docker-compose
C++ folks, did you know
inline constants exported into an .so are actually not inline at all?
This was quite a surprise for me today!
TL;DR: If you ::dlopen() two .so-s, like this:
// lib1.so
namespace nuance {
  inline int N = 42;
}
extern "C" int Get42() {
  return nuance::N;
}
// lib2.so
namespace nuance {
  inline int N = 101;
}
extern "C" int Get101WithANuance() {
  return nuance::N;
}
Then calling Get101WithANuance() will return ... drum rolls ... 42!
(Yes, order matters, etc., etc. The whole bouquet of issues you could possibly imagine, straight into your face. What a lovely footgun indeed!)
So much for yours truly sincerely believing that "inline is just a cleaner C++ way to #define symbols, with no symbols leaking wherever".
#LiveAndLearn
Ref: https://github.com/C5T/Current/pull/924
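For what it's worth, one way to sidestep this footgun is to give the variable internal linkage, so that it is not exported from the .so at all and the dynamic loader has nothing to merge. The sketch below is lib2.so reworked that way. It assumes N does not need to be shared across translation units, and it is not necessarily what the PR above does.

```cpp
// lib2.so, reworked. A sketch under the assumption that `N` is private
// to this translation unit.
namespace nuance {
namespace {
// The unnamed namespace gives `N` internal linkage: every .so (and every
// translation unit) that defines it this way keeps its own private copy.
int N = 101;
}  // namespace
}  // namespace nuance

extern "C" int Get101WithANuance() {
  return nuance::N;
}
```

The trade-off is the one that inline variables were meant to remove in the first place: with internal linkage there is one copy of N per translation unit, which is fine for a constant and a problem for shared mutable state.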
GitHub
The `::dlopen()` nuanced magic isolated and tested! by dkorolev · Pull Request #924 · C5T/Current
@mzhurovich pls test on a Mac.