Not boring, and a bit of a condescending prick
Semi-digested observations about our world right after they are phrased well enough in my head to be shared broader.
I was just interviewing a staff level candidate, and, during the reverse interview, they asked me about what levels are there in the company for the staff level to aspire to.

While clarifying the question, it became clear that among what they are looking to explore is the process of standardizing decisions across the company. About microservices, patterns, languages, storages, etc.

The current job title of this candidate is a senior engineer. I happened to know this before the interview because they looked me up on LinkedIn yesterday.

So I gave my most sincere answer:

• As a senior person, you are encouraged to clean up the mess that is affecting your and your team's performance.

• As a staff person, you are expected to prioritize the mess, and even, at times, embrace the mess.

In a way, I was brutally honest, as I like to be. Because the observation I shared holds true in all the companies I was part of. But it's somehow not what the interviewer is proud of, and hence it's not something interviewers would generally share when asked.

Instead, I turned the question around. I shared the big initiatives I am aware of and/or am part of, such as microservice interface standardization. I explained why grassroots, bottom-up initiatives rarely work when it comes to company-wide standards. And I reassured the candidate that, while we are working on reducing the mess where it affects our day-to-day duties negatively, they had better be fully prepared to experience a certain degree of messiness should they join.

And that they will not be expected to clean up every single mess they encounter. On the contrary, from a staff level person, the company would expect them to exercise their judgement on which messes are best to be taken care of, and which are not worthy of it strategically.

The candidate was happy. So am I!
Frontend and full stack folks, how do you deal with those huge diffs in package-lock.json and with all the */__snapshots__/* stuff?

I'm aware that package-lock.json is recommended to be put under source control. But at least in a separate commit, right?

Maybe there exists a GitHub setting to exclude certain files (or certain paths) from diffs? And from the pull request line count deltas? Or maybe put two pairs of numbers there: +added, -removed, +added_boilerplate, -removed_boilerplate?

Maybe I could put some .gitignore-boilerplate file and the problem would be gone?

Maybe there's a browser extension that would emulate this behavior? (Maybe I should build one or invest in a team that is building one?)

Maybe there's something a frontend developer should know that reduces or eliminates the bloat described above?
So I was doing a "sale call" interview today.

That's the first conversation the candidate has after speaking with the recruiter. That is, the first technical conversation. I am not even expected to provide feedback, and I share this with the candidate first thing, to keep them relaxed.

The objective of the sale call is to make sure the candidate is sufficiently interested in the role, and to gather some info on what teams and roles they could be a good fit for. It is not completely impossible to "fail" this "interview", but, unless the candidate truly wants to bail out, it's hard to imagine a conversation after which the company would want to pass on them.

After a routine past experience chit-chat, this candidate asks me up front: What do you do? And then: What would I do if I join?

Well, he likes designing systems. And his resume has all the relevant bits and pieces. And he has signed the NDA.

So, you're asking what do I do? "Hold my beer", I say figuratively, while closing all confidential tabs. I then zoomed in on that Miro board sufficiently to hide all the people and project names, and screen-shared our design diagram. "Look, here's what we do. That's about what you're looking for, right?"

Guess what happened next?

He criticizes our design!

The candidate, on a sale call that is not even an interview, is telling me that the design we've spent months and months on is suboptimal, and that he knows how to make it better.

Now pause reading for a moment and note your emotion. Those of you who know me well can already predict what would happen next. For the rest of you: enjoy responsibly.

Clearly, I bit the bullet. Agreed with all his comments. Provided a few extra real-life constraints. And one more time, and then one more time. Until he agreed it's not that clear, and that he would probably make the same decisions as we did should he have all the same context.

I thanked him sincerely for the conversation, and assured him I've gathered enough signal about what the best role for him in our company looks like. I've told him I'm putting my feedback straight in, and that the recruiter would be back in touch shortly.

Then I took a deep breath and wrote my feedback. Plus the recruiter's notes. Along the lines of:

• This is a great candidate.
• We need more people like him.
• I did not interview him technically, but would bet he's strong.
• And we need to do our best to make sure that, provided he passes the tech screen, he receives a great offer and joins us.

In the industry, we need more people who ignore all the rules & conventions, and just open up with what they believe is the point of maximum impact. I love working with them. I hope to be one of them. And, in my experience, this is paying off greatly, both professionally and personally.
Incredible that we still live in such a world.

There's data of major importance. It is clearly affecting many people's short-term life choices. This data is about high-level decisions made inside public companies. A large number of people within each respective company know exactly what those decisions are.

Moreover, those decisions can mostly be communicated via a yes/no answer to a trivial question: Is there a hiring freeze?

To add to this, we have an enormous amount of information shared between us all. Take Blind, for example, where everything leaks regardless.

And yet, we have to rely on error-prone and noisy mediums, such as — surprise! — good old polls, to have a glimpse into what's really going on. Seriously, "Don't look up", we have this covered.

To this day, I don't understand how public companies are under no obligation to disclose what roles they are hiring for, and the state of their funnel, on any given day. This reporting would do no harm to a company if done by everyone. It's like financial reports, even less transparent. (My second question, by the way, is what do recruiters do during hiring freezes.)

Kudos to Aline for being a ray of light in this conversation:
I'm wholly unqualified to be a game developer. I mean, this many glitches to use and abuse? Even a mere possibility for one of them to emerge would keep me up late at night trying to find a clean solution that makes the glitch impossible, or, at least, unattainable through regular gameplay mechanics.

(Inspired by jetlag, 5am, and watching the "Brood Lords range 20" video, where Korean pros finally found a way to abuse attacking own broodlings to keep their Brood lords unreachable for Thors. However, a friend points out, this trick would only be handy within a narrow opportunity window while your opponent does not have a chance to transition into air; or, much like in the very replay, if it's already lategame with Zerg's opponent staying mostly on land.)
I keep reading more and more about GraphQL [1]. It's useful both for my meetup and for my work, so keeping up is becoming more of a professional duty than an intellectual curiosity these days.

Originally — say, ~three years ago — I was quite skeptical about the prospects of this "language". Back then, it didn't even have mutations, after all, and "JOIN"-ing data between, say, PostgreSQL and Redis was remarkably inefficient. Besides, the query language is not standard and not mapped 1:1 to JavaScript or anything else popular, and type checking was semi-strict. So it did look like an interesting pet project, but I was not convinced it would have a future.

Today, after hearing more and more about GraphQL's power, and after doing more research, I'm ready to concede I was largely wrong. With @defer, @stream, and @live, GraphQL may well be the future Data API language. Yes, some of these features are still deeply experimental, and I've got quite a few concerns about their theoretical performance.

In a way, the role of a "GraphQL DBA" should be emerging as we speak.

And then it clicked. I blogged about such a future back in 2014. My prediction back then [2] was that SQL would:

• Exist in 5 years,
• Exist as legacy in 10 years, and
• Go extinct in 15 years.

Well, I have to be rooting for GraphQL now. Because it may well be the force that proves my vision to be right 😊

[1]: New features in GraphQL: Batch, defer, stream, live, and subscribe. 2021.
[2]: The Future of Data Engineering, What would happen after SQL is no more. 2014.
Both from first principles and from experience I know that load testing is not trivial.

One of the fundamental lessons is that testing for maximum throughput and testing for minimum latency are two very different things. Simply put, it's almost always possible to extract a few more QPS at massive cost to latency at high percentiles, and it's almost always a bad idea to do so with a production service. Thus, the "maximum QPS" measurements are generally worthless, as they are not far away from "spherical cows in a vacuum".
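
To put rough numbers on the throughput vs. latency trade-off, here is a minimal sketch. It assumes the simplest textbook queueing model (M/M/1: one server, Poisson arrivals, exponential service times), which is an assumption for illustration and not a description of any real service. The function name is made up.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Mean time a request spends in an M/M/1 queue (waiting plus service), in seconds.
// `service_rate_qps` is what the server can process; `arrival_rate_qps` is the offered load.
// Returns +infinity at or beyond saturation, where the queue grows without bound.
double MeanLatencySecondsMM1(double service_rate_qps, double arrival_rate_qps) {
  assert(service_rate_qps > 0.0 && arrival_rate_qps >= 0.0);
  if (arrival_rate_qps >= service_rate_qps) {
    return std::numeric_limits<double>::infinity();
  }
  return 1.0 / (service_rate_qps - arrival_rate_qps);
}
```

Under this model, a server that can do 1000 QPS averages 2ms per request at 500 QPS and 100ms at 990 QPS. So roughly doubling the throughput costs a 50x increase in mean latency, and the high percentiles fare even worse.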

A solid, holistic, approach is:

1) To agree on the SLA / SLO / SLI of the service. This is a product / user capacity planning exercise. In a way, this part is about postulating the problem.

2) To agree on what we consider the acceptable range of operational parameters for the service and its environment. This is the exercise in software architecture and in site reliability engineering.

We answer the questions about the expected usage of our service in (1). Then we plan how to best build and ship this service in (2).

It is (2) where we answer questions such as "local caching vs. Dynamo", "lambda or EC2", "how to leverage elasticity", or "to service mesh or not to service mesh".

Ideally, the service itself (its individual instances integrated into some environment) would always remain within its operating mode defined by (2). The service accomplishes this goal by simply rejecting the excess requests that would take it out of this mode.
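
"Rejecting the excess requests" can take many forms. One possible sketch, assuming the operating mode is expressed as a cap on concurrently processed requests (the class name and the choice of a concurrency cap are mine, for illustration only):

```cpp
#include <atomic>
#include <cassert>

// Admission control by concurrency: at most `max_in_flight` requests are processed at once;
// anything beyond that is rejected immediately instead of being queued.
class LoadShedder {
 public:
  explicit LoadShedder(int max_in_flight) : max_in_flight_(max_in_flight) {
    assert(max_in_flight > 0);
  }

  // Returns true if the request may proceed; the caller must then call Release() when done.
  bool TryAcquire() {
    int current = in_flight_.load();
    while (current < max_in_flight_) {
      if (in_flight_.compare_exchange_weak(current, current + 1)) {
        return true;
      }
      // `current` was refreshed by compare_exchange_weak; loop to re-check the limit.
    }
    return false;  // Over the limit: shed this request (e.g. respond with HTTP 429 or 503).
  }

  void Release() { in_flight_.fetch_sub(1); }

 private:
  const int max_in_flight_;
  std::atomic<int> in_flight_{0};
};
```

A real service would likely shed on more signals than this (queue depth, CPU, per-tenant quotas), but the principle is the same: say "no" early rather than let latency drift out of the agreed range.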

For load testing, it is important to understand that only after (2) is established, and only after we have the means to spin the service up in some test environment, can we confirm that it conforms with (1).

An example of (1) might be: hold 1K QPS, with a certain number of nines, median latency under 5ms, p99 latency under 10ms, p99.9 latency under 25ms. Because we believe 1K is our peak traffic during the busiest hours, and we postulate that latency numbers above these figures would deteriorate user experience and result in lost business.
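
The latency part of such a target is easy to check mechanically once a load test has recorded per-request latencies. A minimal sketch, using the nearest-rank percentile definition (one of several in use) and hypothetical function names; the availability "nines" and the 1K QPS part are not covered here:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least `p` percent of samples are <= it.
// The tiny epsilon absorbs floating-point noise in `p * n / 100`.
double PercentileMs(std::vector<double> latencies_ms, double p) {
  assert(!latencies_ms.empty() && p > 0.0 && p <= 100.0);
  std::sort(latencies_ms.begin(), latencies_ms.end());
  const double n = static_cast<double>(latencies_ms.size());
  const auto rank = static_cast<std::size_t>(std::ceil(p * n / 100.0 - 1e-9));
  return latencies_ms[std::max<std::size_t>(rank, 1) - 1];
}

// Latency thresholds from the example above: median < 5ms, p99 < 10ms, p99.9 < 25ms.
bool MeetsLatencySlo(const std::vector<double>& latencies_ms) {
  return PercentileMs(latencies_ms, 50.0) < 5.0 &&
         PercentileMs(latencies_ms, 99.0) < 10.0 &&
         PercentileMs(latencies_ms, 99.9) < 25.0;
}
```

Note that p99.9 only means something with enough samples: with fewer than a thousand requests it degenerates into "the slowest request".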

An example of (2) might be: use up to four nodes/servers/pods of certain parameters, max. 90% CPU load, max. 70% RAM utilization on each node, run on Kubernetes, within a certain service mesh.

This is not all it takes to properly load test the system. In order to confirm (1), we need a well-defined and well-specified understanding of the expected user traffic. Such as: we expect a Poisson distribution of requests averaging 1K per second during our peak hour. We assume we can model such a load using N=100 "virtual users", even though a perfect load test would be sending all these requests from different IP addresses.
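
A Poisson stream of requests can be modeled by drawing exponentially distributed gaps between consecutive sends. Here is a minimal sketch of generating such a schedule; the function name and the seed parameter are assumptions for illustration, and actually sending the requests (and splitting the schedule across the virtual users) is left out:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Send times (in seconds from the start) for a Poisson process with the given mean rate:
// the gaps between consecutive requests are independent and exponentially distributed.
std::vector<double> PoissonArrivalTimes(double rate_qps, double duration_seconds, unsigned seed) {
  assert(rate_qps > 0.0 && duration_seconds > 0.0);
  std::mt19937_64 rng(seed);
  std::exponential_distribution<double> gap_seconds(rate_qps);
  std::vector<double> arrival_times;
  for (double t = gap_seconds(rng); t < duration_seconds; t += gap_seconds(rng)) {
    arrival_times.push_back(t);
  }
  return arrival_times;
}
```

The important bit is that the schedule is fixed up front and does not depend on how fast the service responds. A load generator that waits for each response before sending the next request quietly slows down together with the service, and the measured latencies look better than they should.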

~ ~ ~

Here's a trick question. How do you communicate the above to the people who think along the lines of:

• "Based on our load testing so far we believe we are network-bound"?

• "This is what the documentation to this load testing tool says, it has tons of flags to play around with, and it can simulate any load"?

• "Dima, what you are saying makes sense, but it's too much and hard to follow. Is there an article I can read and understand all this?"

~ ~ ~

Lazy web, I have two questions from my side:

1) Am I making things too complicated, or does it sound reasonable so far?

2) Any good articles out there which I could use as references?

3) Bonus question: Would the above be worthy of a SysDesign meetup episode?
Discovery of the day: docker-compose ... and docker compose ... , while seemingly identical, are two entirely different things under the hood!

TL;DR: better upgrade docker and use docker compose ....

(Edit: Or so is my conclusion so far. Do feel free to correct me if it's the wrong one.)

One major difference I noted shows up when health checks are used, containers take a while to start, and they should be started in the right order: docker compose (with a space) nicely prints the output of a container while it is getting healthy, while docker-compose stashes the output until the container is healthy, and then dumps it out in one piece.

Not that it's life-changing, docker logs -f ... does the job.

But, intuitively, I would much rather be reading the logs of a container that is somehow taking a while to become healthy. Or is there some alternate logic that I'm missing here?

PS: Here's what a productive Sunday looks like to me.
C++ folks, did you know inline constants exported into an .so are actually not inline at all?

This was quite a surprise for me today!

TL;DR: If you ::dlopen() two .so-s, like this:

// The first .so:
namespace nuance {
inline int N = 42;
}  // namespace nuance
extern "C" int Get42() {
  return nuance::N;
}

// The second .so:
namespace nuance {
inline int N = 101;
}  // namespace nuance
extern "C" int Get101WithANuance() {
  return nuance::N;
}

Then calling Get101WithANuance() will return ... drum rolls ... 42!

(Yes, order matters, etc., etc. The whole bouquet of issues you could possibly imagine, straight into your face. What a lovely footgun indeed!)
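
For those who want to poke at it, here is a minimal sketch of a host-side helper. It assumes a POSIX system with <dlfcn.h> (on older glibc, link with -ldl); the helper name and the library file names in the usage note are made up.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <optional>
#include <string>

// Loads `library_path`, looks up `symbol` as an `int()` function, and calls it.
// Returns std::nullopt if the library cannot be loaded or the symbol is not found.
// The handle is deliberately never dlclose()-d, so that the libraries stay loaded side by side.
std::optional<int> CallIntFunction(const std::string& library_path, const std::string& symbol) {
  void* handle = ::dlopen(library_path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    return std::nullopt;
  }
  using IntFunction = int (*)();
  auto function = reinterpret_cast<IntFunction>(::dlsym(handle, symbol.c_str()));
  if (function == nullptr) {
    return std::nullopt;
  }
  return function();
}
```

Calling CallIntFunction("./libfirst.so", "Get42") and then CallIntFunction("./libsecond.so", "Get101WithANuance"), in this order and then in the reverse order, should be enough to observe the behavior described above. The exact outcome may well depend on the toolchain and the linker flags.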

So much for yours truly sincerely believing that "inline is just a cleaner C++ way to #define symbols, with no symbols leaking wherever".


Seems like the Twitter debate is the new black. The topic appears to be polarizing like hell, and I found that only a few people can reason about the events calmly.

There's "it's a death sentence for a civic conversation" on the one end, and "finally, we'll be able to see what people truly care about" on the other. With not much in between.

Both sides, of course, are right. Free speech absolutism invites all sorts of extremes, which are generally not desirable for most users. The other extreme, which, IMHO, Twitter was quite leaning towards in the past few years, is to be excessively censoring whatever does not fit "the media narrative", which inevitably results in various viewpoints being silenced, despite being proven true soon thereafter.

My view here remains the same: opening up the algorithm is generally a great thing. I wrote about it at length before.

The TL;DR is: if you want to use The Algorithm to help solve some Massive Social Problem, but then you tweak the results of this algorithm because it does not fit your definition of right, it's you, not the algorithm, who is to blame.

Sure, an algorithm would miss on important real-life implications. With crime prevention, for example, The Algorithm gets into Orwellian, Minority Report-style biases, that should absolutely be corrected for.

But my strong belief is that it is important to correct for these biases BEFORE, not AFTER The Algorithm. In other words, these corrections should be the inputs to the algorithm, not post-overrides.

Thus, if, IF, Elon is going to do what he claims to want to do -- the Algorithm-first approach to content selection -- then I do believe it will be a step forward.

Whether he will actually work towards making it so, or whether this can be implemented with our current state of technology, or whether the regulators and/or other big players would allow Twitter to be such a platform -- only time will tell, I guess.

I do keep my fingers crossed though. And am hoping for the best.
Folks, what do people mean today when they say "Zero Trust" APIs?

My understanding -- that might be wrong -- is that each service needs to securely validate the request, as if this request is coming from the outer world. Because security, microservices, etc.

But in this model if a request requires 20+ other requests, the only solution is to make 20+ zero-trust requests which would all be validated as if they are coming from, well, zero trust. Which is both a colossal waste of resources and a huge toll on end-to-end user-perceived latency.

What am I missing?