Not boring, and a bit of a condescending prick
Semi-digested observations about our world, right after they are phrased well enough in my head to be shared more broadly.
And this is far before we enter the territory of thinking about the state of a piece of software evolving over time. That requires brain wiring of a level absolutely unreachable for even the top-notch ML systems of today. The example of asking GPT-3 to talk to you in sentences with an odd number of words comes in handy here; and that is still about "imagining" one step ahead, not a potentially infinite & unbounded evolution of the data in some DB and of the UI/UX of some application.

And any nontrivial piece of software absolutely requires reasoning on this "infinite evolution of data" level, where the mutations of this data form chains and sequences, and inevitably depend on one another.

Now, maybe you can imagine a machine learning system that can "pretend" to "understand" you (by correctly reading between the words and "completing your sentences"), and thus convert your unfiltered & unstructured thoughts into fixed-form software instructions -- all the power to you! Maybe we'll get there eventually.

But, personally, I just can't imagine us, humankind, developing such a technology in the next 20+ years.

Because, honestly, building correctly functioning software from incomplete specifications is literally the hardest possible AGI task I can think of.

Before we can get there, show me virtual drivers, virtual teachers or coaches, and virtual doctors who consistently outperform humans. The experts in those professions — who most certainly are extremely talented, intelligent, and hard-working — still operate in far more constrained environments compared to software developers. Simply because the spaces of all possible driving conditions, of all possible ways a student may have trouble grasping some topic, or of all possible human body malfunctions are still rather bounded in their nature, while the space of all possible algorithms and data structures is literally unbounded, and it is expanding at an increasing pace as we speak.
To my taste, this is an extremely important post: https://blog.roblox.com/2022/01/roblox-return-to-service-10-28-10-31-2021/

<rant>

First, it talks about a real problem, in a real company, that affected millions of users for an extended period of time.

Second, at least to my taste, it's quite poorly written. Interestingly, at the same time, I believe the authors of the post do not think so!

It's easy to see through the post that the people who were debugging the issue are not seeing the forest, but just the trees.

The median latency that used to be 500ms and now is 2 seconds? Why is it even important? Should 2 seconds, or even 10 seconds, not be good enough for service discovery? How often do you even have new nodes to discover, to begin with?

Shouldn't the whole fleet work just fine if the service mesh / service discovery subsystem is down and not accepting updates? At least while no machines are failing, the previously "discovered" ones should just talk to each other as if nothing happened, no?

Shouldn't there be a read-only cache present for the cases where the mutation path (the write path) is slow? Shouldn't the Consul folks think about this? Especially when introducing the "optimized" access pattern, being fully aware that, under certain circumstances that are obviously not well understood by most users, this "optimized" access pattern would inevitably be a disaster in the making?
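To make the read-path point concrete, here is a rough sketch, in TypeScript, of the kind of fallback I have in mind: keep serving the last known-good endpoints when the discovery backend is slow or down. All the names and the API shape are made up for illustration; this is not how Consul clients are actually built.

```ts
// Hypothetical sketch: a discovery client that degrades to stale reads.
type Endpoints = string[];

class CachedDiscovery {
  private lastKnownGood = new Map<string, Endpoints>();

  constructor(
    private fetchLive: (service: string) => Promise<Endpoints>, // the "real" lookup
    private timeoutMs = 500,
  ) {}

  async resolve(service: string): Promise<Endpoints> {
    try {
      // Race the live lookup against a short timeout.
      const live = await Promise.race([
        this.fetchLive(service),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error('discovery timeout')), this.timeoutMs)),
      ]);
      this.lastKnownGood.set(service, live); // refresh the read-only cache on success
      return live;
    } catch {
      // The write/update path is slow or down: serve the last known-good answer,
      // so the previously "discovered" nodes keep talking to each other.
      const cached = this.lastKnownGood.get(service);
      if (cached !== undefined) return cached;
      throw new Error(`no cached endpoints for ${service}`);
    }
  }
}
```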

The folks who wrote this blog post seem to not even consider the above questions important; at least it doesn't show through the text at all.

I get it, it is important to keep it cool, to praise the players who "DNS-hacked" the system to begin playing a bit sooner than others. And it is important to keep a good relationship with the Consul team. And it is important to maintain the image of a transparent, developer-first organization.

Nonetheless, to me, the post does read as an affirmation that software system design at scale is experiencing a major crisis. It's the kind of crisis that can have a cascading domino effect of failures. Across multiple companies.

<more_rant>

Heck, in my professional life I do system design interviews and coaching. And a lot of people sincerely believe that Mongo and/or Cassandra guarantee data availability and consistency across the whole spectrum of possible failure modes, while local hash maps are a big no-no "because Redis is better". Now Consul, which presumably does offer stronger consistency guarantees than Mongo or Redis, is behind a massive outage. And nobody appears to want to ring the alarm bell; people are just praising each other and the overall spirit of teamwork.

I get it: if Roblox et al. went down for a few hours every other year, the world would not come to a stop. But it's the same people who design the air traffic control systems, our healthcare or elections software, or the semi-autonomous weapons that our military is planning to adopt as we speak. Doesn't this concern you at all?

</more_rant>

Okay, maybe it's just me whining. But I seriously don't know what big conclusions are right to be drawn from this Roblox post. To me it does read as a poor post, where the authors not only fail to explain the situation well, but appear genuinely clueless about how to build systems of the scale they happened to have built.

I do hope to be wrong; I do hope that at Roblox there are more people, behind the scenes, who have nothing to do with this post, but have a lot to do with helping clean up this particular mess and prevent future ones. But, from my experience, I can assure you that this is not the case generally, even if at Roblox in particular the situation is not as bad as I'm reading between the lines.

Because the hipsters do appear to run the show, at least when it comes to microservices / SOA these days, and the overall system design situation does look like the early stages of Idiocracy to my taste. With the conversation about using Mongo and Redis as a DB and a cache resembling the "it's got electrolytes" scenes more and more.

</rant>
So, in a company I'm helping with the design, we couldn't come up with a memorable name for the core component of our service.

Partly out of desperation, I looked into Sci-Fi for inspiration, and suggested we call it Arrakis. Somehow, the folks liked it.

That was late last week. Now, over the weekend, I'm thinking that the piece that grabs data from all other places and feeds it into our core should be called a [Spice] Harvester.
Regulation-heavy folks, I have a question about how the modern-day data residency rules affect core services such as authorization.

Say, your service is the one that ultimately decides if a user can only view a Google Doc, leave comments there, or edit it directly.

Now, you're a European user, subject to GDPR, of course, and you find yourself in the US.

Surely it is not the case that, every single time you attempt to change something in this Google Doc, a request goes out to a European server (where "your data" "resides") so that that European server can confirm you are indeed granted that permission?

~ ~ ~

My limited understanding of data residency regulations is that they only affect the storage of PII (personally identifiable information). In other words:

• If you need a cache of permissions for a particular user token, for example, you are absolutely allowed to keep it anywhere, as long as that user token does not contain this user's PII, right?

• If your data is ephemeral (you only keep it in memory, on your own server), and you never respond back with it, you are absolutely allowed to serve the requests that use this data, right?

~ ~ ~

For example, your legal name is PII, so we shouldn't store it anywhere outside the servers in Europe. Still, if you're opening your Google account, and then its settings, from America, it's an American "frontend server" that is doing the rendering of this page, right?

Thus, once you, as the user, have requested this data, it is allowed, in some form, to pass through a Google data center in America. There might be constraints on not storing this data in any caches, or any web server logs, for instance, or constraints on keeping it encrypted while in transit, but it's not that this data can't cross the country border. Or am I wrong?

~ ~ ~

On the other hand, the tricky case is age. Say, you are a European user, and you have consciously consented to share your age with YouTube in Europe. Now you are in the US, and are about to watch some videos.

Say, the publisher of a video can make it restricted to an arbitrary age. I am making it up, just bear with me. In order to answer the question whether you can watch a certain video, the age restriction on the very video would have to be compared to your age.

Thus, a "malicious" actor can create a hundred videos, 1+, 2+, 3+, ..., 99+, 100+, and then issue requests of the kind of "can this user watch this video?" The authorization service would never disclose the age, but, as all computer scientists understand, it would take seven requests to get to know this user's age down to a year.

~ ~ ~

What does the regulation say here?

If a user from Europe is opening YouTube in the US, on which side of the Atlantic should the very check happen?

For how long can this result be cached?

And is the service liable if, say, a change in important data did not propagate to the other side of the ocean quickly enough, causing the user to mistakenly be allowed to access, or be prevented from accessing, certain content?
I recall how, many years ago, Microsoft was made to divorce Internet Explorer from, well, the components of Windows that are integral to the OS. And that was good.

Now -- yes, I do reboot into Windows every now and then -- when I click on the text to expand what that beautiful login screen picture is about ...

... I get to choose between "Edge" and "Edge", with the third option being "Choose from Microsoft Store". And this choice begins with the "search query" of, drum rolls, microsoft-edge.

Just so that we are clear: I have Brave, Firefox, and Chrome installed. And I need to open a URL. But Windows would not show me that URL unless I "agree" to use Edge to open it. For real, just a "copy URL" option would work for me -- but it is an "essential OS feature", apparently, to tell me in which exact part of the Indian Ocean that photo was taken.

I don't even know what to make of it; I can't even convince myself to dislike Microsoft due to this. It's the rules of the game that companies play by these days. Somehow we're back to square one, where these rules are nowhere near being pro Us, The People, and they are, again, almost entirely pro corporations.

Seriously, in the US, the only "recent" example of an IT regulation I can think of that made life better for an average Joe or Jane was that one can keep their phone number when changing carriers. That's about it.

Back to Microsoft, I don't use macOS, but I do use an iPhone. When that iPhone wants to install its update, I trust Apple to do it. When my Windows 10 "recommends" upgrading to Windows 11, I'm finding myself automatically looking for an option to never offer it again.

And here we are, with the same old company, employing the same old tricks, being at the top of the world capitalization-wise. It's a really well run company. It cares about its employees' diversity and inclusion. It cares about climate. It cares about privacy. The only thing that seems to be missing is caring about the simple things that ruin the experience of some end users.

I'm going to go all in, though, and postulate that this is the right strategy for the company. Because the people who truly demand these features from an OS do settle for Linux or for macOS, and are not the target audience of Microsoft Windows. Personally, I wouldn't boot into Windows at all if my Razer camera worked as well -- 60 FPS Full HD with proper lighting correction and background blur -- on Ubuntu as it does on Windows.
Our next #SystemDesignMeetup episode should be real good, especially if you are into stream crunching and data processing: comparing #Kafka and #RabbitMQ.

This actually is the first half of what I have to share about the Distributed Task Queue problem. We just decided to split it into two, as there's quite some content on the topic that is not specific to the very problem, but rather talks about message queues as they are used out there in prod.
I just merged in a pull request from dependabot that bumps the versions of some npm modules. And then five more emerged, as if this bot is a hydra. And I've merged in all of them.

This, most likely, is a good thing by itself. But I can smell a lot of possible bad things coming out of this. I mean, this approach is not security through obscurity, but it is creating a lot of room for such in the future.

Just imagine developers all over the world learning to blindly click "merge" on three-line pull requests to their package-lock.json files. What if a package is hijacked, and updating it results in running malicious code on your production box, or on your home+work laptop? What if a zero-day vulnerability, in node or in npm, or elsewhere, is found, and this kind of automatic pull request is [ab]used to trigger an avalanche of people updating, by accident or through second-order malicious intent?

Finally, what percentage of people would blindly accept a +3 lines change from a user that is not @dependabot, but some barely noticeable alteration of the name? I'd bet at least a few percent. And how many bad things can be squeezed into three lines? An unbounded number of them. And what are a few percent of developers? 10K+ repositories? 100K+?

If anything, this human factor of potential (and real) over-reliance on semi-automated "improvers" of our code is the strongest argument I have today in favor of using several repositories. Monorepo FTW is my motto in many situations, but, at least from a purely damage control perspective, multiple repos help keep some boundaries tight.

Yesterday I could argue that if some TypeScript code and some C++ code are sharing the same data schema, then it is quite beneficial to keep this code, and the relevant schema, in the same repository.

Today I would think twice before allowing package[-lock].json anywhere in a repository where the main language is C++, and where various pre-submit scripts and post-merge actions do build and run this code on the servers that I care for security-wise.

Or maybe this post just signifies that I've bought into the religion of GitHub actions at scale. Yes, they are quite wasteful, true that. Yes, they often introduce extra complexity. But in our unstable world they do seem to be the safest technique of loose coupling in environments where zero trust is quickly becoming the norm, especially when it comes to popular, widely depended upon, and somewhat fragile open source projects.
For a while, my understanding of what WebAssembly (wasm) is good for fit roughly the following pattern:

• There is an isolated, mostly algorithmic, task.
• This task is best to be solved on the client side, in the browser.
• One could think of writing an npm module that solves this task.
• But an implementation in another language already exists, on the backend (BE).
• So just take this implementation, wrap it into wasm, "ship" the very function to the frontend (FE), and you're done.

One big benefit of this approach is that the FE itself becomes fully decoupled from the implementation of this magic function. There is no dependency on the FE developer, on the build process of the FE repo, or the like. Local iterations are fast and local, just as I like them.

Plus, the tests for the existing code effectively test the wasm-transpiled code as well. In my case, when the code is C++, tested with googletest, and the transpiler is Emscripten (emsdk), everything just works out of the box. I only needed to add a build target so that, among other things, my freshly-built BE also exposes the very transpiled wasm function.
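To make this less abstract, here is roughly what the FE side of such a setup looks like, sketched in TypeScript. The file and function names are made up, and the exact emcc flags may differ in your build; treat it as a shape, not a recipe.

```ts
// Assuming the C++ side exposes `const char* solve(const char*)` and is built with
// something like:
//   emcc magic.cpp -O2 -sMODULARIZE -sEXPORT_ES6 \
//        -sEXPORTED_FUNCTIONS=_solve -sEXPORTED_RUNTIME_METHODS=cwrap -o magic.js

// @ts-ignore -- the Emscripten-generated glue ships without type declarations
import createMagicModule from './magic.js';

async function solveOnTheClient(input: string): Promise<string> {
  const module = await createMagicModule();
  // cwrap turns the exported C function into a plain JS function.
  const solve = module.cwrap('solve', 'string', ['string']);
  return solve(input);
}

// The FE only knows there is a `solve(input)`; the same C++ code, and its
// googletest suite, keeps living on the BE untouched.
solveOnTheClient('{"n": 42}').then(console.log);
```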

On the team dynamics level, it is awesome that the FE people can focus on making the product look & feel great to the user, while the BE people can do the data heavy-lifting part. And this way to use wasm does the job indeed.

But the above is Level One of wasm. I'm enlightened now, and can fast track you through the full journey.

Level Two: Ship the "demo" of your SPA with 100% of its BE code served by browser-run wasm.

Chances are, your BE is already implemented in a language that can be transpiled into wasm (node.js, Java, C++, Go, all fit). So, it's a one-weekend hackathon for your BE team to make sure 100% of the necessary code is shipped as wasm, and then another one-weekend hackathon for the FE team to integrate it.

The win of this approach might look minor, but, trust me, it is oh so important for a great demo: your web app becomes extremely, insanely responsive.

Chances are, especially if you are a startup, that you have taken quite a few shortcuts here and there. Extra network round-trips are being made, and/or some operations that could be cleverly combined into synchronous updates (ref. BFF) are now asynchronous. Well, with wasm — assuming the amount of data you are crunching is small, of course — consider these problems non-existent. If all you need to run in order to serve the response from the BE is a few quick for-loops, trust me, your wasm engine would likely be able to do it in single-digit milliseconds; indistinguishable from instant by most web app users these days.

Level Three: Automagically add "offline mode" to any SPA ("single-page [web] app"), with no code change. It is surprisingly easy once the SPA already contains an initializer for the wasm wrapper. Once it is there, literally all one needs to make an SPA offline-friendly is to (a rough sketch follows the list):

• Build the whole BE, including the data it needs for the demo, into a single wasm bundle,
• Add the logic to intercept XMLHttpRequest calls as part of the SPA's wasm initialization, and then
• Whenever the FE code is calling the API, don't make an outgoing HTTP call, but call the now-local wasm function instead.
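Here is a minimal sketch of the interception step, in TypeScript. I'm intercepting fetch for brevity (the same idea applies to XMLHttpRequest, with more plumbing), and handleApiCall is a made-up name for whatever request handler your wasm bundle exposes:

```ts
// Assumed to be exported by the wasm bundle after it is initialized (made-up name).
declare function handleApiCall(path: string, body: string): Promise<string>;

const realFetch = window.fetch.bind(window);

window.fetch = async (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
  const url =
    typeof input === 'string' ? input :
    input instanceof URL ? input.href :
    input.url;

  if (url.startsWith('/api/')) {
    // Serve the "backend" call locally, from the wasm bundle: no network involved.
    const body = init?.body ? String(init.body) : '';
    const json = await handleApiCall(url, body);
    return new Response(json, { status: 200, headers: { 'Content-Type': 'application/json' } });
  }
  return realFetch(input, init); // everything else still goes out over the network
};
```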

I have not done this myself; and I would probably not go this far with automating the process. But the idea is just too brilliant for me to not share it.

Who knows, maybe soon we would learn to launch full Docker containers via wasm. There is a fully functioning wasm Linux distro, after all, so the path is paved. And then someone could ship an npm module that would just take a Docker container, wrap it into wasm, and then surgically intercept the calls to the BE to now be served within the browser, probably running this Docker container in a separate Web Worker.

This someone might be me, but I've got more things to build now. So, the idea is up for grabs — feel free to take it!
The whole story is some 5x longer, but today I had the most fascinating conversation with a road policeman in my life. After about fifteen minutes of communication via Google Translate, it ended the following way:

— You made a turn left where the left turn is prohibited.
— Sir, there were no prohibiting signs along the route I took.
— There was a sign. You made an illegal turn.
— There's no sign there! I'm always careful, Swiss driving school. Why would I violate the rules if there was a sign?
— There was a sign.
— Nah, there was no sign at all. Wanna go together and check?
— Hmm ... how about this: we go there, and IF THERE IS A SIGN YOU PAY DOUBLE THE AMOUNT?
— LET'S GO!

There was no sign. Also, I didn't like how this guy handled his machine gun; while sitting behind him on his ATV, I had to keep it pressed against his body with my own knee so that it would not point at my calf.
I'm looking into setting up open source reproducible blueprints / performance tests. So that they can be run on the hardware paid for by the company, while the code remains open & reusable. Is this a crazy idea?
I was just interviewing a staff level candidate, and, during the reverse interview, they asked me what levels there are in the company for a staff level person to aspire to.

While clarifying the question, it became clear that one of the things they are looking to explore is the process of standardizing decisions across the company: about microservices, patterns, languages, storages, etc.

The current job title of this candidate is a senior engineer. I happened to know this before the interview because they looked me up on LinkedIn yesterday.

So I gave my most sincere answer:

• As a senior person, you are encouraged to clean up the mess that is affecting your and your team's performance.

• As a staff person, you are expected to prioritize the mess, and even, at times, embrace the mess.

In a way, I was brutally honest, as I like to be. Because the observation I shared holds true in all the companies I have been part of. But it's somehow not what interviewers are proud of, and hence it's not something interviewers would generally share when asked.

Then I turned the question around. I shared the big initiatives I am aware of and/or am part of, such as microservice interfaces standardization. I explained why grassroots, bottom-up initiatives rarely work when it comes to company-wide standards. And I reassured the candidate that, while we are working on reducing the mess where it affects our daily duties negatively, they had better be fully prepared to experience a certain degree of messiness should they join.

And that they will not be expected to clean up every single mess they encounter. On the contrary, from a staff level person, the company would expect them to exercise their judgement on which messes are best to be taken care of, and which are not worthy of it strategically.

The candidate was happy. So am I!
Frontend and full stack folks, how do you deal with those huge diffs in package-lock.json and with all the */__snapshots__/* stuff?

I'm aware that package-lock.json is recommended to be put under source control. But at least in a separate commit, right?

Maybe there exists a GitHub setting to exclude certain files (or certain paths) from diffs? And from pull request line counter deltas? Or maybe put two pairs of numbers there: +added, -removed, +added_boilerplate, -removed_boilerplate?

Maybe I could put some .gitignore-boilerplate file and the problem would be gone?

Maybe there's a browser extension that would emulate this behavior? (Maybe I should build one or invest in a team that is building one?)

Maybe there's something a frontend developer should know that reduces or eliminates the bloat described above?
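For what it's worth, the closest thing I know of to such a setting is marking those paths as generated in .gitattributes. I believe GitHub then collapses them in the pull request diff view and leaves them out of the language stats; whether it also fixes the +/- line counters, I'm not sure.

```
# .gitattributes -- ask GitHub to treat these paths as generated
package-lock.json linguist-generated=true
*.snap            linguist-generated=true
```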
So I was doing a "sale call" interview today.

That's the first conversation the candidate has after speaking with the recruiter. That is, the first technical conversation. I am not even expected to provide feedback, and I share this with the candidate first thing, to keep them relaxed.

The objective of the sale call is to make sure the candidate is sufficiently interested in the role, and to gather some info on what teams and roles they could be a good fit for. It is not completely impossible to "fail" this "interview", but, unless the candidate truly wants to bail out, it's hard to imagine a conversation after which the company would want to pass on them.

After a routine past experience chit-chat, this candidate asks me up front: What do you do? And then: What would I do if I join?

Well, he likes designing systems. And his resume has all the relevant bits and pieces. And he has signed the NDA.

So, you're asking what do I do? "Hold my beer", I say figuratively, while closing all confidential tabs. I then zoomed in on that Miro board sufficiently to hide all the people and project names, and screen-shared our design diagram. "Look, here's what we do. That's about what you're looking for, right?"

Guess what happened next?

He criticizes our design!

The candidate, on a sale call that is not even an interview, is telling me that the design we've spent months and months on is suboptimal, and that he knows how to make it better.

Now pause reading for a moment and note your emotion. Those of you who know me well can already predict what would happen next. For the rest of you: enjoy responsibly.

Clearly, I bit the bullet. Agreed with all his comments. Provided a few extra real-life constraints. And one more time, and then one more time. Until he agreed it's not that clear-cut, and that he would probably have made the same decisions as we did, had he had all the same context.

I thanked him sincerely for the conversation, and assured him I've gathered enough signal about what the best role for him in our company looks like. I've told him I'm putting my feedback straight in, and that the recruiter would be back in touch shortly.

Then I took a deep breath and wrote my feedback. Plus the recruiter notes. Along the lines of:

• This is a great candidate.
• We need more people like him.
• I did not interview him technically, but would bet he's strong.
• And we need to do our best to make sure that, provided he passes the tech screen, he receives a great offer and joins us.

In the industry, we need more people who ignore all the rules & conventions, and just open up with what they believe is the point of maximum impact. I love working with them. I hope to be one of them. And, in my experience, this is paying off greatly, both professionally and personally.