DevOps & Architecture | The coregrid.Dev
B2B Engineering Insights & Architectural Teardowns. Cloud-Native, Highload, DevOps, and unfiltered IT sarcasm. No marketing noise. 🌐 https://thecoregrid.dev
Why do engineers choose the Elastic Inference Service? 🤔

Elastic recently introduced the Elastic Inference Service (EIS), which integrates GPU-accelerated inference with Elasticsearch. This solution enables fast and scalable processing of vector embeddings and language models, which is critical for modern AI applications. 🚀 Importantly, EIS offloads the infrastructure burden, providing high performance and ease of use through an API.

Of course, such solutions have their trade-offs. For example, dependency on cloud providers and possible regional limitations. But for many teams, the advantages of accelerated processing and ease of integration outweigh these concerns. 🧩

It seems that EIS will become an important tool for those who want to rapidly develop AI applications with minimal infrastructure costs. Has anyone already tried it? Share your experience!
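To make the "ease of use through an API" point concrete, here is a sketch of building a request body for an Elasticsearch-style inference endpoint. The path and payload shape are illustrative assumptions, not the exact EIS contract — check Elastic's current docs before relying on them.

```python
import json

def build_inference_request(inference_id, texts):
    """Build the URL path and JSON body for a text-embedding inference call.

    The `/_inference/{task_type}/{inference_id}` shape and the `input`
    field follow Elasticsearch's inference API conventions, but treat
    this as a sketch, not the authoritative EIS spec.
    """
    path = f"/_inference/text_embedding/{inference_id}"
    body = json.dumps({"input": texts})
    return path, body

path, body = build_inference_request("my-embedding-model", ["hello world"])
print(path)  # /_inference/text_embedding/my-embedding-model
print(body)
```

The appeal is exactly this: the caller sends text, gets vectors back, and never touches GPU provisioning.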
How did Etsy handle migrating 1,000 MySQL shards to Vitess? 🤔

Recently, the Etsy team completed a complex migration of their MySQL sharding infrastructure to Vitess. This allowed them to use vindexes for routing and redistributing data, as well as to simplify working with previously unsharded tables. The main challenge was to integrate the existing sharding logic into Vitess without risk and without the need for a complete data redistribution. 🛠

Why is this important? Implementing Vitess helps eliminate bottlenecks such as a single "index" database and makes it easier for developers to work with sharding by hiding its complexity. However, as with any project, there are trade-offs: the migration process was long and labor-intensive, requiring 2,500 pull requests and 6,000 queries. But the result was worth the effort—a stable and scalable system. 🚀
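The routing idea behind vindexes can be sketched in a few lines. Vitess's real vindexes map sharding keys to keyspace-ID ranges rather than a bare modulo, so treat this purely as an illustration of why the same key always lands on the same shard:

```python
import hashlib

NUM_SHARDS = 1000  # the fleet size from the post

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Toy hash 'vindex': deterministically map a sharding key to a shard.

    Real Vitess vindexes produce keyspace IDs and support ranges and
    lookup tables; this modulo version only shows the routing principle.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always routes to the same shard, so a query can be sent
# directly instead of fanning out across all 1,000 shards.
shard = shard_for("shop:42")
assert shard == shard_for("shop:42")
print(shard)
```

Hiding this mapping inside Vitess is precisely what "makes sharding invisible to developers" means in practice.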

As the saying goes, "switching to a new system is like changing a car’s wheels while driving." It’s interesting to see how other companies tackle similar challenges.

More on InfoQ
Root Cause Analysis as Code in SRE Systems

Root cause analysis (RCA) comes up against scale and the human factor. Meta's approach with DrP demonstrates how to turn debugging into a reproducible engineering process. The problem does not appear immediately—only when the system reaches organizational scale. Incidents...

👉 Read the analysis on the website
Why is distributed infrastructure so important for managed agents? 🤔

According to Akamai, the point is that managed agents require stability and reliability, and distributed infrastructure provides exactly that. It allows you to minimize risks associated with failures and overloads, ensuring constant access to necessary resources. This is especially important for DevOps and SRE, where a missed second can be costly. ⏱️

But there is also a downside: managing such a system becomes more complex. You need to consider many factors to ensure its uninterrupted operation. It's like walking a tightrope: is it worth the effort? It depends on your priorities and resources. 🎪
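The core benefit of distribution is simple to show in code: if one replica fails, the next one answers. A minimal failover sketch (the replica names and behavior are invented for the example; a real client would add timeouts, backoff, and health checks):

```python
def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful answer.

    `replicas` is a list of callables standing in for regional endpoints.
    This is the bare idea only -- production clients layer on timeouts,
    jittered retries, and circuit breakers.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")

def region_down(_request):
    raise ConnectionError("region offline")

def region_up(request):
    return f"ok:{request}"

print(call_with_failover([region_down, region_up], "ping"))  # ok:ping
```

The "more complex to manage" part of the post lives exactly in what this sketch omits.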
KV cache optimization for multi-LoRA agents

ForkKV rethinks KV cache optimization for multi-LoRA serving, eliminating memory duplication and increasing throughput. The problem arises in multi-LoRA agent serving, where several specialized agents operate on top of a single base model. LoRA reduces the cost of fine-tuning, but on...

👉 Read the analysis on the website
How to Protect Your GitHub from Threats? 🤔

Engineers often face security threats in their CI/CD systems. One of the key tools is GitHub, which, as a popular platform, attracts the attention of attackers. What should you pay attention to? For example, attacks using npm worms like Shai-Hulud, or OAuth token compromise. These attacks demonstrate how important it is to have reliable detection and response mechanisms.

How to protect yourself? Use static code analysis and vulnerability detection tools such as Datadog SAST or Dependabot. These solutions help track code changes and prevent the execution of malicious scripts. However, as always, increasing security comes at the cost of more complex processes. 🚀
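Worms like Shai-Hulud spread through npm lifecycle scripts that run automatically on install. A toy static check along those lines (not a replacement for SAST or Dependabot, just the idea of flagging install-time hooks):

```python
import json

# Lifecycle hooks that npm executes automatically during `npm install` --
# the channel install-time worms abuse to run code on developer machines.
RISKY_HOOKS = {"preinstall", "install", "postinstall"}

def risky_scripts(package_json: str) -> dict:
    """Return install-time scripts declared in a package.json string."""
    scripts = json.loads(package_json).get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in RISKY_HOOKS}

pkg = '{"name": "demo", "scripts": {"postinstall": "node evil.js", "test": "jest"}}'
print(risky_scripts(pkg))  # {'postinstall': 'node evil.js'}
```

Real scanners go much further (diffing published tarballs, checking maintainer changes), but flagging lifecycle hooks is a cheap first filter.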

As the saying goes: "The best code is the one that no one has managed to hack." 😉
How does AI help find vulnerabilities? 🤔

In the world of cybersecurity, vulnerability hunting is like a game of hide and seek against an opponent who hides very well. AI has become a new player in this game, helping to find weak spots faster and more accurately. 🤖 But why is this important for engineers? Because the faster we detect a problem, the less chance attackers have to exploit it.

AI algorithms can process huge volumes of data and identify anomalies that are difficult for humans to notice. However, as with any engineering solution, there are trade-offs here as well: AI can make mistakes and generate false positives. It's like having an assistant who sometimes confuses the server keys with the house keys. 🏠🔑
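A deliberately crude stand-in for those anomaly detectors — a z-score check over latency samples — also shows where false positives come from: a legitimate spike looks exactly like an anomaly. Numbers and threshold are invented for the example.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Flag indices more than `threshold` standard deviations from the mean.

    Real detectors are far richer (seasonality, learned baselines), but
    the failure mode is the same: a genuine traffic spike and an attack
    can look identical to the statistics.
    """
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

latencies_ms = [12, 11, 13, 12, 11, 12, 95, 12, 11]
print(zscore_anomalies(latencies_ms))  # [6] -- the 95 ms spike
```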

More here
Agentic systems without context overload

Managing the context window in multi-agent systems determines the quality of reasoning and the stability of investigations. Let's break down how this is addressed through context separation. When an agentic system goes beyond short scenarios, issues may arise. In lengthy investigations, the number of inference requests...

👉 Read the analysis on the website
How does Spotify deliver updates to 675 million users every week? 🎧

How do you like the idea of releasing updates to hundreds of millions of users every week while maintaining stability? No, it's not magic or madness. Spotify has found a balance between speed and safety. 🚀

Spotify uses a release architecture where speed and safety reinforce each other. It all starts with trunk-based development, where code is merged into the main branch immediately after testing and review. This avoids long isolated branches and minimizes integration issues. However, the key here is discipline and reliable automated tests.

The process includes several "rings" of testing: from Spotify employees to 1% of real users. Each ring catches its own category of errors, making releases more robust. Feature flags allow features to be switched on and off without shipping a new release, adding flexibility.

This approach allows Spotify to respond quickly to issues without sacrificing quality. As they say, "fast and careful are not opposites." 🤓
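The "1% of real users" ring is typically implemented as a stable percentage rollout: hash the user and flag together so the same user always gets the same answer. A sketch of the pattern — illustrative only, not Spotify's actual implementation:

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: float) -> bool:
    """Stable percentage rollout for a feature flag.

    Hashing `flag:user_id` assigns each user a bucket 0-99; a flag set
    to percent=1 roughly matches a "1% of real users" ring. Deterministic,
    so a user never flips in and out of the experiment between requests.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Same user, same flag -> same answer every time.
assert in_rollout("user-1", "new-player-ui", 1) == in_rollout("user-1", "new-player-ui", 1)
# Boundary behavior: 100% includes everyone, 0% includes no one.
assert in_rollout("user-1", "new-player-ui", 100)
assert not in_rollout("user-1", "new-player-ui", 0)
```

Widening a ring is then just raising `percent` — no redeploy needed.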

More details here
CPU-free LLM inference

CPU-free LLM inference changes the critical inference path by eliminating the CPU as a source of latency and instability. Modern LLM serving architectures are more dependent on the CPU than it seems. Although computations are performed on the GPU, it is the CPU that manages...

👉 Read the analysis on the website
How can engineers cope with the growing volumes of data in modern applications? 🤔

This is where observability metrics come to the rescue. They help not only to diagnose and resolve technical issues, but also provide deep insights into business processes. The main types of metrics are application, system, and business indicators. Together, they create a comprehensive view of technology performance. 🚀

But it's not that simple: sorting and correlating data can become a challenge. Using open standards such as OpenTelemetry and automating processes help overcome these difficulties. It is also important to clearly define which metrics are truly important for your goals. 🛠
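The three metric kinds are easy to picture with a toy in-process registry. This is a sketch of the concepts only — in practice you would record and export these through OpenTelemetry rather than hold them in a dict:

```python
from collections import defaultdict

class Metrics:
    """Toy registry for the metric kinds from the post: counters for
    application and business events, gauges for system state."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.gauges = {}

    def incr(self, name, value=1, **labels):
        # Labels are normalized to a sorted tuple so the same label set
        # always maps to the same series.
        self.counters[(name, tuple(sorted(labels.items())))] += value

    def gauge(self, name, value):
        self.gauges[name] = value

m = Metrics()
m.incr("http.requests", route="/checkout", status="200")  # application metric
m.gauge("system.memory.used_bytes", 1_234_567)            # system metric
m.incr("orders.completed")                                # business metric
print(m.counters)
```

The correlation problem from the post starts exactly here: three kinds of data, three shapes, one incident to explain.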

If you are ready to take a step towards more effective monitoring, Elastic offers solutions for collecting and analyzing metrics.
Why do modern RAG systems so often "stumble" when it comes to processing data in different formats? 🤔

Most Retrieval-Augmented Generation (RAG) systems used in corporate environments have a problem: they handle data in a single format well, but when data is required from different sources—for example, from SQL databases and unstructured documents—complications arise. This leads to incomplete or incorrect answers, which is especially frustrating for analysts who have to manually supplement the data. 💡

A solution could be the use of multimodal agentic systems with hierarchical orchestration, as in the Protocol-H architecture. Here, a "supervisor-worker" topology is used, where the supervisor manages the flow of requests and the workers process tasks in their respective modalities. This helps reduce the number of errors and increase the accuracy of responses. However, this approach requires additional resources for orchestration and can complicate development. 🤹‍♂️
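Minimal supervisor-worker routing can be sketched in a few lines. The keyword-based routing rule and worker names below are invented purely for illustration — Protocol-H's real orchestration is far richer than this:

```python
def supervisor(query, workers):
    """Route a query to the worker for its modality.

    A real supervisor would use an LLM or classifier to pick the
    modality; a keyword check stands in for that decision here.
    """
    sql_hints = ("total", "revenue", "average")
    modality = "sql" if any(w in query.lower() for w in sql_hints) else "docs"
    return workers[modality](query)

workers = {
    "sql": lambda q: f"[sql-worker] ran a query for: {q}",
    "docs": lambda q: f"[doc-worker] retrieved passages for: {q}",
}

print(supervisor("Total revenue by region", workers))
print(supervisor("Summarize the onboarding policy", workers))
```

The value of the topology is that each worker only ever sees queries in its own modality — which is where the error reduction comes from.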

Interesting fact: in Protocol-H tests, the number of "hallucinations" in responses was reduced by 60%! But, as in life, there are no perfect systems, and there is always something to sacrifice for improvement.

More on InfoQ
Low latency systems and communication control

Low latency systems are limited not by the CPU, but by communications. We analyze how architecture reduces latency without sacrificing reliability. The problem does not appear immediately — it becomes noticeable only when the system reaches the limits of network interaction. In...

👉 Read the analysis on the website
How can you compare text and images in a single model? 🤔

With the release of Sentence Transformers v5.4, it has become possible to use multimodal models that combine text, images, audio, and video into a unified vector space. This opens the door to new scenarios, such as image search by text query or building multimodal RAG pipelines. 🎨🔍

Multimodal embedding models convert heterogeneous data into a single vector space, while multimodal rerankers assess their relevance. Yes, the quality is higher, but you pay for it with speed. 🤓 Don't forget about the "modality gap": cross-modal similarities are usually lower than unimodal ones, but the relative order is preserved.

If you don't have a powerful GPU, CLIP models are your best friend. And don't forget about the trade-off: embeddings are for fast search, rerankers are for accuracy. 🖥💡
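The modality gap is easy to see with cosine similarity over toy vectors. The 3-d "embeddings" below are fabricated for the example, but they reproduce the behavior described above: cross-modal scores sit lower, yet the ranking is preserved.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy vectors standing in for real model output.
text_cat  = [0.9, 0.1, 0.0]
text_dog  = [0.8, 0.2, 0.1]
image_cat = [0.6, 0.1, 0.5]  # same concept, shifted by the modality gap
image_dog = [0.1, 0.6, 0.5]

# Ranking across modalities is preserved: the cat image still wins...
assert cosine(text_cat, image_cat) > cosine(text_cat, image_dog)
# ...even though cross-modal scores sit below unimodal ones.
assert cosine(text_cat, image_cat) < cosine(text_cat, text_dog)
```

Practical takeaway: compare scores within a retrieval run, not against a fixed absolute threshold.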

More details here
Edge-cloud multi-agent with decentralization

The edge-cloud multi-agent architecture shifts the balance between latency and autonomy. The AdecPilot analysis demonstrates how decentralizing control affects system behavior. Modern edge-cloud multi-agent systems for mobile automation face a systemic conflict. On one hand, large models...

👉 Read the analysis on the website
How to Avoid the Fork Trap? 🤔

Have you ever thought about the problems that arise when forking large open-source projects? Meta faced this dilemma when they developed a specialized version of WebRTC for their services. Constant forking can turn into a trap: over time, the resources required to integrate external changes become overwhelming. 🚧

Meta solved the problem by creating a modular architecture based on the latest version of WebRTC. They implemented a "shim"—a layer that allows dynamic switching between WebRTC versions in real time. This solution made it possible to avoid significant increases in binary size and simplified A/B testing. But, as always, there are trade-offs: the need to maintain two stacks simultaneously. 🛠
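The shim pattern itself is small: callers go through one indirection layer, and which stack handles the call is decided at runtime, e.g. per A/B-test cohort. A schematic sketch of the pattern (class and version names invented; not Meta's code):

```python
class Shim:
    """Thin indirection layer over two interchangeable stacks.

    Callers never touch a concrete stack; the shim picks one per user,
    which is what makes runtime A/B switching possible.
    """

    def __init__(self, stable, candidate, use_candidate=lambda uid: False):
        self.stable, self.candidate = stable, candidate
        self.use_candidate = use_candidate

    def create_connection(self, user_id):
        impl = self.candidate if self.use_candidate(user_id) else self.stable
        return impl.create_connection(user_id)

class StableStack:
    def create_connection(self, uid):
        return f"stable:{uid}"

class CandidateStack:
    def create_connection(self, uid):
        return f"candidate:{uid}"

# Route users whose id ends in "7" to the candidate stack (toy cohort rule).
shim = Shim(StableStack(), CandidateStack(), use_candidate=lambda uid: uid.endswith("7"))
print(shim.create_connection("user-3"))  # stable:user-3
print(shim.create_connection("user-7"))  # candidate:user-7
```

The trade-off from the post is visible here too: both stacks behind the shim must be kept alive and API-compatible.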

Meta’s engineering solution is an excellent example of how innovation can help avoid technical debt without a complete system overhaul. What do you think about this approach? Perhaps it will inspire you to solve your own infrastructure challenges.

Read more here
Hive federation for data warehouse without downtime

Hive federation solves the problem of scalability and fault tolerance for data warehouses. Let's look at how Uber moved away from a monolith without stopping analytics. When does a single Hive instance quietly turn into a single point of systemic risk accumulation? In the original architecture, all datasets were...

👉 Read the analysis on the website
Why are some index scans not as fast as they seem? 🧐

Index scans in SQL are often perceived as a fast and efficient way to execute queries. But it's not that simple. At Datadog, they encountered a problem: an index scan on a PostgreSQL table turned out to be slow and costly, despite using the correct index. The reason? The columns in the index were ordered incorrectly relative to the query's filters. The fix was a targeted index matching the filter order, which cut average query latency from 300 ms to 38 μs. 🚀
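Why column order matters can be simulated without a database: a composite B-tree index is, conceptually, the rows kept sorted in index-column order. The table shape below is invented for the example; this models the principle, not PostgreSQL internals.

```python
import bisect

# Toy events table: (org_id, created_at) for 50 orgs x 100 timestamps.
rows = [(org, ts) for org in range(50) for ts in range(100)]

idx_org_ts = sorted(rows)                           # index on (org_id, created_at)
idx_ts_org = sorted((ts, org) for org, ts in rows)  # index on (created_at, org_id)

def entries_scanned(index, leading_is_org, org_id):
    """Entries read to answer `WHERE org_id = ?` against each index."""
    if leading_is_org:
        # org_id leads the index: all matches form one contiguous range,
        # reachable with two binary searches.
        lo = bisect.bisect_left(index, (org_id,))
        hi = bisect.bisect_left(index, (org_id + 1,))
        return hi - lo
    # org_id is not the leading column: the index cannot seek on it,
    # so every entry must be examined.
    return len(index)

print(entries_scanned(idx_org_ts, True, 3))   # 100  -- tight range scan
print(entries_scanned(idx_ts_org, False, 3))  # 5000 -- full index scan
```

Same data, same "correct" columns — a 50x difference purely from ordering, which is the shape of the problem Datadog hit.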

Why does this matter? Understanding how indexes work enables engineers to optimize database performance and avoid unnecessary costs. Datadog now has a feature that automatically detects inefficient index scans—less manual work, more time for important tasks. 💡
FSM Benchmark for Evaluating Network AI Agents

NetAgentBench offers a state-centric approach to evaluating LLMs in network configuration, bridging the gap between static tests and real system behavior. The problem in evaluating AI agents for network configuration does not become apparent immediately — until the moment when...

👉 Read the analysis on the website
How did LinkedIn simplify its recommendation system and what problems did this cause? 🤔

LinkedIn abandoned five separate recommendation systems in favor of a unified model based on an LLM (Large Language Model). This reduced complexity but introduced new challenges. For example, how do you teach an LLM to work with structured data and predict results in 50 milliseconds for 1.3 billion users? And how do you process data when most of it is noise? 🤖

This transformation made it possible to improve the quality of recommendations through a deeper understanding of user interests. However, abandoning specialized systems means losing independent optimizations and natural redundancy. In the event of model regression, for example during the cold start phase, rolling back to a previous version becomes more difficult. 📉

It seems that LinkedIn has bet on simplicity at the expense of resilience. It will be interesting to see how this affects performance in the long term. What do you think? 🤔

More details here