Why do engineers choose the Elastic Inference Service? 🤔
Elastic recently introduced the Elastic Inference Service (EIS), which integrates GPU-accelerated inference with Elasticsearch. This solution enables fast and scalable processing of vector embeddings and language models, which is critical for modern AI applications. 🚀 Importantly, EIS offloads the infrastructure burden, providing high performance and ease of use through an API.
Of course, such solutions have their trade-offs. For example, dependency on cloud providers and possible regional limitations. But for many teams, the advantages of accelerated processing and ease of integration outweigh these concerns. 🧩
It seems that EIS will become an important tool for those who want to rapidly develop AI applications with minimal infrastructure costs. Has anyone already tried it? Share your experience!
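For a feel of the workflow, here is a rough Python sketch of the two REST calls involved: registering a text-embedding inference endpoint and then embedding a batch of inputs. The endpoint id and model id are made up, and the exact request shapes should be double-checked against the Elasticsearch inference API docs.

```python
import json

# Hypothetical names: "my-eis-embeddings" and the model id are placeholders.
def build_create_endpoint_request(inference_id: str, model_id: str) -> tuple[str, dict]:
    """Build the PUT request that registers a text-embedding inference endpoint."""
    path = f"/_inference/text_embedding/{inference_id}"
    body = {"service": "elastic", "service_settings": {"model_id": model_id}}
    return path, body

def build_embed_request(inference_id: str, texts: list[str]) -> tuple[str, dict]:
    """Build the POST request that embeds a batch of input strings."""
    return f"/_inference/text_embedding/{inference_id}", {"input": texts}

path, body = build_embed_request("my-eis-embeddings", ["semantic search", "vector store"])
print(path)
print(json.dumps(body))
```

The point is how little plumbing sits on your side: two HTTP calls, no GPU provisioning.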
Elastic Blog
GPUs go brrr! Elastic Inference Service (EIS): GPU-accelerated inference for Elasticsearch
The Elastic Inference Service (EIS), now available on Elastic Cloud, provides GPU-accelerated inference for Elasticsearch to simplify end-to-end semantic search workflows using text embeddings, semantic reranking, and access to LLMs. With EIS, developers…
How did Etsy handle migrating 1,000 MySQL shards to Vitess? 🤔
Recently, the Etsy team completed a complex migration of their MySQL sharding infrastructure to Vitess. This allowed them to use vindexes for routing and redistributing data, as well as to simplify working with previously unsharded tables. The main challenge was to integrate the existing sharding logic into Vitess without risk and without the need for a complete data redistribution. 🛠
Why is this important? Implementing Vitess helps eliminate bottlenecks such as a single "index" database and makes it easier for developers to work with sharding by hiding its complexity. However, as with any project, there are trade-offs: the migration process was long and labor-intensive, requiring 2,500 pull requests and 6,000 queries. But the result was worth the effort—a stable and scalable system. 🚀
As the saying goes, "switching to a new system is like changing a car’s wheels while driving." It’s interesting to see how other companies tackle similar challenges. More on InfoQ
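For context, shard routing in Vitess is declared in a VSchema. Below is a minimal, hypothetical fragment (table and column names invented; per the article, Etsy had to integrate its existing shard mapping rather than use stock vindexes) showing a `hash` vindex routing rows by `shop_id`:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "shops": {
      "column_vindexes": [
        { "column": "shop_id", "name": "hash" }
      ]
    }
  }
}
```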
InfoQ
Etsy Migrates 1000-Shard, 425 TB MySQL Sharding Architecture to Vitess
The Etsy engineering team recently described how the company migrated its long-running MySQL sharding infrastructure to Vitess. The transition moved shard routing from Etsy’s internal systems to Vitess using vindexes, enabling capabilities such as resharding…
Root Cause Analysis as Code in SRE Systems
Root cause analysis (RCA) comes up against scale and the human factor. Meta's approach with DrP demonstrates how to turn debugging into a reproducible engineering process. The problem does not appear immediately—only when the system reaches organizational scale. Incidents...
👉 Read the analysis on the website
Why is distributed infrastructure so important for managed agents? 🤔
According to Akamai, the point is that managed agents require stability and reliability, and distributed infrastructure provides exactly that. It allows you to minimize risks associated with failures and overloads, ensuring constant access to necessary resources. This is especially important for DevOps and SRE, where a missed second can be costly. ⏱️
But there is also a downside: managing such a system becomes more complex. You need to consider many factors to ensure its uninterrupted operation. It's like walking a tightrope: is it worth the effort? It depends on your priorities and resources. 🎪
Akamai
Why Managed Agents Needs Distributed Infrastructure | Akamai
Anthropic’s Claude Managed Agents requires distributed cloud and security. See how Akamai Inference Cloud and API Security power and protect autonomous AI agents.
KV cache optimization for multi-LoRA agents
ForkKV rethinks KV cache optimization for multi-LoRA serving, eliminating memory duplication and increasing throughput. The problem arises in multi-LoRA agent serving, where several specialized agents operate on top of a single base model. LoRA reduces the cost of fine-tuning, but on...
👉 Read the analysis on the website
How to Protect Your GitHub from Threats? 🤔
Engineers often face security threats in their CI/CD systems. One of the key tools is GitHub, which, as a popular platform, attracts the attention of attackers. What should you pay attention to? For example, attacks using npm worms like Shai-Hulud, or OAuth token compromise. These attacks demonstrate how important it is to have reliable detection and response mechanisms.
How to protect yourself? Use static code analysis and vulnerability detection tools such as Datadog SAST or Dependabot. These solutions help track code changes and prevent the execution of malicious scripts. However, as always, increasing security comes at the cost of more complex processes. 🚀
As the saying goes: "The best code is the one that no one has managed to hack." 😉
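As a concrete starting point for the Dependabot side, a minimal `dependabot.yml` might look like this (ecosystem and schedule are just illustrative defaults, adjust to your repo):

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
```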
Datadog
CI/CD security: How to secure your GitHub ecosystem | Datadog
Learn how to apply a detection-based threat model to secure your GitHub ecosystem by identifying key inputs, identities, and their associated risks.
How does AI help find vulnerabilities? 🤔
In the world of cybersecurity, searching for vulnerabilities is like a game of hide and seek, where vulnerabilities are very well hidden. AI has become a new player in this game, helping to find weak spots faster and more accurately. 🤖 But why is this important for engineers? Because the faster we detect a problem, the less chance attackers have to exploit it.
AI algorithms can process huge volumes of data and identify anomalies that are difficult for humans to notice. However, as with any engineering solution, there are trade-offs here as well: AI can make mistakes and generate false positives. It's like having an assistant who sometimes confuses the server keys with the house keys. 🏠🔑
More here
Akamai
Why AI-Powered Vulnerability Discovery Strengthens Akamai's Security Mission | Akamai
Read about the implications of Project Glasswing and the Claude Mythos Preview — and learn how Akamai can help navigate the resulting new security landscape.
Agentic systems without context overload
Managing the context window in multi-agent systems determines the quality of reasoning and the stability of investigations. Let's break down how this is addressed through context separation. When an agentic system goes beyond short scenarios, issues may arise. In lengthy investigations, the number of inference requests...
👉 Read the analysis on the website
How does Spotify deliver updates to 675 million users every week? 🎧
How do you like the idea of releasing updates to hundreds of millions of users every week while maintaining stability? No, it's not magic or madness. Spotify has found a balance between speed and safety. 🚀
Spotify uses a release architecture where speed and safety reinforce each other. It all starts with trunk-based development, where code is merged into the main branch immediately after testing and review. This avoids long isolated branches and minimizes integration issues. However, the key here is discipline and reliable automated tests.
The process includes several "rings" of testing: from Spotify employees to 1% of real users. Each ring catches its own category of errors, making releases more robust. Feature flags let features be turned on and off without shipping a new release, adding flexibility.
This approach allows Spotify to respond quickly to issues without sacrificing quality. As they say, "fast and careful are not opposites." 🤓
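The ring idea boils down to a deterministic percentage rollout. Here is a toy Python sketch (not Spotify's actual implementation) of the stable user-bucketing check a feature-flag system might use:

```python
import hashlib

def rollout_bucket(user_id: str, feature: str) -> int:
    """Deterministically map a user to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """Enable the feature for roughly `percent`% of users, stable per user."""
    return rollout_bucket(user_id, feature) < percent

# The same user always lands in the same bucket, so widening a ring
# from 1% to 10% only ever adds users; nobody flip-flops in and out.
assert is_enabled("user-42", "new-home-feed", 100)
assert not is_enabled("user-42", "new-home-feed", 0)
```

Hashing on `feature:user` rather than just the user keeps rollouts of different features statistically independent.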
More details here
Bytebytego
How Spotify Ships to 675 Million Users Every Week Without Breaking Things
In this article, we will take a look at this process in detail and attempt to derive learnings.
CPU-free LLM inference
CPU-free LLM inference changes the critical inference path by eliminating the CPU as a source of latency and instability. Modern LLM serving architectures are more dependent on the CPU than it seems. Although computations are performed on the GPU, it is the CPU that manages...
👉 Read the analysis on the website
How can engineers cope with the growing volumes of data in modern applications? 🤔
This is where observability metrics come to the rescue. They help not only to diagnose and resolve technical issues, but also provide deep insights into business processes. The main types of metrics are application, system, and business indicators. Together, they create a comprehensive view of technology performance. 🚀
But it's not that simple: sorting and correlating data can become a challenge. Using open standards such as OpenTelemetry and automating processes help overcome these difficulties. It is also important to clearly define which metrics are truly important for your goals. 🛠
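A tiny illustration of why "which metrics matter" is half the battle: a nearest-rank percentile over latency samples (latency being one of the golden signals). Real pipelines aggregate with histograms rather than raw lists; this is just a sketch:

```python
def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile, e.g. q=0.95 for p95 latency."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

# Ten request latencies with a couple of slow outliers:
latencies_ms = [12, 15, 14, 210, 13, 16, 15, 14, 900, 15]
print(percentile(latencies_ms, 0.95))
```

Here the mean is about 122 ms while the median is 15 ms; the p95 of 900 ms is what tail-sensitive users actually feel, which is why picking the right metric matters more than collecting all of them.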
If you are ready to take a step towards more effective monitoring, Elastic offers solutions for collecting and analyzing metrics.
Elastic Blog
Understanding observability metrics: Types, golden signals, and best practices
Learn how observability metrics, logs, traces, and profiles enhance monitoring, optimize performance, and support data-driven decisions.
Why do modern RAG systems so often "stumble" when it comes to processing data in different formats? 🤔
Most Retrieval-Augmented Generation (RAG) systems used in corporate environments have a problem: they handle data in a single format well, but when data is required from different sources—for example, from SQL databases and unstructured documents—complications arise. This leads to incomplete or incorrect answers, which is especially frustrating for analysts who have to manually supplement the data. 💡
One solution is multimodal agentic systems with hierarchical orchestration, as in the Protocol-H architecture. It uses a supervisor-worker topology: the supervisor manages the flow of requests, while workers handle tasks in their respective modalities. This reduces errors and improves answer accuracy. However, the approach requires extra resources for orchestration and can complicate development. 🤹
Interesting fact: in Protocol-H tests, the number of "hallucinations" in responses was reduced by 60%! But, as in life, there are no perfect systems, and there is always something to sacrifice for improvement.
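To make the supervisor-worker topology concrete, here is a toy Python sketch. The modality names and workers are invented, and the real Protocol-H layers autonomous error recovery on top of routing like this:

```python
from typing import Callable

# Stand-in workers: in a real system these would hit a SQL engine
# and a document retriever respectively.
def sql_worker(q: str) -> str: return f"SQL result for: {q}"
def doc_worker(q: str) -> str: return f"Passages for: {q}"

class Supervisor:
    def __init__(self) -> None:
        self.workers: dict[str, Callable[[str], str]] = {
            "structured": sql_worker,
            "unstructured": doc_worker,
        }

    def route(self, query: str, modality: str) -> str:
        worker = self.workers.get(modality)
        if worker is None:
            raise ValueError(f"no worker for modality {modality!r}")
        return worker(query)

sup = Supervisor()
print(sup.route("revenue by region", "structured"))
```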
More on InfoQ
InfoQ
Building Hierarchical Agentic RAG Systems: Multi-Modal Reasoning with Autonomous Error Recovery
In this article, the author explores how hierarchical agentic RAG systems coordinate specialized workers through structured orchestration to improve accuracy, reliability, and explainability in complex enterprise analytics workflows. The article uses Protocol…
Low-latency systems and communication control
Low-latency systems are limited not by the CPU but by communication. We analyze how architecture reduces latency without sacrificing reliability. The problem does not appear immediately; it becomes noticeable only when the system reaches the limits of network interaction. In...
👉 Read the analysis on the website
How can you compare text and images in a single model? 🤔
With the release of Sentence Transformers v5.4, it has become possible to use multimodal models that combine text, images, audio, and video into a unified vector space. This opens the door to new scenarios, such as image search by text query or building multimodal RAG pipelines. 🎨🔍
Multimodal embedding models project heterogeneous data into a single vector space, while multimodal rerankers score relevance across modalities. Yes, the quality is higher, but you pay for it with speed. 🤓 Don't forget the "modality gap": cross-modal similarities are usually lower than unimodal ones, but the relative order is preserved.
If you don't have a powerful GPU, CLIP models are your best friend. And don't forget about the trade-off: embeddings are for fast search, rerankers are for accuracy. 🖥💡
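The "modality gap" point can be shown with toy numbers (hand-made vectors, not real CLIP embeddings): cross-modal cosine scores come out lower, but the ranking that retrieval relies on survives:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors in a shared space: two text embeddings, two image embeddings.
text_cat = [0.9, 0.1, 0.1]
text_car = [0.1, 0.9, 0.1]
img_cat  = [0.6, 0.1, 0.6]
img_car  = [0.1, 0.6, 0.6]

# Cross-modal scores sit below 1.0 even for the same concept,
# but "cat text ↔ cat image" still outranks "cat text ↔ car image",
# which is all that top-k retrieval needs.
print(cosine(text_cat, img_cat))  # higher than...
print(cosine(text_cat, img_car))  # ...this
```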
More details here
huggingface.co
Multimodal Embedding & Reranker Models with Sentence Transformers
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Edge-cloud multi-agent with decentralization
The edge-cloud multi-agent architecture shifts the balance between latency and autonomy. The AdecPilot analysis demonstrates how decentralizing control affects system behavior. Modern edge-cloud multi-agent systems for mobile automation face a systemic conflict. On one hand, large models...
👉 Read the analysis on the website
The coregrid.Dev
Edge-cloud multi-agent with decentralization
Edge-cloud multi-agent architecture with decentralized management: how to reduce latency and traffic and improve resilience in mobile automation.
How to Avoid the Fork Trap? 🤔
Have you ever thought about the problems that arise when forking large open-source projects? Meta faced this dilemma when they developed a specialized version of WebRTC for their services. Constant forking can turn into a trap: over time, the resources required to integrate external changes become overwhelming. 🚧
Meta solved the problem by creating a modular architecture based on the latest version of WebRTC. They implemented a "shim"—a layer that allows dynamic switching between WebRTC versions in real time. This solution made it possible to avoid significant increases in binary size and simplified A/B testing. But, as always, there are trade-offs: the need to maintain two stacks simultaneously. 🛠
Meta’s engineering solution is an excellent example of how innovation can help avoid technical debt without a complete system overhaul. What do you think about this approach? Perhaps it will inspire you to solve your own infrastructure challenges.
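The shim idea, stripped to its skeleton (Meta's version is C++ inside their monorepo; the version labels here are invented), is one stable interface over two stack versions, switchable at run time for A/B tests:

```python
# Illustrative only: callers depend on the shim's interface and never
# see which WebRTC version is behind it.
class StackOld:
    def create_call(self) -> str:
        return "call on old stack"

class StackLatest:
    def create_call(self) -> str:
        return "call on latest stack"

class WebRTCShim:
    def __init__(self, use_latest: bool) -> None:
        # The switch can be driven by an experiment flag at run time,
        # which is what makes A/B testing between versions cheap.
        self._impl = StackLatest() if use_latest else StackOld()

    def create_call(self) -> str:
        return self._impl.create_call()

print(WebRTCShim(use_latest=True).create_call())
```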
Read more here
Engineering at Meta
Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases
At Meta, WebRTC powers real-time audio and video across various platforms. But forking a large open-source project like WebRTC within our monorepo presents unique challenges – over time, an interna…
Hive federation for data warehouses without downtime
Hive federation solves the problem of scalability and fault tolerance for data warehouses. Let's look at how Uber moved away from a monolith without stopping analytics. When does a single Hive instance quietly turn into a single point of systemic risk accumulation? In the original architecture, all datasets were...
👉 Read the analysis on the website
Why are some index scans not as fast as they seem? 🧐
Index scans in SQL are often perceived as a fast and efficient way to execute queries. But it's not that simple. At Datadog, they encountered a problem: an index scan on a PostgreSQL table turned out to be slow and costly, despite using the correct index. The reason? The columns in the index were ordered incorrectly relative to the query filters. Adding a targeted index cut the average query latency from 300 ms to 38 μs. 🚀
Why does this matter? Understanding how indexes work enables engineers to optimize database performance and avoid unnecessary costs. Datadog now has a feature that automatically detects inefficient index scans—less manual work, more time for important tasks. 💡
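The column-order point is easy to reproduce. The sketch below uses SQLite from Python's standard library rather than PostgreSQL, and invented table names, but the principle carries over: put the equality-filtered column first and the range-filtered column second, so the index prefix pins the scan down:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (org_id INTEGER, created_at INTEGER, payload TEXT)")
# Equality column first, range column second, matching the query's filters.
con.execute("CREATE INDEX idx_org_time ON events (org_id, created_at)")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT payload FROM events WHERE org_id = ? AND created_at > ?",
    (7, 1_000),
).fetchall()
print(plan[0][-1])  # a SEARCH using idx_org_time on both predicates
```

With the columns reversed, the equality filter on `org_id` could no longer anchor the index prefix, and the scan would have to walk far more entries.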
Datadog
Not all index scans are equal: How we cut query latency by over 99% | Datadog
Just because your query uses an index scan doesn't mean it's fast or performant. Learn how misaligned predicates and column order hurt index scan performance and how to detect this pattern using DBM.
FSM Benchmark for Evaluating Network AI Agents
NetAgentBench offers a state-centric approach to evaluating LLMs in network configuration, bridging the gap between static tests and real system behavior. The problem in evaluating AI agents for network configuration does not become apparent immediately — until the moment when...
👉 Read the analysis on the website
The coregrid.Dev
FSM Benchmark for Evaluating Network AI Agents
FSM benchmark network configuration: how NetAgentBench reveals failures of LLM agents in dynamic network scenarios and multi-turn behavior.
How did LinkedIn simplify its recommendation system and what problems did this cause? 🤔
LinkedIn abandoned five separate recommendation systems in favor of a unified model based on an LLM (Large Language Model). This reduced complexity but introduced new challenges. For example, how do you teach an LLM to work with structured data and predict results in 50 milliseconds for 1.3 billion users? And how do you process data when most of it is noise? 🤖
This transformation made it possible to improve the quality of recommendations through a deeper understanding of user interests. However, abandoning specialized systems means losing independent optimizations and natural redundancy. In the event of model regression, for example during the cold start phase, rolling back to a previous version becomes more difficult. 📉
It seems that LinkedIn has bet on simplicity at the expense of resilience. It will be interesting to see how this affects performance in the long term. What do you think? 🤔
More details here
Bytebytego
How LinkedIn Feed Uses LLMs to Serve 1.3 Billion Users
In this article, we will look at how the LinkedIn engineering team rebuilt the Feed and the challenges they faced.