LINUX &&|| PROGRAMMING

GenAI for Legacy Systems Modernization

While most people actively write about using GenAI tools to generate new code, a new Thoughtworks publication focuses on the opposite: using AI to understand and refactor legacy systems.

What makes legacy systems modernization expensive?
- Lack of knowledge about design and implementation details
- Lack of actual documentation
- Lack of automated tests
- Absence of human experts
- Difficulty measuring the impact of changes

To address these challenges, the Thoughtworks team developed a tool called CodeConcise. However, the authors highlight that you don't need this exact tool: the approach and ideas can serve as a reference for implementing your own solution.

Key concepts:
✏️ Treat code as data
✏️ Build Abstract Syntax Trees (ASTs) to identify entities and relationships in the code
✏️ Store these ASTs in a graph database (Neo4j)
✏️ Use a comprehension pipeline that traverses the graph using multiple algorithms, such as depth-first search with backtracking in post-order traversal, to enrich the graph with LLM-generated explanations at various depths (e.g., methods, classes, packages)
✏️ Integrate the enriched graph with a frontend application that implements a Retrieval-Augmented Generation (RAG) approach
✏️ The RAG retrieval component pulls nodes relevant to the user's prompt, while the LLM further traverses the graph to gather more information from their neighboring nodes, providing explanations at various levels of abstraction
✏️ The same enrichment pipeline can be used to generate documentation for the existing system
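To make the pipeline idea more concrete, here is a minimal Python sketch under loud assumptions: Python's built-in `ast` module stands in for a language-specific parser, a plain dict stands in for Neo4j, and `summarize` is a hypothetical placeholder for the LLM call. It only shows the shape of post-order enrichment (children are explained before their parents), not how CodeConcise itself is implemented.

```python
import ast

SOURCE = """
class Billing:
    def total(self, items):
        return sum(self.price(i) for i in items)

    def price(self, item):
        return item["qty"] * item["unit"]
"""


def build_containment_graph(source: str) -> dict[str, list[str]]:
    """Treat code as data: module -> classes/functions -> methods."""
    tree = ast.parse(source)
    graph: dict[str, list[str]] = {"module": []}
    for node in tree.body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            graph["module"].append(node.name)
            graph[node.name] = []
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    graph[node.name].append(f"{node.name}.{item.name}")
    return graph


def summarize(name: str, child_summaries: list[str]) -> str:
    """Hypothetical stand-in for an LLM call. A real pipeline would prompt
    the model with the node's code plus its children's explanations."""
    if not child_summaries:
        return f"explains {name}"
    return f"explains {name} using {len(child_summaries)} child summaries"


def enrich(graph: dict[str, list[str]], root: str = "module") -> tuple[dict[str, str], list[str]]:
    """Depth-first, post-order: every child is summarized before its parent."""
    summaries: dict[str, str] = {}
    order: list[str] = []

    def visit(node: str) -> str:
        child_summaries = [visit(child) for child in graph.get(node, [])]
        summaries[node] = summarize(node, child_summaries)
        order.append(node)
        return summaries[node]

    visit(root)
    return summaries, order


if __name__ == "__main__":
    summaries, order = enrich(build_containment_graph(SOURCE))
    print(order)  # ['Billing.total', 'Billing.price', 'Billing', 'module']
```

Post-order matters because a class-level explanation can then be built from already-generated method explanations instead of re-reading all the code.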

So far, the tool has been tested with several clients to generate explanations for low-level legacy code. The next goal is to improve it so it can answer questions at higher levels of abstraction, keeping in mind that this might not be possible by examining the code alone.

The work looks promising and could significantly reduce the time and cost of modernizing old systems (especially those written in exotic languages like COBOL). It simplifies reverse engineering and helps build knowledge about the current system. The authors also promised to share results of improving the current model, along with more real-life examples of using the tool.

#news #engineering #ai

martinfowler.com

Legacy Modernization meets GenAI

Lessons from building and using a GenAI tool to assist legacy modernization.

Forwarded from TechLead Bits

Software Complexity

Have you ever seen a project turn into a monster over time? Hard to understand, difficult to maintain? If so, I highly recommend Peter van Hardenberg's talk "Why Can't We Make Simple Software?"

The author explains what complexity is (it's not the same as difficulty!), why software gets so complicated, and what we can actually do about it.

Common reasons for complex software:
✏️ Defensive Code. Code starts simple, implementing a sunny-day scenario, but grows as more edge cases are handled. Over time, it turns into a mess with too many execution paths.
✏️ Scaling. A system designed for 100 users is very different from one built for 10 million. Handling scale often adds layers of complexity.
✏️ Leaky Abstractions. A well-designed interface should hide complexity, not expose unnecessary details. (There is a good discussion of this in the "Build Abstractions not Illusions" post.)
✏️ Gap Between Model and Reality. If a software model doesn't actually map to the problem domain, system complexity grows in a way that is really hard to fix.
✏️ Hyperspace. Problems multiply when a system has to work across many dimensions: different browsers, mobile platforms, OS versions, screen sizes, and more.
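The "Hyperspace" and "Defensive Code" points are simple arithmetic: supported dimensions multiply, and independent branches double the number of execution paths. A small Python sketch with a made-up support matrix (all names are illustrative only):

```python
import math
from itertools import product


def configuration_count(*dimensions: list[str]) -> int:
    """Size of the test matrix: the product of every dimension's size."""
    return math.prod(len(d) for d in dimensions)


def execution_paths(independent_branches: int) -> int:
    """Upper bound on paths through code with n independent if/else checks."""
    return 2 ** independent_branches


# Hypothetical support matrix: each list is one dimension of "hyperspace".
browsers = ["Chrome", "Firefox", "Safari"]
platforms = ["desktop", "mobile"]
os_versions = ["v1", "v2", "v3"]
screen_sizes = ["small", "large"]

if __name__ == "__main__":
    print(configuration_count(browsers, platforms, os_versions, screen_sizes))  # 36
    # itertools.product enumerates the concrete combinations to test
    print(next(product(browsers, platforms, os_versions, screen_sizes)))  # ('Chrome', 'desktop', 'v1', 'small')
    print(execution_paths(10))  # 1024
```

Adding a single new value to one dimension (say, a fourth browser) grows the whole matrix by a third, which is why hyperspace complexity sneaks up on teams.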

Software architecture degrades over time as changes are made. Every change can introduce more complexity, so it's critical to keep things simple. Some strategies to do that:
✏️ Start Over. Rebuild everything from scratch. Sometimes, it is the only way forward if the existing architecture can't support new business requirements.
✏️ Eliminate Dependencies. The fewer dependencies a system has, the easier it is to predict its behavior and perform impact analysis.
✏️ Reduce Scope. Build only what you actually need now. Avoid premature optimizations and "nice-to-have" features for some hypothetical future.
✏️ Simplify Architecture. No comments 😃
✏️ Avoid N-to-M Complexity. Reduce unnecessary variability to limit testing scope and system interactions.
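As a hedged illustration of why fewer dependencies make impact analysis easier, the sketch below computes every module that transitively depends on a changed one. The module names and both graphs are hypothetical.

```python
from collections import deque


def impacted_by(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All modules that directly or transitively depend on `changed`."""
    # Invert the edges: for each module, who depends on it?
    dependents: dict[str, set[str]] = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first search over the inverted graph.
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


# Hypothetical module graphs: module -> modules it depends on.
tangled = {
    "api": {"billing", "users"},
    "billing": {"db", "users"},
    "users": {"db"},
    "reports": {"billing", "db"},
}
decoupled = {
    "api": {"billing"},
    "billing": set(),
    "users": {"db"},
    "reports": {"db"},
}

if __name__ == "__main__":
    print(sorted(impacted_by("users", tangled)))    # ['api', 'billing', 'reports']
    print(sorted(impacted_by("users", decoupled)))  # []
```

The same change to `users` touches three other modules in the tangled graph and none in the decoupled one, so the scope to review and retest shrinks accordingly.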

Complexity arises when interactions appear, so it is really about dynamic system behavior. Watching this talk made me reflect on why systems become so complex and how I can make better design decisions.

#architecture #engineering

YouTube

Why Can't We Make Simple Software? - Peter van Hardenberg

Forwarded from TechLead Bits

HashiCorp Plugin Ecosystem

When Go didn't have a plugin package, HashiCorp implemented its own plugin architecture. The main difference from other plugin systems is that it works over RPC. At first, that might sound a bit unusual, but the approach has proven itself and is actively used in many popular products like HashiCorp Vault, Terraform, Nomad and Velero.

Key concepts:
✏️ A plugin is a binary that runs an RPC (or gRPC) server.
✏️ The main application loads plugins from a specified directory and runs them as OS child processes.
✏️ A single connection is made between each plugin and the host process.
✏️ The connection is bidirectional, so the plugin can also call application APIs.
✏️ The plugin and the application must run on the same host and communicate over the local network only; remote calls are not allowed.
✏️ Each plugin provides a protocol version that can be used as its API version.
✏️ A special handshake is used to establish a connection. The plugin writes its protocol version, network type, address and protocol to stdout, and the main app uses this information to connect.
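The handshake can be sketched in Python. The pipe-separated line format used here (`CORE-PROTOCOL-VERSION|APP-PROTOCOL-VERSION|NETWORK-TYPE|NETWORK-ADDR|PROTOCOL`) follows my reading of the go-plugin documentation and should be treated as an assumption; `launch_plugin`, `FAKE_PLUGIN` and the expected version number are hypothetical, and the fake plugin only prints a handshake without starting a real RPC server.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Handshake:
    core_protocol: int  # version of the plugin system itself
    app_protocol: int   # the plugin's API version
    network: str        # e.g. "tcp" or "unix"
    address: str        # e.g. "127.0.0.1:1234" or a socket path
    protocol: str       # e.g. "grpc" or "netrpc"


def parse_handshake(line: str) -> Handshake:
    """Parse the single line a plugin prints to stdout on startup."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        raise ValueError(f"unexpected handshake: {line!r}")
    core, app, network, address, protocol = parts
    return Handshake(int(core), int(app), network, address, protocol)


def launch_plugin(cmd: list[str], expected_app_protocol: int = 1):
    """Start the plugin as a child process and read its handshake (hypothetical helper)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    handshake = parse_handshake(proc.stdout.readline())
    if handshake.app_protocol != expected_app_protocol:
        proc.kill()  # refuse plugins built against a different API version
        raise RuntimeError(f"incompatible plugin API version {handshake.app_protocol}")
    return proc, handshake


# Fake plugin: prints a handshake line and exits (no real RPC server).
FAKE_PLUGIN = [sys.executable, "-c", "print('1|1|tcp|127.0.0.1:1234|grpc')"]

if __name__ == "__main__":
    proc, hs = launch_plugin(FAKE_PLUGIN)
    proc.wait()
    # A real host would now dial hs.address over hs.network using hs.protocol.
    print(hs.network, hs.address, hs.protocol)  # tcp 127.0.0.1:1234 grpc
```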

Benefits of the approach:
✏️ Plugins can't crash the main process
✏️ Plugins can be written in different languages
✏️ Easy installation - just put a binary into the folder
✏️ Stdout/Stderr Syncing. While plugins are subprocesses, they can continue to use stdout/stderr as usual and their output will get mirrored to the host app.
✏️ Host upgrade while a plugin is running. Plugins can be "reattached" so that the host process can be upgraded while the plugin is still running.
✏️ Plugins are secure: they have access only to the interfaces and arguments given to them, not to the entire memory space of the process.
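Crash isolation and stderr mirroring come almost for free once plugins are separate OS processes. A minimal Python sketch, where the `run_plugin` helper is hypothetical and a one-line script stands in for a plugin binary:

```python
import subprocess
import sys


def run_plugin(code: str) -> tuple[int, str]:
    """Run plugin code in its own process (hypothetical helper).

    Returns the exit code and the captured stderr, which the host can
    mirror into its own logs.
    """
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    status, stderr = run_plugin("raise RuntimeError('plugin bug')")
    # The crash stays in the child; the host only sees a non-zero exit code.
    print(f"plugin exited with {status}; host is still running")
    print(stderr.strip().splitlines()[-1])  # RuntimeError: plugin bug
```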

In cloud environments, plugins can be delivered as init containers. During startup, the plugin binary from the init container is copied into the main app container.

If you're designing a pluggable architecture, HashiCorp's RPC plugins are definitely worth a look.

#systemdesign #engineering

Documentation as Code

I truly believe that documentation is part of the application. It should be developed, updated and reviewed using the same processes and tools as the application code.

If the documentation is stored somewhere else, such as a separate wiki, it basically goes stale within 5 minutes of being published.

This means documentation should live in the git repository. If some system behavior changes while fixing bugs or developing a new feature, the relevant documentation should be updated in the same PR. This approach helps keep documentation up to date.

It's really simple if you use a monorepo. All docs and code are in one place, so it's easy to find what you need. Things get more complicated if you have many microrepos. Even if the docs are up to date, users have a hard time finding them. This is usually solved by publishing docs to a central portal as part of the CI process or, nowadays, by using an AI bot to help.

Recently, Pinterest published an article about how they adopted the documentation-as-code approach. Since they use microrepos, the main challenge was making documentation discoverable for their users across hundreds of repositories.

What they did:
🔸 Moved their docs to git repositories using Markdown.
🔸 Used MkDocs in CI to generate HTML versions of the docs.
🔸 Created a central place to host and index docs called PDocs (Pinterest Docs).
🔸 Integrated the docs with GenAI: an AI bot connected to the company's main communication channels.
🔸 Built a tool to migrate old wiki pages to git with a single click.

I don't know of any standard solution for aggregating docs across multiple repositories, so it would be great if Pinterest open-sourced PDocs in the future. I think it could really help many teams improve their documentation processes.

#engineering #documentation

Forwarded from TechLead Bits

Documentation As a Code

I have a really strong opinion that documentation is part of the application. It should be developed, updated and reviewed using the same processes and tools as the application code.

If the documentation is stored somewhere else, like in a separate wiki, it's basically dead within 5 minutes of being published.

This means documentation should live in the git repo. If some system behavior changes during a bug fix or new feature development, the relevant documentation should be updated in the same PR. This approach helps keep documentation up to date.
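One way to enforce the "same PR" rule is a small CI check. This is a sketch under assumptions: code lives in `src/`, docs live in `docs/` or in Markdown files, and the CI job passes in the output of `git diff --name-only origin/main...HEAD`. The helper name and directory layout are hypothetical.

```python
import sys
from pathlib import PurePosixPath

CODE_DIRS = {"src"}   # hypothetical layout
DOCS_DIRS = {"docs"}  # hypothetical layout


def missing_docs_update(changed_files: list[str]) -> bool:
    """True if a PR changes code but touches no documentation."""
    paths = [f for f in changed_files if f]
    top_dirs = {PurePosixPath(f).parts[0] for f in paths}
    touches_code = bool(top_dirs & CODE_DIRS)
    touches_docs = bool(top_dirs & DOCS_DIRS) or any(f.endswith(".md") for f in paths)
    return touches_code and not touches_docs


if __name__ == "__main__":
    # Usage in CI: python check_docs.py $(git diff --name-only origin/main...HEAD)
    if missing_docs_update(sys.argv[1:]):
        print("Code changed but docs did not: please update the docs in this PR.")
        sys.exit(1)
```

A check like this can't tell whether the docs are *correct*, only that someone looked at them; the review itself still has to cover the content.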

It's really simple if you use a monorepo. All docs and code are placed in one place, so it's easy to find what you need. Things become more complicated if you have lots of microrepos. Even if docs are up to date, it's quite hard for users to find them. Usually, this is solved by publishing docs to a central portal as part of the CI process, or nowadays by using an AI bot to help.

Recently, Pinterest published an article about how they adopted the documentation-as-code approach. Since they use microrepos, the main challenge was to make documentation discoverable for their users across hundreds of repos.

What they did:
🔸 Moved their docs to git repos using markdown.
🔸 Used MkDocs in CI to generate HTML versions of the docs.
🔸 Created a central place to host and index docs called PDocs (Pinterest Docs).
🔸 Integrated docs with GenAI — an AI bot connected to the main company communication channels.
🔸 Built a one-click tool to migrate old wiki pages to git.

I don't know of any standard solution for doc aggregation across multiple repos, so it would be great if Pinterest open-sourced PDocs in the future. I think it could really help a lot of teams improve their documentation processes.

#engineering #documentation

Medium

Adopting Docs-as-Code at Pinterest

Jacob Seiler | Software Engineer, Internal Tools Platform
Jay Kim | Software Engineer, Internal Tools Platform
Charlie Gu | Engineering…

Forwarded from TechLead Bits

Measuring System Complexity

I think we can all agree that the less complex our systems are, the easier they are to modify, operate and troubleshoot. But how can we properly measure complexity?

The most popular answer would be something related to cyclomatic complexity or the number of lines of code. But have you ever tried to use them in practice? I found them absolutely impractical and not actionable for huge codebases. They always produce numbers telling you that the system is big and complex. Nothing new, actually 🙃

I found more practical alternatives in the Google SRE book:
🔸 Training Time: Time to onboard a new team member.
🔸 Explanation Time: Time to explain high-level architecture of the service.
🔸 Administrative Diversity: Number of ways to configure similar settings in different parts of the system.
🔸 Diversity of Deployed Configurations: Number of configurations that are deployed in production. It can include installed services, their versions, feature flags, environment-specific parameters.
🔸 Age of the System: Older systems tend to be more complex and fragile.
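The "Diversity of Deployed Configurations" metric can be approximated by counting distinct combinations of service, version and feature flags. The inventory below is made up; in practice it would come from your deployment tooling, and which fields count as "configuration" is a judgment call.

```python
from collections import Counter

# Hypothetical production inventory (one entry per running instance).
deployments = [
    {"service": "billing", "version": "2.3.1", "flags": {"new_tax": True}},
    {"service": "billing", "version": "2.3.1", "flags": {"new_tax": True}},
    {"service": "billing", "version": "2.2.0", "flags": {"new_tax": False}},
    {"service": "users", "version": "1.0.4", "flags": {}},
    {"service": "users", "version": "1.0.4", "flags": {"sso": True}},
]


def fingerprint(d: dict) -> tuple:
    """Everything that makes one deployed configuration differ from another."""
    return (d["service"], d["version"], tuple(sorted(d["flags"].items())))


def configuration_diversity(items: list[dict]) -> tuple[int, Counter]:
    """Total distinct configurations, plus how many each service contributes."""
    distinct = {fingerprint(d) for d in items}
    return len(distinct), Counter(service for service, _, _ in distinct)


if __name__ == "__main__":
    total, per_service = configuration_diversity(deployments)
    print(total)                       # 4
    print(sorted(per_service.items())) # [('billing', 2), ('users', 2)]
```

Tracked over time, a rising number points at the services where versions or flags have drifted apart and need consolidating.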

Of course, these metrics are not mathematically precise, but they provide high-level indicators of the overall complexity of the existing architecture, not just of individual blocks of code. And most importantly, they show which direction we should take to improve the situation.

#engineering #systemdesign

Can you see the forest for the trees?

How often have you seen developers stuck in code review discussing method optimization or "code excellence"? Spending hours or even days trying to make the code perfect? Did that really help create a well-designed solution?

Developers often get stuck in small details and completely lose sight of the bigger picture. It's a very common mistake I see in engineering teams. As a result, we end up with perfect classes or functions and a complete mess in the overall structure.

A proper review should always start with a bird's-eye view:
🔸 Component structure: Are the changes implemented in the components/services that are actually responsible for this logic?
🔸 Module structure: Are the changes in the modules where you expect them (public vs private, pkg vs internal, etc.)?
🔸 Public contracts: Check how your APIs will be used by other applications. Are they clear, convenient, easy to use and easy to extend?
🔸 Naming: Are module, class and function names clear and easy to understand? Do they duplicate existing entities?
🔸 Data model: Is the model designed correctly? Does it follow the single responsibility principle?
🔸 Tests: Are the main cases covered? What about negative scenarios? Do we have the right approach to error handling?

In most cases, there's no point in reviewing code details until the items above are finalized. The code will likely be rewritten, maybe even more than once.
That's why specific lines of code should be the last thing to check.

Details are easy to fix.
Structure and contracts are not.

#engineering #codereview

Forwarded from TechLead Bits

Can you see the forest for the trees?

How often have you seen developers stuck in code review discussing some method optimization or "code excellence"? Spending hours or even days trying to make code perfect? Did that really help build a well-architected solution?

Developers often get stuck in small details and completely lose sight of the bigger picture. That's a very common mistake that I see in engineering teams. As a result, we end up with perfect classes or functions and a complete mess in the overall structure.

A proper review should always start with a bird’s-eye view:
🔸 Component structure: Are changes implemented in the components/services that are actually responsible for this logic?
🔸 Module structure: Are changes in the modules where you expect them to be (public vs private, pkg vs internal, etc.)?
🔸 Public contracts: Review how your APIs will be used by other parties. Are they clear, convenient, easy to use, and easy to extend?
🔸 Naming: Are module, class and function names clear and easy to understand? Do they duplicate existing entities?
🔸 Data model: Is the domain modeled correctly? Does the model follow the single responsibility principle?
🔸 Testing: Are the main cases covered? What about negative scenarios? Do we have a proper approach to failure handling?

In most cases, there’s no point in reviewing code details until the items above are finalized. The code will likely be rewritten, maybe even more than once.
That’s why specific lines of code should be the last thing to check.

Details are cheap to fix.
Structure and contracts are not.

#engineering #codereview

Forwarded from TechLead Bits

Scale Cube

In IT, we work in a very complex domain. We have to keep a lot of things in our heads at once: technologies, patterns, trade-offs, limitations.
That's why I like simple models that help me stay focused and remember technical concepts.

One such model is the Scale Cube. It was introduced in 2009 in the book "The Art of Scalability" and suggests 3 dimensions for scaling:
🔸 Horizontal scaling: duplicate similar things, clone data, add more replicas.
🔸 Functional decomposition: split application to multiple services.
🔸 Sharding: split data into subsets by region, tenant, hash, or range.
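The three axes compose naturally in a request router. The toy Python sketch below is an illustration under assumptions, not a reference design: the service names are hypothetical, sharding uses a stable hash of the key (range- or tenant-based sharding would work just as well), and replicas are picked round-robin.

```python
import hashlib
from itertools import cycle


class ScaleCubeRouter:
    """Toy router combining the three Scale Cube axes (illustrative only)."""

    def __init__(self, shards_per_service: dict[str, int], replicas_per_shard: int):
        # Functional decomposition: requests are first routed by service name.
        self.shards_per_service = shards_per_service
        # Horizontal scaling: each shard has identical replicas used round-robin.
        self._replicas = {
            (service, shard): cycle(range(replicas_per_shard))
            for service, count in shards_per_service.items()
            for shard in range(count)
        }

    @staticmethod
    def shard_for(key: str, num_shards: int) -> int:
        """Sharding: a stable hash of the key picks the data subset.

        hashlib is used because Python's built-in hash() of str is
        randomized per process.
        """
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % num_shards

    def route(self, service: str, key: str) -> str:
        shard = self.shard_for(key, self.shards_per_service[service])
        replica = next(self._replicas[(service, shard)])
        return f"{service}/shard-{shard}/replica-{replica}"


if __name__ == "__main__":
    router = ScaleCubeRouter({"orders": 4, "users": 2}, replicas_per_shard=3)
    for _ in range(3):
        # Same tenant -> same shard; replicas rotate between calls.
        print(router.route("orders", "tenant-42"))
```

Note that plain modulo hashing reshuffles most keys when the shard count changes; consistent hashing is the usual refinement if shards are added often.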

The starting point is always a monolith.
The end point is near-infinite scale with all 3 dimensions implemented together (see diagram in the post).

That's it. It's very simple and powerful. If you want to scale something you have only 3 strategies to do that 😎. So you don’t need to waste time reinventing the wheel.

#architecture #engineering #scalability

Scale Cube

In #IT, we work in a very complex domain. We have to keep many things in mind at once: technologies, patterns, trade-offs, limitations.
That's why I like simple models that help me stay focused and remember technical concepts.

One such model is the Scale Cube. It was introduced in 2009 in the book "The Art of Scalability" and proposes 3 dimensions of scaling:
🔸 Horizontal scaling: duplicating similar things, cloning data, adding more replicas.
🔸 Functional decomposition: splitting the application into multiple services.
🔸 #Sharding: splitting data into subsets by region, tenant, hash or numeric range.

The starting point is always a monolith.
The end point is near-infinite scalability with all 3 dimensions implemented at the same time (see the diagram in the post).

That's it. It's very simple and effective. If you want to scale something, you have only 3 strategies to do it 😎. You don't need to waste time reinventing the wheel.
📊
https://t.me/ProgramowanieLinux/2073
#architecture #engineering #scalability
