Best practices for running production workloads using Amazon MSK tiered storage
https://aws.amazon.com/blogs/big-data/best-practices-for-running-production-workloads-using-amazon-msk-tiered-storage/
#MSK
Amazon
Best practices for running production workloads using Amazon MSK tiered storage | Amazon Web Services
In the second post of the series, we discussed some core concepts of the Amazon Managed Streaming for Apache Kafka (Amazon MSK) tiered storage feature and explained how read and write operations work in a tiered storage enabled cluster. This post focuses…
RLHF & DPO: Simplifying and Enhancing Fine-Tuning for Language Models
https://www.linkedin.com/pulse/rlhf-dpo-simplifying-enhancing-fine-tuning-language-models-kirouane/
LinkedIn
What Is RLHF? Reinforcement Learning from Human Feedback (RLHF) is a cutting-edge approach in the field of artificial intelligence that leverages human preferences and guidance to train and improve machine learning models. At its core, RLHF is a machine learning…
ZomboDB brings powerful text-search and analytics features to Postgres by using Elasticsearch as an index type. Its comprehensive query language and SQL functions enable new and creative ways to query your relational data.
ZomboDB is a 100% native Postgres extension written in Rust with PGRX. ZomboDB uses Postgres's Index Access Method API to directly manage and optimize ZomboDB's specialized indices. As a native Postgres index type, ZomboDB allows you to CREATE INDEX ... USING zombodb on your existing Postgres tables. At that point, ZomboDB takes over and fully manages the remote Elasticsearch index, guaranteeing transactionally-correct text-search query results.
https://github.com/zombodb/zombodb/
GitHub
GitHub - zombodb/zombodb: Making Postgres and Elasticsearch work together like it's 2023
Making Postgres and Elasticsearch work together like it's 2023 - zombodb/zombodb
This project contains a series of tiny broken programs (and one nasty surprise). By fixing them, you'll learn how to read and write Zig code.
#zig
https://codeberg.org/ziglings/exercises
Codeberg.org
exercises
Learn the ⚡Zig programming language by fixing tiny broken programs.
KahaDB is a file-based persistence database that is local to the message broker using it. It has been optimized for fast persistence. It has been the default storage mechanism since ActiveMQ Classic 5.4. KahaDB uses fewer file descriptors and provides faster recovery than its predecessor, the AMQ Message Store.
In order to facilitate rapid retrieval of messages from the data logs, a B-tree index is created, which contains pointers to the locations of all the messages embedded in the data log files. The complete B-tree index is stored on disk and part or all of the B-tree index is held in a cache in memory. Evidently, the B-tree index can work more efficiently, if the complete index fits into the cache.
https://github.com/apache/activemq/tree/main/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb
GitHub
activemq/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb at main · apache/activemq
Mirror of Apache ActiveMQ. Contribute to apache/activemq development by creating an account on GitHub.
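To make the data log plus index idea from the KahaDB description concrete, here is a minimal sketch. It is not KahaDB's actual code or file format: the `TinyMessageLog` class, the length-prefixed record layout, and the use of a plain dict in place of a real B-tree are all assumptions made for illustration.

```python
import io
import struct


class TinyMessageLog:
    """Append-only data log plus an index of message id -> byte offset."""

    def __init__(self):
        # In-memory buffer standing in for the on-disk data log files.
        self._log = io.BytesIO()
        # A dict standing in for the B-tree index (hypothetical simplification).
        self._index = {}

    def append(self, message_id: str, payload: bytes) -> int:
        """Write a length-prefixed record at the end of the log and index its offset."""
        offset = self._log.seek(0, io.SEEK_END)
        self._log.write(struct.pack(">I", len(payload)))
        self._log.write(payload)
        self._index[message_id] = offset
        return offset

    def read(self, message_id: str) -> bytes:
        """Jump straight to the record via the index instead of scanning the log."""
        offset = self._index[message_id]
        self._log.seek(offset)
        (length,) = struct.unpack(">I", self._log.read(4))
        return self._log.read(length)


log = TinyMessageLog()
log.append("m1", b"hello")
log.append("m2", b"world!")
print(log.read("m2"))
```

The index only stores pointers, so a lookup costs one seek and one read. In this sketch the whole index lives in memory, which is the best case the description mentions: the index works most efficiently when it fits entirely in the cache.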
An interesting article about a Kafka latency issue at scale caused by the ext4 filesystem. The scientific approach they followed is really beautiful.
https://blog.allegro.tech/2024/03/kafka-performance-analysis.html
blog.allegro.tech
Unlocking Kafka’s Potential: Tackling Tail Latency with eBPF
At Allegro, we use Kafka as a backbone for asynchronous communication between microservices. With up to 300k messages published and 1M messages consumed every second, it is a key part of our infrastructure. A few months ago, in our main Kafka cluster, we…
Slim allows developers to inspect, optimize and debug their containers.
#docker #optimization
https://github.com/slimtoolkit/slim
GitHub
GitHub - slimtoolkit/slim: Slim(toolkit): Don't change anything in your container image and minify it by up to 30x (and for compiled…
Slim(toolkit): Don't change anything in your container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source) - slimtoolkit/slim
The Kafka tiered storage feature is a novel approach to archiving Kafka messages. IMO it is NOT READY FOR PRODUCTION, BUT KEEP AN EYE ON IT.
https://developers.redhat.com/articles/2024/03/13/kafka-tiered-storage-deep-dive
https://aws.amazon.com/blogs/big-data/deep-dive-on-amazon-msk-tiered-storage/
Red Hat Developer
Kafka tiered storage deep dive | Red Hat Developer
Tiered storage is a new early access feature available as of Apache Kafka 3.6.0 that allows you to scale compute and storage resources independently, provides better client isolation, and allows
How do neural networks work under the hood? This guide explains vector embeddings and their use cases.
https://www.datastax.com/guides/what-is-a-vector-embedding
DataStax
What are Vector Embeddings? Applications, Use Cases & More
Read this detailed guide to learn what vector embeddings are, how they are used in Generative AI, and how they can be stored and accessed in vector databases.
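As a rough illustration of how vector embeddings are compared, here is a small sketch of a nearest-neighbor lookup using cosine similarity. The three-dimensional vectors and the `nearest` helper are made up for this example; real embeddings come from a trained model and usually have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, chosen by hand for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def nearest(word):
    """Return the other word whose embedding is most similar to `word`."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))


print(nearest("cat"))
```

A vector database does essentially this lookup, but with approximate indexes so that it scales to millions of vectors.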
We've all heard of RocksDB, but there is also Speedb. It is less famous than RocksDB, but it is presented as the fastest key-value storage engine in the world. Redis recently announced that it has acquired Speedb and included it in Redis's core. You can find more information about it here:
https://www.speedb.io/
www.speedb.io
Speedb | The Next Generation Key-Value Storage Engine
SPEEDB is revolutionizing the data management market, by turbocharging databases on the storage engine layer to offer: Dramatically improved write amplification, Low resources footprint, No W/O hangs on writes, Full support of ANY object size - big or small…
Forwarded from Jim Mim
"I wrote this Format dialog back on a rainy Thursday morning at Microsoft in late 1994, I think it was.
We were porting the bajillion lines of code from the Windows95 user interface over to NT, and Format was just one of those areas where WindowsNT was different enough from Windows95 that we had to come up with some custom UI.
I got out a piece of paper and wrote down all the options and choices you could make with respect to formatting a disk, like filesystem, label, cluster size, compression, encryption, and so on.
Then I busted out VC++2.0 and used the Resource Editor to lay out a simple vertical stack of all the choices you had to make, in the approximate order you had to make. It wasn't elegant, but it would do until the elegant UI arrived.
That was some 30 years ago, and the dialog is still my temporary one from that Thursday morning, so be careful about checking in "temporary" solutions!
I also had to decide how much "cluster slack" would be too much, and that wound up constraining the format size of a FAT volume to 32GB. That limit was also an arbitrary choice that morning, and one that has stuck with us as a permanent side effect.
So remember... there are no "temporary" checkins :)
Follow me for more random code musings!"
From Dave W Plummer, Developer of many famous Windows components such as Task Manager, Windows Pinball, Calc, ZIPFolders, Product Activation, etc.
https://twitter.com/davepl1968/status/1772042158046146792
https://people.freebsd.org/~phk/
Poul-Henning Kamp (a.k.a phk) is one of the most well-known software engineers in the world, known for developing many parts of the FreeBSD kernel and Varnish Cache.
This is one of his notes titled 'Notes from the Architect' on the Varnish Cache website.
https://varnish-cache.org/docs/trunk/phk/notes.html
—
"Well, today computers really only have one kind of storage, and it is usually some sort of disk, the operating system and the virtual memory management hardware has converted the RAM to a cache for the disk storage.
So what happens with squids elaborate memory management is that it gets into fights with the kernels elaborate memory management, and like any civil war, that never gets anything done."
—-
Multi-CPU systems is nothing new, but writing programs that use more than one CPU at a time has always been tricky and it still is.
—-
https://google.github.io/pytype/
https://github.com/facebook/pyre-check/
https://github.com/google/importlab
Static analysis tools for Python code, especially for type checking and for tracking dependencies between modules.
pytype
pytype - 🦆✔
A static type analyzer for Python code
Do not forget these two rules:
Everything in software architecture is a trade-off,
and why is always more important than how.
Anderson's law:
I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated.
So when you first examine a problem, it is common to find that it becomes more complex as you go deeper into it. That complexity often requires deeper investigation before you can determine the best solution.
#architecture
Human interaction can be a bottleneck in most software projects and should be managed very carefully.
There is a law about it: "Adding manpower to a late software project makes it later."
It is called Brooks's law.
#architecture
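One common way to see why Brooks's law holds is to count pairwise communication paths, which grow as n(n-1)/2 when every person must coordinate with every other person. The sketch below only does that arithmetic; the `communication_paths` name is made up for this example, and real teams rarely need every possible pairing.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs in a team where everyone talks to everyone."""
    return team_size * (team_size - 1) // 2


for size in (3, 5, 10, 20):
    print(size, "people ->", communication_paths(size), "paths")
```

Doubling a team from 5 to 10 people takes the path count from 10 to 45, so coordination cost grows much faster than headcount.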
Whenever you find yourself implementing distributed transactions within a system, keep the meaning of the word "saga" in mind:
a long story of heroic achievement;
a story of quasi-legendary events; colloquially, a long tale.
- Oxford English Dictionary
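For context, in the saga pattern a distributed transaction is split into local steps, each paired with a compensating action that undoes it if a later step fails. The sketch below is a minimal in-process illustration, assuming hypothetical step names (`reserve_stock`, `charge_card`, `ship_order`); a real saga also needs durable state, retries, and idempotent compensations.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order, newest completed step first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


events = []  # records what happened, for illustration


def reserve_stock():
    events.append("reserve stock")


def release_stock():
    events.append("release stock")


def charge_card():
    events.append("charge card")


def refund_card():
    events.append("refund card")


def ship_order():
    # Simulated failure in the last step.
    raise RuntimeError("carrier unavailable")


def cancel_shipment():
    events.append("cancel shipment")


ok = run_saga([
    (reserve_stock, release_stock),
    (charge_card, refund_card),
    (ship_order, cancel_shipment),
])
print(ok, events)
```

Even this toy version shows why the word fits: every step needs its own undo, and the failure path is as long as the success path.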
Trade-offs of PostgreSQL WAL management strategies: rules you should know before making any decision in this area.
https://www.enterprisedb.com/blog/postgresql-wal-write-ahead-logging-management-strategy-tradeoffs
EDB
PostgreSQL Write-Ahead Logging (WAL) Trade-offs: Bounded vs. Archived vs. Replication Slots