The Last 20 Python Packages You Will Ever Need
20 Python Packages you should know for all your Data Science, Data Engineering, and Machine Learning projects.
https://pub.towardsai.net/the-last-20-python-packages-you-will-ever-need
@DevMisc
#python #machinelearning #data
Algebraic data types: things I wish someone had explained about FP
Algebraic data types and algebraic data structures sound similar. It’s like they ought to be the same thing. But they’re not.
https://jrsinclair.com/articles/2019/algebraic/
@DevMisc
#fp #data #learn
Why German Strings are Everywhere
https://cedardb.com/blog/german_strings/
@DevMisc
#cpp #data #misc
German Strings are a custom string type highly optimized for data processing. They offer significant improvements over traditional C and C++ string implementations.
Origin and adoption:
- First introduced in Umbra, the research database system that preceded CedarDB
- Adopted by DuckDB, Apache Arrow, Polars, and Facebook Velox
Key Features:
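- 128-bit struct representation (vs. the 192 bits of a C++ string's pointer, size, and capacity)
- Short string optimization: strings of ≤12 characters are stored inline in the struct
- Long string format: a 4-char prefix for quick comparisons plus a pointer to the full string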
- Immutable design for better performance and concurrency
- Storage classes: persistent, transient, temporary
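The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CedarDB's implementation: `encode`, `decode`, and the `heap` buffer are hypothetical names, an offset into `heap` stands in for a real pointer, and the storage class is left out for brevity.

```python
import struct

INLINE_MAX = 12  # longest string that fits inside the 16-byte record


def encode(data: bytes, heap: bytearray) -> bytes:
    """Pack `data` into a 16-byte German-string-style record.

    `heap` is a hypothetical buffer standing in for memory: long strings are
    appended to it and referenced by offset instead of by pointer.
    """
    length = len(data)
    if length >= 2**32:
        raise ValueError("length must fit in a 32-bit field")
    if length <= INLINE_MAX:
        # Short form: 4-byte length + up to 12 content bytes (zero-padded).
        return struct.pack("<I12s", length, data)
    # Long form: 4-byte length + 4-byte prefix + 8-byte offset ("pointer").
    offset = len(heap)
    heap.extend(data)
    return struct.pack("<I4sQ", length, data[:4], offset)


def decode(record: bytes, heap: bytearray) -> bytes:
    """Recover the original bytes from a 16-byte record."""
    (length,) = struct.unpack_from("<I", record)
    if length <= INLINE_MAX:
        return record[4:4 + length]
    _prefix, offset = struct.unpack_from("<4sQ", record, 4)
    return bytes(heap[offset:offset + length])
```

Both forms occupy exactly 16 bytes (128 bits), and a short string never touches `heap` at all.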
Advantages:
- Space-efficient, fitting in two CPU registers
- Reduced allocations and data movement
- Easier parallelization due to immutability
- Flexible lifetime management with storage classes
- Optimized for common database operations (comparisons, sorting)
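To illustrate why the prefix helps with comparisons and sorting, here is a minimal Python sketch. The `PrefixedString` class and the helpers `make`, `equals`, and `less_than` are hypothetical names for illustration; in a real implementation `payload` would sit behind a pointer, which is why avoiding access to it matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen mirrors the immutable design
class PrefixedString:
    """Hypothetical stand-in for a long German string."""
    length: int
    prefix: bytes   # first 4 bytes, kept in the struct itself
    payload: bytes  # full content; out of line in a real implementation


def make(data: bytes) -> PrefixedString:
    return PrefixedString(len(data), data[:4], data)


def equals(a: PrefixedString, b: PrefixedString) -> bool:
    # Fast path: length and prefix are in the struct, so most mismatches
    # are rejected without reading the payload.
    if a.length != b.length or a.prefix != b.prefix:
        return False
    return a.payload == b.payload


def less_than(a: PrefixedString, b: PrefixedString) -> bool:
    # Differing prefixes already decide the order; only ties need the payload.
    if a.prefix != b.prefix:
        return a.prefix < b.prefix
    return a.payload < b.payload
```

Only strings that agree on both length and prefix fall through to the full comparison.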
Trade-offs:
- Requires careful consideration of string usage and lifetime
- Updates are more expensive (but rare in database systems)
- Maximum string length limited to 4 GiB (the length is stored in a 32-bit field)