PythonHub
2.36K subscribers
2.35K photos
49K links
News & links about Python programming.
https://pythonhub.dev/
Curating Custom Datasets for LLM Training with NVIDIA NeMo Curator

NeMo Curator, part of NVIDIA NeMo, offers out-of-the-box workflows to download and curate data from public sources such as Common Crawl, Wikipedia, and arXiv. It also lets developers customize data curation pipelines to meet their own requirements and create custom datasets. This post walks you through creating a custom data curation pipeline...

https://developer.nvidia.com/blog/curating-custom-datasets-for-llm-training-with-nvidia-nemo-curator/
Signals, shells, and docker: an onion of footguns

The post examines the "footguns" that arise when signals, shells, and Docker containers are layered together in a development environment, such as termination signals that never reach the intended process because a shell sits in between. It emphasizes the need for caution and proper configuration.
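
As a small illustration of the signals side of this topic, here is a minimal sketch of a Python process that reacts to SIGTERM (the signal `docker stop` sends by default) and exits its loop cleanly. It is not taken from the linked post; the names `GracefulShutdown` and `run_worker` are hypothetical and exist only for this example.

```python
import os
import signal
import time


class GracefulShutdown:
    """Records that a termination signal arrived so the main loop can exit cleanly."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGINT)):
        self.requested = False
        # signal.signal() must be called from the main thread.
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: set a flag and let the main loop do the cleanup.
        self.requested = True


def run_worker(shutdown, poll_seconds=0.01, max_polls=200):
    """Hypothetical worker loop; returns True if it stopped because of a signal."""
    polls = 0
    while not shutdown.requested and polls < max_polls:
        time.sleep(poll_seconds)  # stand-in for real work
        polls += 1
    return shutdown.requested


if __name__ == "__main__":
    shutdown = GracefulShutdown()
    # Simulate the SIGTERM that `docker stop` would deliver to the process.
    os.kill(os.getpid(), signal.SIGTERM)
    print("stopped by signal:", run_worker(shutdown))
```

A handler like this only helps if the signal actually reaches the Python process. With the shell form of `CMD` in a Dockerfile, the command runs under `/bin/sh -c`, and the shell may not forward the signal; the exec form (`CMD ["python", "app.py"]`, where `app.py` is a placeholder) runs the process directly.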

https://benchling.engineering/signals-shells-and-docker-an-onion-of-footguns-ee592e2b587b
Working with Excel Files in Python

https://www.python-excel.org/
How good is GPT-4o at generating Flask apps? Surprisingly promising

This article summarizes the results of asking GPT-4o to generate Flask applications, ranging from a simple "Hello, World!" app to a full-fledged CRUD app with three database models and HTML pages styled with Tailwind. With careful prompting, GPT-4o can produce working Flask applications and follow (some) best coding practices.

https://ploomber.io/blog/gpt-4o-flask/
Python notebooks for fundamentals of music processing

https://www.audiolabs-erlangen.de/resources/MIR/FMP/C0/C0.html
mistral-finetune

mistral-finetune is a lightweight codebase that enables memory-efficient and performant finetuning of Mistral's models.

https://github.com/mistralai/mistral-finetune
Fire Up Your Logging Needs with Pydantic Logfire

The Pydantic team recently introduced Logfire, a new logging tool that makes it easy to track and analyze your logs. You can integrate Logfire into a project with just a few lines of code.

https://kadermiyanyedi.medium.com/fire-up-your-logging-needs-with-logfire-6330d7a08dfe
DataFrames at Scale Comparison: TPC-H

We run benchmarks derived from the TPC-H benchmark suite on a variety of scales, hardware architectures, and dataframe projects, notably Apache Spark, Dask, DuckDB, and Polars. No project wins. This post analyzes results within each project and between projects.

https://docs.coiled.io/blog/tpch.html
How AI Can Help Deaf People Hear

This project facilitates communication between Deaf individuals and hearing individuals who do not understand American Sign Language (ASL). It is designed to respect and preserve ASL as the primary language.

https://www.youtube.com/watch?v=uuPxMWQRoXc