PythonHub
News & links about Python programming.
https://pythonhub.dev/
Understanding GPT tokenizers

Large language models such as GPT-3/4, LLaMA and PaLM work in terms of tokens. They take text, convert it into tokens (integers), then predict which tokens should come next. Playing around with these tokens is an interesting way to get a better idea of how this stuff actually works under the hood.
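
As a rough illustration of the text-to-integers step described above, here is a toy sketch. It splits on whitespace, which is a simplifying assumption; real GPT tokenizers use byte-pair encoding over subword units. The function names `build_vocab`, `encode`, and `decode` are hypothetical and not taken from the linked article.

```python
# Toy illustration only: real GPT tokenizers use byte-pair encoding,
# not whitespace splitting. All names here are hypothetical.

def build_vocab(corpus: str) -> dict[str, int]:
    """Assign each distinct whitespace-separated word an integer ID."""
    vocab: dict[str, int] = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text into a list of integer token IDs."""
    return [vocab[word] for word in text.split()]


def decode(tokens: list[int], vocab: dict[str, int]) -> str:
    """Map token IDs back to words and join them with spaces."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[t] for t in tokens)


vocab = build_vocab("the cat sat on the mat")
tokens = encode("the cat sat on the mat", vocab)
print(tokens)                 # [0, 1, 2, 3, 0, 4]
print(decode(tokens, vocab))  # the cat sat on the mat
```

Note that the repeated word "the" maps to the same ID both times; a model only ever sees these integers, never the original characters.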

https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
AsyncIO

The author argues that asyncio is too complex and difficult to use, and that its performance benefits do not justify the added complexity for most applications. The post recommends gevent, another Python library for asynchronous programming, as an alternative.

https://charlesleifer.com/blog/asyncio/
Understanding CPUs can help speed up Numba and NumPy code

With a little understanding of how CPUs and compilers work, you can speed up NumPy-based computations by writing faster Numba code.

https://pythonspeed.com/articles/speeding-up-numba/
Audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

https://github.com/facebookresearch/audiocraft
Pro-Tip – pytest fixtures are magic!

Fixtures are building blocks for good tests and can increase development speed. The hard part of writing a test is often setting up the data it needs beforehand; pytest fixtures make this easier by injecting that data into your tests.
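
A minimal sketch of what that injection looks like, assuming pytest is installed. The helper `make_sample_user`, the fixture `sample_user`, and the test name are hypothetical examples, not code from the linked post.

```python
import pytest


def make_sample_user() -> dict:
    """Hypothetical setup helper: builds the data a test needs."""
    return {"name": "alice", "roles": ["admin"]}


@pytest.fixture
def sample_user() -> dict:
    # pytest calls this function and passes its return value to any
    # test that declares a parameter named `sample_user`.
    return make_sample_user()


def test_user_is_admin(sample_user):
    # No setup code here: the data arrives through the parameter.
    assert "admin" in sample_user["roles"]
```

Running `pytest` on a file containing this code collects `test_user_is_admin` and supplies `sample_user` automatically, so the setup lives in one place and can be reused across tests.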

https://www.revsys.com/tidbits/pytest-fixtures-are-magic