GenAi, Deep Learning and Computer Vision
Deep Learning 💡,
Computer Vision 📽 &
#Ai 🧠

Get #free_books,
#Online_courses,
#Research_papers,
#Codes, and #Projects,
Tricks and Hacks, coding, training Stuff

Suggestion @AIindian
Drowsiness Detection
A simple drowsiness detection module for humans. Code: https://github.com/Niraj-Lunavat/Drowsiness-Detection
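For anyone curious how such a module typically works, here is a minimal sketch of one common approach: the eye aspect ratio (EAR) computed from MediaPipe FaceMesh landmarks. The landmark indices, threshold, and webcam source are illustrative assumptions and may differ from what the linked repo actually does.

```python
# Minimal eye-aspect-ratio (EAR) drowsiness sketch using MediaPipe FaceMesh.
# Landmark indices and the EAR threshold are illustrative, not tuned values.
import cv2
import mediapipe as mp
import numpy as np

LEFT_EYE = [33, 160, 158, 133, 153, 144]  # assumed FaceMesh indices around one eye
EAR_THRESHOLD = 0.21                      # assumed cut-off; tune on real data

def eye_aspect_ratio(pts):
    # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)
    p1, p2, p3, p4, p5, p6 = pts
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / (2.0 * np.linalg.norm(p1 - p4))

cap = cv2.VideoCapture(0)
with mp.solutions.face_mesh.FaceMesh(refine_landmarks=True) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            lm = results.multi_face_landmarks[0].landmark
            pts = np.array([(lm[i].x * w, lm[i].y * h) for i in LEFT_EYE])
            if eye_aspect_ratio(pts) < EAR_THRESHOLD:
                cv2.putText(frame, "DROWSY?", (30, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("drowsiness", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
```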
Vehicle Detection and Counting 🚗 🚚

Just finished an exciting project: a vehicle detection and counting system that utilizes YOLOv8 and ByteTrack.

Unleashing the Power of YOLOv8 for Custom Object Detection and Tracking. The system functions by analyzing live video feeds from strategically positioned cameras along the highway. As vehicles pass by, the system can accurately detect and track them, keeping a real-time record of their entry and exit. This data can provide insights into the traffic flow, facilitating better decision-making for highway management and planning.

"Visit my GitHub repo to explore this exciting project, and feel free to contribute!"

GitHub: https://github.com/Niraj-Lunavat/Vehicle-Count
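As a rough idea of how the Ultralytics API supports this kind of pipeline, here is a minimal counting sketch with YOLOv8 and the built-in ByteTrack tracker. The video path, counting-line position, and class filter are assumptions for illustration; the linked repo is the actual implementation.

```python
# Minimal YOLOv8 + ByteTrack vehicle-counting sketch (Ultralytics API).
# Video path, line position, and class filter are assumed values.
from ultralytics import YOLO

VEHICLE_CLASSES = {2, 3, 5, 7}   # COCO ids: car, motorcycle, bus, truck
LINE_Y = 400                     # assumed y-coordinate of the counting line
counted_ids = set()

model = YOLO("yolov8n.pt")
# stream=True yields one Results object per frame; ByteTrack assigns persistent IDs
for result in model.track(source="highway.mp4", tracker="bytetrack.yaml",
                          persist=True, stream=True):
    if result.boxes.id is None:
        continue  # no confirmed tracks in this frame yet
    for box, track_id, cls in zip(result.boxes.xyxy, result.boxes.id, result.boxes.cls):
        if int(cls) not in VEHICLE_CLASSES:
            continue
        cy = float(box[1] + box[3]) / 2  # vertical center of the box
        # count each track once, the first time its center passes below the line
        if cy > LINE_Y and int(track_id) not in counted_ids:
            counted_ids.add(int(track_id))

print(f"Vehicles counted: {len(counted_ids)}")
```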
After spending almost a month with the new GenAI hype (text LLMs, not image/video), these are my observations.
They are in no particular order, and these are 'MY' observations on 'MY' tasks; your conclusions will differ.

1. We need at least 7B-parameter models. Below that, natural language understanding performance drops drastically; above that, you need a >24 GB GPU.
2. Benchmarks are tricky: some LLMs are good at some tasks and bad at others. Try to find the model that works best for your case. MPT-7B is still the best for my use cases, even better than Falcon-7B.
3. Prompts change with almost every model. You have to rework them many times (there are some solutions around this; I am still checking whether they work).
4. For fine-tuning you need at least one GPU with >24 GB of VRAM; a 32 GB or 40 GB card is good enough.
5. Fine-tuning just the last few layers to speed up LLM training/fine-tuning might not work out well (I tried!).
6. 8-bit and 4-bit model loading to save VRAM works: a 7B model takes ~10 GB and <6 GB respectively instead of 16 GB. But inference speed goes down drastically (at least I faced this issue), and performance on text-understanding tasks also drops. (A minimal loading sketch follows after this list.)
7. For those like me who are trying to figure out LLM applications for their companies: be aware of licensing. Many models are trained with another model as a reference, and in the case of LLaMA you need the original weights, which is not a good idea in a commercial setting.
8. There are three major types of LLMs: base (like GPT-2/3), chat-tuned, and instruction-tuned. Most of the time a base model is not usable as-is unless you fine-tune it. Chat versions are the best, but most of the time they are not open source.
9. Not everything needs to be solved with LLMs. Do not force-fit every solution around an LLM; I saw the same thing happen with deep reinforcement learning some years back. Check this out -> https://lnkd.in/d2mxqhH9
10. I tried LangChain and vector DBs but did not end up using them. I never needed to: simple Python, embeddings, and an efficient dot product worked for me. (See the retrieval sketch after this list.)
11. LLMs need not hold all of the world's knowledge; we humans do not have complete knowledge either and still survive because of adaptability. They just need to know how to use knowledge. I think we can go much smaller in model size if we somehow separate out the knowledge part.
12. Simulating "thoughts" before answering and NOT just predicting one word after another might be the next wave of innovation.
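On point 6, this is roughly what 4-bit loading looks like with Hugging Face transformers + bitsandbytes (8-bit works the same way via load_in_8bit). A minimal sketch only; the model name and settings are illustrative assumptions, not necessarily what I ran.

```python
# Minimal 4-bit loading sketch (transformers + bitsandbytes).
# Model name and settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mosaicml/mpt-7b"  # assumed example; any causal LM on the Hub works
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # weights stay 4-bit, matmuls run in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",          # let accelerate place layers on available devices
    trust_remote_code=True,     # MPT ships custom modelling code on the Hub
)

inputs = tokenizer("Summarize: the meeting moved to Friday.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True))
```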
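And on point 10, the whole "retrieval without a vector DB" idea fits in a few lines. A sketch with sentence-transformers (the embedding model is an assumed choice) and a plain dot product:

```python
# Minimal retrieval sketch: embeddings + dot product, no vector DB.
# The embedding model and toy documents are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am-5pm on weekdays.",
    "Shipping is free for orders above $50.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)  # unit vectors -> dot product == cosine

def top_k(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                 # one matrix-vector product scores every document
    best = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in best]

print(top_k("When do I get my money back?"))
```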
Text -> Video just got real.

And it is all yours for the taking!

The most powerful video-generation model is now open source. Try it here: https://huggingface.co/cerspense/zeroscope_v2_XL…
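A rough usage sketch with the diffusers text-to-video pipeline, based on the zeroscope model cards around release time: generate at low resolution with the 576w checkpoint, then optionally upscale with the linked XL checkpoint. Repo IDs, resolution, and output handling are assumptions to verify against the current model card, since the diffusers API has shifted between versions.

```python
# Text-to-video sketch with diffusers (assumed ~0.18-era API; check the model
# card for the current recommended settings and for the XL upscaling pass).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on a single GPU

video_frames = pipe(
    "a red panda drinking coffee, cinematic lighting",
    num_inference_steps=40,
    height=320, width=576,   # the resolution this checkpoint targets
    num_frames=24,
).frames  # on newer diffusers versions this may be nested; use .frames[0]

export_to_video(video_frames, "panda.mp4")
```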
The source code for DragGAN has been released! 🔥🔥🔥

We can finally play with that marvel!

🔗 GitHub repository: https://github.com/XingangPan/DragGAN
How can you train Large Language Models?

Large language models (LLMs) are gaining significant popularity due to their versatility in text generation, translation, and question-answering tasks. However, training these models can be resource-intensive and time-consuming. Examples of LLMs include GPT-3 and GPT-4 from OpenAI, LLaMA from Meta, and PaLM 2 from Google.

Several LLM training frameworks have emerged to address this challenge, offering solutions to streamline and enhance the training process. Here are some of the most popular frameworks that help you train and tune LLMs:

✅ DeepSpeed: An efficient deep learning optimization library that simplifies distributed training and inference, enabling easy and effective implementation. (A minimal usage sketch follows at the end of this post.)
Examples: https://www.deepspeed.ai/

✅ Megatron-DeepSpeed: A DeepSpeed version of NVIDIA's Megatron-LM, offering additional support for MoE model training, Curriculum Learning, 3D Parallelism, and other advanced features.
Examples: https://huggingface.co/blog/bloom-megatron-deepspeed

✅ FairScale: A PyTorch extension library designed for high-performance and large-scale training, empowering researchers and practitioners to train models more efficiently.
Example: https://fairscale.readthedocs.io/en/latest/tutorials/oss.html

✅ Megatron-LM: A research-focused framework dedicated to training transformer models at scale, facilitating ongoing exploration in the field.
Examples: https://huggingface.co/blog/megatron-training

✅ Colossal-AI: A platform that aims to make large AI models more accessible, faster, and cost-effective, contributing to democratizing AI advancements.
Examples: https://github.com/hpcaitech/ColossalAI/tree/main/examples

✅ BMTrain: An efficient training framework tailored for big models, enabling smoother and more effective training processes.
Examples: https://github.com/OpenBMB/BMTrain

✅ Mesh TensorFlow: A framework simplifying model parallelism, making it easier to leverage distributed computing resources for training large models.
Examples: https://github.com/tensorflow/mesh

✅ MaxText: A performant and scalable JAX LLM framework designed to simplify the training process while maintaining high performance.
Examples: https://github.com/EleutherAI/maxtext

✅ Alpa: A system specifically developed for training and serving large-scale neural networks, offering comprehensive support for training requirements.
Examples: https://alpa.ai/opt

✅ GPT-NeoX: An implementation of model-parallel autoregressive transformers on GPUs, built on the DeepSpeed library, providing enhanced training capabilities.
Examples: https://blog.eleuther.ai/announcing-20b/

If you're interested in training LLMs, I encourage you to explore these frameworks. They can significantly simplify and optimize the training process, allowing you to achieve better results efficiently.
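To give a feel for the first framework on the list, here is a minimal DeepSpeed training-loop sketch (ZeRO stage 2 + fp16) with a toy model. The config values and random data are illustrative assumptions; a real LLM run would wrap a transformer and a real dataset, and you launch with the `deepspeed` launcher rather than plain `python`.

```python
# Minimal DeepSpeed training-loop sketch (ZeRO stage 2 + fp16).
# Toy model, random data, and hyperparameters are assumptions for illustration.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 2)
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},          # shard optimizer states and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for step in range(100):
    x = torch.randn(32, 512, device=engine.device, dtype=torch.half)
    y = torch.randint(0, 2, (32,), device=engine.device)
    loss = torch.nn.functional.cross_entropy(engine(x), y)
    engine.backward(loss)   # DeepSpeed handles loss scaling and gradient sharding
    engine.step()           # optimizer step + zero_grad
```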
Kaiming He, inventor of ResNet, is leaving industry to join the MIT faculty in 2024!! He's one of the most impactful figures in deep learning.

- The residual layer is a fundamental building block of LLMs.
- Faster/Mask R-CNN are industry standards for image segmentation and the robot perception stack.
- Panoptic segmentation redefined a research sub-field in vision.
- Masked Autoencoder (MAE) is among the best general-purpose self-supervised algorithms for computer vision and beyond.
- Before MAE, Momentum Contrast (MoCo) was a SOTA contrastive learning technique.
- The SlowFast network was among the default backbones for video learning until ViTs took over.
- Too many other groundbreaking works to enumerate …

I have recently observed an exodus of researchers from big tech to academia. It's an interesting movement given the current LLM gold rush 🤔
I work on a lot of NLP projects and it's starting to feel like I do more prompting than actual coding.

LLMs might completely change the way we code as they become more integrated into our tools and workflows. I'm still doing a lot of stitching with tools like ChatGPT and Copilot, but I expect more seamlessness as these tools add functionality like function calling.
H2O LLM Studio

A framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs).

Documentation: https://h2oai.github.io/h2o-llmstudio/
https://github.com/h2oai/h2o-llmstudio
The source code for NVIDIA's BundleSDF library has been released.

BundleSDF is a neural-network-based 6-DoF tracking and 3D reconstruction library for unknown objects. We would love to see this running on a ROS robot with MoveIt!

Source: https://github.com/NVlabs/BundleSDF
Paper: https://bundlesdf.github.io/
Stanford University has just opened full access to CS224U, their immensely popular graduate-level Natural Language Understanding course, taught by Professor Christopher Potts.

Check out the GitHub code & YouTube playlist.
Introducing IDEFICS, the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text. A bit like a multimodal ChatGPT!

Blogpost: huggingface.co/blog/idefics
Playground: https://huggingface.co/spaces/HuggingFaceM4/idefics_playground
Forwarded from Artificial Intelligence
Dear friends,

I'd like to share a part of the origin story of large language models that isn't widely known.

A lot of early work in natural language processing (NLP) was funded by U.S. military intelligence agencies that needed machine translation and speech recognition capabilities. Then, as now, such agencies analyzed large volumes of text and recorded speech in various languages. They poured money into research in machine translation and speech recognition over decades, which motivated researchers to give these applications disproportionate attention relative to other uses of NLP.

This explains why many important technical breakthroughs in NLP stem from studying translation, more than you might imagine based on the modest role that translation plays in current applications. For instance, the celebrated transformer paper, "Attention is All You Need" by the Google Brain team, introduced a technique for mapping a sentence in one language to a translation in another. This laid the foundation for large language models (LLMs) like ChatGPT, which map a prompt to a generated response.

Or consider the BLEU score, which is occasionally still used to evaluate LLMs by comparing their outputs to ground-truth examples. It was developed in 2002 to measure how well a machine-generated translation compares to a ground truth, human-created translation.

A key component of LLMs is tokenization, the process of breaking raw input text into sub-word components that become the tokens to be processed. For example, the first part of the previous sentence may be divided into tokens like this:

/A /key /component /of /LL/Ms/ is/ token/ization

The most widely used tokenization algorithm for text today is Byte Pair Encoding (BPE), which gained popularity in NLP after a 2015 paper by Sennrich et al. BPE starts with individual characters as tokens and repeatedly merges tokens that occur together frequently. Eventually, entire words as well as common sub-words become tokens. How did this technique come about? The authors wanted to build a model that could translate words that weren't represented in the training data. They found that splitting words into sub-words created an input representation that enabled the model, if it had seen "token" and "ization," to guess the meaning of a word it might not have seen before, such as "tokenization."
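To make the merge procedure concrete, here is a toy sketch in the spirit of that BPE-for-NLP recipe; the tiny corpus and the number of merges are illustrative only, and real tokenizers run this over far larger data.

```python
# Toy byte-pair-encoding sketch: repeatedly merge the most frequent adjacent pair.
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count how often each adjacent symbol pair occurs across the corpus."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words are stored as space-separated characters with an end-of-word marker.
corpus = {"t o k e n </w>": 5, "t o k e n s </w>": 3, "t o k e n i z a t i o n </w>": 2}

for step in range(8):
    pairs = get_pair_counts(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)   # the most frequent adjacent pair
    corpus = merge_pair(best, corpus)
    print(step, best)

print(corpus)  # frequent sub-words like "token" emerge as single symbols
```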

I don't intend this description of NLP history as advocacy for military-funded research. (I have accepted military funding, too. Some of my early work in deep learning at Stanford University was funded by DARPA, a U.S. defense research agency. This led directly to my starting Google Brain.) War is a horribly ugly business, and I would like there to be much less of it. Still, I find it striking that basic research in one area can lead to broadly beneficial developments in others. In similar ways, research into space travel led to LED lights and solar panels, experiments in particle physics led to magnetic resonance imaging, and studies of bacteria's defenses against viruses led to the CRISPR gene-editing technology.

So it's especially exciting to see so much basic research going on in so many different areas of AI. Who knows, a few years hence, what today's experiments will yield?

Keep learning!
Andrew Ng