Drowsiness Detection
A simple drowsiness detection module for humans. Code: https://github.com/Niraj-Lunavat/Drowsiness-Detection
Vehicle Detection and Counting
Just finished an exciting project: a vehicle detection and counting system that uses YOLOv8 and ByteTrack.
Unleashing the power of YOLOv8 for custom object detection and tracking. The system analyzes live video feeds from cameras positioned along the highway. As vehicles pass by, it detects and tracks them, keeping a real-time record of their entry and exit. This data can provide insight into traffic flow and support better decisions in highway management and planning.
Visit my GitHub repo to explore this project, and feel free to contribute!
GitHub: https://github.com/Niraj-Lunavat/Vehicle-Count
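The entry/exit record described above can be sketched as counting tracks that cross a virtual line. This is only an illustration, not the repository's actual logic: it assumes a tracker such as ByteTrack has already produced per-vehicle centroid histories, and the names `count_crossings`, `tracks`, and `line_y` are hypothetical.

```python
def count_crossings(tracks, line_y):
    """Count tracks crossing a horizontal line, once per track.

    tracks: dict mapping track_id -> list of (x, y) centroids in frame order
            (assumed output of a detector + tracker; hypothetical format).
    line_y: y-coordinate of the virtual counting line, in pixels.
    Returns (entered, exited): crossings downward and upward.
    """
    entered, exited = 0, 0
    for points in tracks.values():
        # Compare each centroid with the next one in the same track.
        for (_, y_prev), (_, y_curr) in zip(points, points[1:]):
            if y_prev < line_y <= y_curr:
                entered += 1  # moved from above the line to on/below it
                break         # count each track at most once
            if y_prev >= line_y > y_curr:
                exited += 1   # moved from on/below the line to above it
                break
    return entered, exited


# Toy centroid histories for three tracked vehicles.
tracks = {
    1: [(0, 10), (0, 40), (0, 60)],   # crosses downward
    2: [(5, 90), (5, 55), (5, 30)],   # crosses upward
    3: [(9, 10), (9, 20)],            # never reaches the line
}
entered, exited = count_crossings(tracks, line_y=50)
```

Counting each track only once keeps a vehicle that jitters around the line from being counted repeatedly.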
Transformers summary
https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8vV9uO6B1J0ztzeDxbnBxD1S0/mobilepresent#slide=id.g31364026ad_3_2
Video version:
https://youtu.be/EixI6t5oif0
After spending almost a month with the new GenAI hype (text and LLMs, not image/video), these are my observations.
In no particular order, and these are 'MY' observations on 'MY' tasks. Your conclusions will differ.
1. We need models with at least 7B parameters. Below that, natural language understanding performance drops drastically. Above that, you need a GPU with >24 GB.
2. Benchmarks are tricky: some LLMs are good at some tasks and bad at others. Try to find the model that works best for your case. MPT-7B is still the best for my use cases, even better than Falcon-7B.
3. Prompts change with almost every model. You have to rework them many times. (There are some solutions for this; I am trying to see if they work.)
4. For fine-tuning you need at least one GPU with >24 GB of VRAM; a 32 GB or 40 GB one is good enough.
5. Fine-tuning just the last few layers to speed up LLM training might not work out well (I tried!).
6. 8-bit and 4-bit model loading works for saving VRAM. A 7B model takes ~10 GB and <6 GB respectively, instead of 16 GB. BUT inference speed goes down drastically (at least, I faced this issue). Performance on text understanding tasks also goes down.
7. Those like me who are trying to figure out LLM applications for their companies: be aware of licensing. Some models are trained with another model as a reference, and in the case of LLaMA you need the original weights. Not a good idea in a commercial setting.
8. There are 3 major types of LLMs: basic (like GPT-2/3), chat-enabled, and instruction-enabled. Most of the time a basic model is not usable as is unless you fine-tune it. Chat versions are the best versions, but most of the time they are not open source.
9. Not everything needs to be solved with LLMs. Do not force-fit a solution around an LLM. I saw the same thing happen with deep reinforcement learning some years back. Check this out -> https://lnkd.in/d2mxqhH9
10. I tried LangChain and vector DBs but did not end up using them. Never needed to: simple Python, embeddings, and an efficient dot product worked for me.
11. LLMs need not have knowledge of the whole world. We humans do not have complete knowledge either, and we still survive because of adaptability. They just need to know how to use knowledge. I think we can go much smaller in model size if we separate out the knowledge part somehow.
12. Simulating "thoughts" before answering and NOT just predicting one word after another might be the next wave of innovation.
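Observation 10 (plain Python, embeddings, and a dot product instead of a vector database) could look roughly like the sketch below. The toy 3-dimensional vectors stand in for embeddings from whatever model you use, and `top_k` is a hypothetical helper name. It is written with the standard library only so it runs anywhere; at scale the inner loop would normally be a single NumPy matrix-vector product.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length so dot product = cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, docs, k=2):
    """Return the k (doc_index, cosine_similarity) pairs most similar to query."""
    q = normalize(query)
    scored = []
    for idx, doc in enumerate(docs):
        d = normalize(doc)
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scored[:k]]


# Toy "embeddings" (assumption: real ones come from an embedding model).
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [1.0, 0.1, 0.0]
results = top_k(query, docs, k=2)
```

For a few thousand documents, a brute-force scan like this is often fast enough that an index is unnecessary.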
Text -> Video just got real.
And it is all yours for the taking!
The most powerful video generation model is now open source. Try it here: https://huggingface.co/cerspense/zeroscope_v2_XL…
The source code for DragGAN has been released!
We can finally play with that marvel!
GitHub repository: https://github.com/XingangPan/DragGAN
How can you train large language models?
Large language models (LLMs) are gaining significant popularity due to their versatility in text generation, translation, and question-answering tasks. However, training these models can be resource-intensive and time-consuming. Examples of LLMs include GPT-3 and GPT-4 from OpenAI, LLaMA from Meta, and PaLM 2 from Google.
Several LLM training frameworks have emerged to address this challenge by streamlining the training process. Here are some of the most popular frameworks for training and tuning LLMs:
- DeepSpeed: An efficient deep learning optimization library that simplifies distributed training and inference.
Examples: https://www.deepspeed.ai/
- Megatron-DeepSpeed: A DeepSpeed version of NVIDIA's Megatron-LM, with additional support for MoE model training, curriculum learning, 3D parallelism, and other advanced features.
Examples: https://huggingface.co/blog/bloom-megatron-deepspeed
- FairScale: A PyTorch extension library designed for high-performance, large-scale training.
Example: https://fairscale.readthedocs.io/en/latest/tutorials/oss.html
- Megatron-LM: A research-focused framework dedicated to training transformer models at scale.
Examples: https://huggingface.co/blog/megatron-training
- Colossal-AI: A platform that aims to make large AI models more accessible, faster, and more cost-effective.
Examples: https://github.com/hpcaitech/ColossalAI/tree/main/examples
- BMTrain: An efficient training framework tailored for big models.
Examples: https://github.com/OpenBMB/BMTrain
- Mesh TensorFlow: A framework that simplifies model parallelism, making it easier to use distributed computing resources for training large models.
Examples: https://github.com/tensorflow/mesh
- MaxText: A performant and scalable JAX LLM framework designed to simplify the training process.
Examples: https://github.com/EleutherAI/maxtext
- Alpa: A system developed for training and serving large-scale neural networks.
Examples: https://alpa.ai/opt
- GPT-NeoX: An implementation of model-parallel autoregressive transformers on GPUs, built on the DeepSpeed library.
Examples: https://blog.eleuther.ai/announcing-20b/
If you're interested in training LLMs, I encourage you to explore these frameworks. They can significantly simplify and optimize the training process.
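To see why such frameworks are needed, here is a rough back-of-the-envelope sketch of per-GPU memory for model states alone. It assumes mixed-precision Adam training with the byte counts commonly cited for DeepSpeed's ZeRO optimizer (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state) and ignores activations and temporary buffers. The function name `model_state_gb` is hypothetical.

```python
def model_state_gb(n_params, n_gpus=1, zero_stage=0):
    """Approximate per-GPU memory (in GB, 1e9 bytes) for model states.

    Assumes mixed-precision Adam: fp16 weights and gradients, plus fp32
    master weights, momentum, and variance. Activations are NOT included.
    zero_stage 1 shards optimizer state across GPUs, stage 2 also shards
    gradients, stage 3 also shards the weights.
    """
    weights = 2 * n_params   # fp16 weights
    grads = 2 * n_params     # fp16 gradients
    optim = 12 * n_params    # fp32 master weights + Adam momentum + variance
    if zero_stage >= 1:
        optim /= n_gpus
    if zero_stage >= 2:
        grads /= n_gpus
    if zero_stage >= 3:
        weights /= n_gpus
    return (weights + grads + optim) / 1e9


# A 7.5B-parameter model on 64 GPUs, with and without ZeRO sharding.
for stage in range(4):
    print(stage, round(model_state_gb(7.5e9, n_gpus=64, zero_stage=stage), 1))
```

Without sharding, the model states of a 7.5B-parameter model come to about 120 GB per GPU under these assumptions, which is why plain data parallelism stops being an option at this scale.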
Kaiming He, inventor of ResNet, is leaving industry to join the MIT faculty in 2024! He's one of the most impactful figures in deep learning.
- The residual layer is a fundamental building block of LLMs.
- Faster R-CNN and Mask R-CNN are industry standards for object detection, image segmentation, and robot perception stacks.
- Panoptic segmentation redefined a research sub-field in vision.
- Masked Autoencoders (MAE) is among the best general-purpose self-supervised algorithms for computer vision and beyond.
- Before MAE, Momentum Contrast (MoCo) was a SOTA contrastive learning technique.
- The SlowFast network was among the default backbones for video learning until ViTs took over.
- Too many other groundbreaking works to enumerate…
I have recently observed an exodus of researchers from big tech to academia. It's an interesting movement given the current LLM gold rush.
I work on a lot of NLP projects and it's starting to feel like I do more prompting than actual coding.
LLMs might completely change the way we code as they become more integrated into our tools and workflows. I'm still doing a lot of stitching with tools like ChatGPT and Copilot, but I expect things to become more seamless as these tools add features such as function calling.
H2O LLM Studio
A framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs).
Documentation: https://h2oai.github.io/h2o-llmstudio/
https://github.com/h2oai/h2o-llmstudio
The source code for NVIDIA's BundleSDF library has been released.
BundleSDF is a neural-network-based library for 6-DoF tracking and 3D reconstruction of unknown objects. We would love to see this running on a ROS robot with MoveIt!
Source: https://github.com/NVlabs/BundleSDF
Paper: https://bundlesdf.github.io/
Stanford University has just opened full access to CS224U, its immensely popular graduate-level Natural Language Understanding course taught by Professor Christopher Potts.
Check out the GitHub code and YouTube playlist.
Introducing IDEFICS, the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of images and texts and produces text. A bit like a multimodal ChatGPT!
Blogpost: huggingface.co/blog/idefics
Playground: https://huggingface.co/spaces/HuggingFaceM4/idefics_playground
CS25 has a great roadmap of LLM papers!
https://web.stanford.edu/class/cs25/
Forwarded from Artificial Intelligence
Dear friends,
I'd like to share a part of the origin story of large language models that isn't widely known.
A lot of early work in natural language processing (NLP) was funded by U.S. military intelligence agencies that needed machine translation and speech recognition capabilities. Then, as now, such agencies analyzed large volumes of text and recorded speech in various languages. They poured money into research in machine translation and speech recognition over decades, which motivated researchers to give these applications disproportionate attention relative to other uses of NLP.
This explains why many important technical breakthroughs in NLP stem from studying translation, more than you might imagine based on the modest role that translation plays in current applications. For instance, the celebrated transformer paper, "Attention Is All You Need" by the Google Brain team, introduced a technique for mapping a sentence in one language to a translation in another. This laid the foundation for large language models (LLMs) like ChatGPT, which map a prompt to a generated response.
Or consider the BLEU score, which is occasionally still used to evaluate LLMs by comparing their outputs to ground-truth examples. It was developed in 2002 to measure how well a machine-generated translation compares to a ground truth, human-created translation.
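As a rough illustration of how BLEU compares a generated sentence with a ground-truth one, here is a simplified sketch: clipped n-gram precision combined by a geometric mean and scaled by a brevity penalty. It is deliberately reduced (a single reference, whitespace tokens, n-grams up to 2, no smoothing), whereas the standard metric uses up to 4-grams over a whole corpus and may use several references. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word does not inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


score = simple_bleu("the cat sat on the mat", "the cat sat on the mat")
```

An exact match scores 1.0, and a candidate sharing no bigram with the reference scores 0.0 in this unsmoothed variant.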
A key component of LLMs is tokenization, the process of breaking raw input text into sub-word components that become the tokens to be processed. For example, the first part of the previous sentence may be divided into tokens like this:
/A /key /component /of /LL/Ms/ is/ token/ization
The most widely used tokenization algorithm for text today is Byte Pair Encoding (BPE), which gained popularity in NLP after a 2015 paper by Sennrich et al. BPE starts with individual characters as tokens and repeatedly merges tokens that occur together frequently. Eventually, entire words as well as common sub-words become tokens. How did this technique come about? The authors wanted to build a model that could translate words that weren't represented in the training data. They found that splitting words into sub-words created an input representation that enabled the model, if it had seen "token" and "ization," to guess the meaning of a word it might not have seen before, such as "tokenization."
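The merge loop described above can be sketched in a few lines. This is a toy version for illustration only: it works on a small word list, breaks ties by first occurrence, and omits details that production tokenizers add (end-of-word markers, byte-level fallback, applying learned merges to new text). The name `learn_bpe` is hypothetical.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merges from a list of words.

    Returns (merges, vocab): the ordered list of merged symbol pairs and
    a Counter mapping each word, as a tuple of symbols, to its frequency.
    """
    # Start with every word split into single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the best pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


corpus = ["token", "token", "tokens", "ization", "tokenization"]
merges, vocab = learn_bpe(corpus, num_merges=10)
```

With enough merges, frequent words such as "token" collapse into single tokens while rarer words remain split into sub-word pieces.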
I don't intend this description of NLP history as advocacy for military-funded research. (I have accepted military funding, too. Some of my early work in deep learning at Stanford University was funded by DARPA, a U.S. defense research agency. This led directly to my starting Google Brain.) War is a horribly ugly business, and I would like there to be much less of it. Still, I find it striking that basic research in one area can lead to broadly beneficial developments in others. In similar ways, research into space travel led to LED lights and solar panels, experiments in particle physics led to magnetic resonance imaging, and studies of bacteria's defenses against viruses led to the CRISPR gene-editing technology.
So it's especially exciting to see so much basic research going on in so many different areas of AI. Who knows, a few years hence, what today's experiments will yield?
Keep learning!
Andrew Ng