GenAi, Deep Learning and Computer Vision
Deep Learning 💡,
Computer Vision 📽 &
#Ai 🧠

Get #free_books,
#Online_courses,
#Research_papers,
#Codes, and #Projects,
Tricks and Hacks, coding, training Stuff

Suggestion @AIindian
Drowsiness Detection
A simple drowsiness detection module for humans. Code: https://github.com/Niraj-Lunavat/Drowsiness-Detection
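For anyone curious how such a module typically works, here is a minimal sketch of one common approach: the eye aspect ratio (EAR) computed from MediaPipe FaceMesh landmarks. The landmark indices, threshold, and webcam source are illustrative assumptions and may differ from what the linked repo actually does.

```python
# Minimal eye-aspect-ratio (EAR) drowsiness sketch using MediaPipe FaceMesh.
# Landmark indices and the EAR threshold are illustrative, not tuned values.
import cv2
import mediapipe as mp
import numpy as np

LEFT_EYE = [33, 160, 158, 133, 153, 144]  # assumed FaceMesh indices around one eye
EAR_THRESHOLD = 0.21                      # assumed cut-off; tune on real data

def eye_aspect_ratio(pts):
    # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)
    p1, p2, p3, p4, p5, p6 = pts
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / (2.0 * np.linalg.norm(p1 - p4))

cap = cv2.VideoCapture(0)
with mp.solutions.face_mesh.FaceMesh(refine_landmarks=True) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            lm = results.multi_face_landmarks[0].landmark
            pts = np.array([(lm[i].x * w, lm[i].y * h) for i in LEFT_EYE])
            if eye_aspect_ratio(pts) < EAR_THRESHOLD:
                cv2.putText(frame, "DROWSY?", (30, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv2.imshow("drowsiness", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
```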
Vehicle Detection and Counting 🚗 🚚

Just finished an exciting project: a vehicle detection and counting system that utilizes YOLOv8 and ByteTrack.

Unleashing the Power of YOLOv8 for Custom Object Detection and Tracking. The system functions by analyzing live video feeds from strategically positioned cameras along the highway. As vehicles pass by, the system can accurately detect and track them, keeping a real-time record of their entry and exit. This data can provide insights into the traffic flow, facilitating better decision-making for highway management and planning.

"Visit my GitHub repo to explore this exciting project, and feel free to contribute!"

GitHub: https://github.com/Niraj-Lunavat/Vehicle-Count
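As a rough idea of how the Ultralytics API supports this kind of pipeline, here is a minimal counting sketch with YOLOv8 and the built-in ByteTrack tracker. The video path, counting-line position, and class filter are assumptions for illustration; the linked repo is the actual implementation.

```python
# Minimal YOLOv8 + ByteTrack vehicle-counting sketch (Ultralytics API).
# Video path, line position, and class filter are assumed values.
from ultralytics import YOLO

VEHICLE_CLASSES = {2, 3, 5, 7}   # COCO ids: car, motorcycle, bus, truck
LINE_Y = 400                     # assumed y-coordinate of the counting line
counted_ids = set()

model = YOLO("yolov8n.pt")
# stream=True yields one Results object per frame; ByteTrack assigns persistent IDs
for result in model.track(source="highway.mp4", tracker="bytetrack.yaml",
                          persist=True, stream=True):
    if result.boxes.id is None:
        continue  # no confirmed tracks in this frame yet
    for box, track_id, cls in zip(result.boxes.xyxy, result.boxes.id, result.boxes.cls):
        if int(cls) not in VEHICLE_CLASSES:
            continue
        cy = float(box[1] + box[3]) / 2  # vertical center of the box
        # count each track once, the first time its center passes below the line
        if cy > LINE_Y and int(track_id) not in counted_ids:
            counted_ids.add(int(track_id))

print(f"Vehicles counted: {len(counted_ids)}")
```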
After spending almost a month with the new GenAI hype (text LLMs, not image/video), these are my observations.
They are in no particular order, and these are 'MY' observations on 'MY' tasks; your conclusions will differ.

1. We need at least 7B-parameter models. Below that, natural language understanding performance drops drastically; above that, you need a >24 GB GPU.
2. Benchmarks are tricky: some LLMs are good at some tasks and bad at others. Try to find the model that works best for your case. MPT-7B is still the best for my use cases, even better than Falcon-7B.
3. Prompts change with almost every model. You have to rework them many times (there are some solutions around this; I am still checking whether they work).
4. For fine-tuning you need at least one GPU with >24 GB of VRAM; a 32 GB or 40 GB card is good enough.
5. Fine-tuning just the last few layers to speed up LLM training/fine-tuning might not work out well (I tried!).
6. 8-bit and 4-bit model loading to save VRAM works: a 7B model takes ~10 GB and <6 GB respectively instead of 16 GB. But inference speed goes down drastically (at least I faced this issue), and performance on text-understanding tasks also drops. (A minimal loading sketch follows after this list.)
7. For those like me who are trying to figure out LLM applications for their companies: be aware of licensing. Many models are trained with another model as a reference, and in the case of LLaMA you need the original weights, which is not a good idea in a commercial setting.
8. There are three major types of LLMs: base (like GPT-2/3), chat-tuned, and instruction-tuned. Most of the time a base model is not usable as-is unless you fine-tune it. Chat versions are the best, but most of the time they are not open source.
9. Not everything needs to be solved with LLMs. Do not force-fit every solution around an LLM; I saw the same thing happen with deep reinforcement learning some years back. Check this out -> https://lnkd.in/d2mxqhH9
10. I tried LangChain and vector DBs but did not end up using them. I never needed to: simple Python, embeddings, and an efficient dot product worked for me. (See the retrieval sketch after this list.)
11. LLMs need not hold all of the world's knowledge; we humans do not have complete knowledge either and still survive because of adaptability. They just need to know how to use knowledge. I think we can go much smaller in model size if we somehow separate out the knowledge part.
12. Simulating "thoughts" before answering and NOT just predicting one word after another might be the next wave of innovation.
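On point 6, this is roughly what 4-bit loading looks like with Hugging Face transformers + bitsandbytes (8-bit works the same way via load_in_8bit). A minimal sketch only; the model name and settings are illustrative assumptions, not necessarily what I ran.

```python
# Minimal 4-bit loading sketch (transformers + bitsandbytes).
# Model name and settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mosaicml/mpt-7b"  # assumed example; any causal LM on the Hub works
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # weights stay 4-bit, matmuls run in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",          # let accelerate place layers on available devices
    trust_remote_code=True,     # MPT ships custom modelling code on the Hub
)

inputs = tokenizer("Summarize: the meeting moved to Friday.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True))
```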
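And on point 10, the whole "retrieval without a vector DB" idea fits in a few lines. A sketch with sentence-transformers (the embedding model is an assumed choice) and a plain dot product:

```python
# Minimal retrieval sketch: embeddings + dot product, no vector DB.
# The embedding model and toy documents are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am-5pm on weekdays.",
    "Shipping is free for orders above $50.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)  # unit vectors -> dot product == cosine

def top_k(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                 # one matrix-vector product scores every document
    best = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in best]

print(top_k("When do I get my money back?"))
```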
Text -> Video just got real.

And it is all yours for the taking!

The most powerful video-generation model is now open source. Try it here: https://huggingface.co/cerspense/zeroscope_v2_XL…
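A rough usage sketch with the diffusers text-to-video pipeline, based on the zeroscope model cards around release time: generate at low resolution with the 576w checkpoint, then optionally upscale with the linked XL checkpoint. Repo IDs, resolution, and output handling are assumptions to verify against the current model card, since the diffusers API has shifted between versions.

```python
# Text-to-video sketch with diffusers (assumed ~0.18-era API; check the model
# card for the current recommended settings and for the XL upscaling pass).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on a single GPU

video_frames = pipe(
    "a red panda drinking coffee, cinematic lighting",
    num_inference_steps=40,
    height=320, width=576,   # the resolution this checkpoint targets
    num_frames=24,
).frames  # on newer diffusers versions this may be nested; use .frames[0]

export_to_video(video_frames, "panda.mp4")
```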
The source code for DragGAN has been released! 🔥🔥🔥

We can finally play with that marvel!

🔗 GitHub repository: https://github.com/XingangPan/DragGAN
How can you train Large Language Models?

Large language models (LLMs) are gaining significant popularity due to their versatility in text generation, translation, and question-answering tasks. However, training these models can be resource-intensive and time-consuming. Examples of LLMs include GPT-3 and GPT-4 from OpenAI, LLaMA from Meta, and PaLM 2 from Google.

Several LLM training frameworks have emerged to address this challenge, offering solutions to streamline and enhance the training process. Here are some of the most popular frameworks that help you train and tune LLMs:

✅ DeepSpeed: An efficient deep learning optimization library that simplifies distributed training and inference, enabling easy and effective implementation. (A minimal usage sketch follows at the end of this post.)
Examples: https://www.deepspeed.ai/

✅ Megatron-DeepSpeed: A DeepSpeed version of NVIDIA's Megatron-LM, offering additional support for MoE model training, Curriculum Learning, 3D Parallelism, and other advanced features.
Examples: https://huggingface.co/blog/bloom-megatron-deepspeed

✅ FairScale: A PyTorch extension library designed for high-performance and large-scale training, empowering researchers and practitioners to train models more efficiently.
Example: https://fairscale.readthedocs.io/en/latest/tutorials/oss.html

✅ Megatron-LM: A research-focused framework dedicated to training transformer models at scale, facilitating ongoing exploration in the field.
Examples: https://huggingface.co/blog/megatron-training

✅ Colossal-AI: A platform that aims to make large AI models more accessible, faster, and cost-effective, contributing to democratizing AI advancements.
Examples: https://github.com/hpcaitech/ColossalAI/tree/main/examples

✅ BMTrain: An efficient training framework tailored for big models, enabling smoother and more effective training processes.
Examples: https://github.com/OpenBMB/BMTrain

✅ Mesh TensorFlow: A framework simplifying model parallelism, making it easier to leverage distributed computing resources for training large models.
Examples: https://github.com/tensorflow/mesh

✅ MaxText: A performant and scalable JAX LLM framework designed to simplify the training process while maintaining high performance.
Examples: https://github.com/EleutherAI/maxtext

✅ Alpa: A system specifically developed for training and serving large-scale neural networks, offering comprehensive support for training requirements.
Examples: https://alpa.ai/opt

✅ GPT-NeoX: An implementation of model-parallel autoregressive transformers on GPUs, built on the DeepSpeed library, providing enhanced training capabilities.
Examples: https://blog.eleuther.ai/announcing-20b/

If you're interested in training LLMs, I encourage you to explore these frameworks. They can significantly simplify and optimize the training process, allowing you to achieve better results efficiently.
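To give a feel for the first framework on the list, here is a minimal DeepSpeed training-loop sketch (ZeRO stage 2 + fp16) with a toy model. The config values and random data are illustrative assumptions; a real LLM run would wrap a transformer and a real dataset, and you launch with the `deepspeed` launcher rather than plain `python`.

```python
# Minimal DeepSpeed training-loop sketch (ZeRO stage 2 + fp16).
# Toy model, random data, and hyperparameters are assumptions for illustration.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 2)
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},          # shard optimizer states and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for step in range(100):
    x = torch.randn(32, 512, device=engine.device, dtype=torch.half)
    y = torch.randint(0, 2, (32,), device=engine.device)
    loss = torch.nn.functional.cross_entropy(engine(x), y)
    engine.backward(loss)   # DeepSpeed handles loss scaling and gradient sharding
    engine.step()           # optimizer step + zero_grad
```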
Kaiming He, inventor of ResNet, is leaving industry to join the MIT faculty in 2024!! He's one of the most impactful figures in deep learning.

- The residual layer is a fundamental building block of LLMs.
- Faster/Mask R-CNN are industry standards for image segmentation and the robot perception stack.
- Panoptic segmentation redefined a research sub-field in vision.
- Masked Autoencoder (MAE) is among the best general-purpose self-supervised algorithms for computer vision and beyond.
- Before MAE, Momentum Contrast (MoCo) was a SOTA contrastive learning technique.
- The SlowFast network was among the default backbones for video learning until ViTs took over.
- Too many other groundbreaking works to enumerate …

I have recently observed an exodus of researchers from big tech to academia. It's an interesting movement given the current LLM gold rush 🤔
I work on a lot of NLP projects and it's starting to feel like I do more prompting than actual coding.

LLMs might completely change the way we code as they become more integrated into our tools and workflows. I'm still doing a lot of stitching with tools like ChatGPT and Copilot, but I expect more seamlessness as these tools add functionality like function calling.
H2O LLM Studio

A framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs).

Documentation: https://h2oai.github.io/h2o-llmstudio/
https://github.com/h2oai/h2o-llmstudio
The source code for NVIDIA's BundleSDF library has been released.

BundleSDF is a neural-network-based 6-DoF tracking and 3D reconstruction library for unknown objects. We would love to see this running on a ROS robot with MoveIt!

Source: https://github.com/NVlabs/BundleSDF
Paper: https://bundlesdf.github.io/
Stanford University has just opened full access to CS224U, their immensely popular graduate-level Natural Language Understanding course, taught by Professor Christopher Potts.

Check out the GitHub code & YouTube playlist.
Introducing IDEFICS, the first open state-of-the-art visual language model at the 80B scale!

The model accepts arbitrary sequences of images and text and produces text. A bit like a multimodal ChatGPT!

Blogpost: huggingface.co/blog/idefics
Playground: https://huggingface.co/spaces/HuggingFaceM4/idefics_playground
Forwarded from Artificial Intelligence
Dear friends,

I'd like to share a part of the origin story of large language models that isn't widely known.

A lot of early work in natural language processing (NLP) was funded by U.S. military intelligence agencies that needed machine translation and speech recognition capabilities. Then, as now, such agencies analyzed large volumes of text and recorded speech in various languages. They poured money into research in machine translation and speech recognition over decades, which motivated researchers to give these applications disproportionate attention relative to other uses of NLP.

This explains why many important technical breakthroughs in NLP stem from studying translation, more than you might imagine based on the modest role that translation plays in current applications. For instance, the celebrated transformer paper, "Attention is All You Need" by the Google Brain team, introduced a technique for mapping a sentence in one language to a translation in another. This laid the foundation for large language models (LLMs) like ChatGPT, which map a prompt to a generated response.

Or consider the BLEU score, which is occasionally still used to evaluate LLMs by comparing their outputs to ground-truth examples. It was developed in 2002 to measure how well a machine-generated translation compares to a ground truth, human-created translation.

A key component of LLMs is tokenization, the process of breaking raw input text into sub-word components that become the tokens to be processed. For example, the first part of the previous sentence may be divided into tokens like this:

/A /key /component /of /LL/Ms/ is/ token/ization

The most widely used tokenization algorithm for text today is Byte Pair Encoding (BPE), which gained popularity in NLP after a 2015 paper by Sennrich et al. BPE starts with individual characters as tokens and repeatedly merges tokens that occur together frequently. Eventually, entire words as well as common sub-words become tokens. How did this technique come about? The authors wanted to build a model that could translate words that weren't represented in the training data. They found that splitting words into sub-words created an input representation that enabled the model, if it had seen "token" and "ization," to guess the meaning of a word it might not have seen before, such as "tokenization."
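To make the merge procedure concrete, here is a toy sketch in the spirit of that BPE-for-NLP recipe; the tiny corpus and the number of merges are illustrative only, and real tokenizers run this over far larger data.

```python
# Toy byte-pair-encoding sketch: repeatedly merge the most frequent adjacent pair.
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count how often each adjacent symbol pair occurs across the corpus."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words are stored as space-separated characters with an end-of-word marker.
corpus = {"t o k e n </w>": 5, "t o k e n s </w>": 3, "t o k e n i z a t i o n </w>": 2}

for step in range(8):
    pairs = get_pair_counts(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)   # the most frequent adjacent pair
    corpus = merge_pair(best, corpus)
    print(step, best)

print(corpus)  # frequent sub-words like "token" emerge as single symbols
```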

I don't intend this description of NLP history as advocacy for military-funded research. (I have accepted military funding, too. Some of my early work in deep learning at Stanford University was funded by DARPA, a U.S. defense research agency. This led directly to my starting Google Brain.) War is a horribly ugly business, and I would like there to be much less of it. Still, I find it striking that basic research in one area can lead to broadly beneficial developments in others. In similar ways, research into space travel led to LED lights and solar panels, experiments in particle physics led to magnetic resonance imaging, and studies of bacteria's defenses against viruses led to the CRISPR gene-editing technology.

So it's especially exciting to see so much basic research going on in so many different areas of AI. Who knows, a few years hence, what today's experiments will yield?

Keep learning!
Andrew Ng