Low-Rank Adaptation of Large Language Models (LoRA) is a fine-tuning method that accelerates the training of large models while consuming less memory.
LoRA is like a special trick that helps computers learn better and faster. Imagine a computer trying to learn new things, like recognizing pictures or understanding language. When it learns, it uses something called "weights," which are like little helpers inside the computer.
Now, LoRA's trick is to make these little helpers work smarter. Instead of changing all the helpers every time the computer learns something new, LoRA only changes a few of them. It's like having a big group of friends, but only a couple of them have to do the hard work, and the others can rest.
Here's how it works:
1. The computer has these helpers (weights) that it uses to learn.
2. LoRA makes two special groups of helpers that are smaller and easier to work with.
3. The computer trains these special groups of helpers to learn new things without changing all the original helpers.
4. After training, the computer combines the new helpers with the original ones, like mixing two colors to get a new color.
5. This makes the computer learn faster and doesn't use too much memory.
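In grown-up terms, the "two special groups of helpers" are two small matrices A and B whose product forms a low-rank update to a frozen weight matrix. A minimal PyTorch sketch of the idea (the class and attribute names below are my own, not from any particular library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update: y = x @ (W + B @ A).T"""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Original ("frozen") weights: never updated during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # The two small matrices: A projects down to `rank`, B projects back up.
        # B starts at zero so the update contributes nothing before training.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank correction.
        return x @ self.weight.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    def merge(self) -> torch.Tensor:
        # After training, fold the update back into the base weights
        # ("mixing the two colors"), so inference runs at full speed.
        return self.weight + (self.lora_B @ self.lora_A) * self.scaling
```

Only `lora_A` and `lora_B` receive gradients, which is why so little memory is needed: for a 4096x4096 layer at rank 8, that is ~65K trainable values instead of ~16.8M.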
The good things about LoRA are:
- It helps the computer learn without using too many helpers, so it's faster.
- The original helpers stay the same, so we can use them for different tasks.
- It can work with other tricks that make computers smarter.
- The computer works just as fast when using LoRA, so we don't have to wait.
So, LoRA is like a cool trick that helps computers learn better and faster without making them slow down. It's like having a superhero team of helpers inside the computer!
JARVIS-1: Open-Ended Multi-task Agents with Memory-Augmented Multimodal Language Models
abs: arxiv.org/abs/2311.05997
project page: craftjarvis-jarvis1.github.io
"We introduce JARVIS-1, an open-ended agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe."
Finally, we have a hallucination leaderboard!
Key Takeaways
- Not surprisingly, GPT-4 has the lowest hallucination rate.
- The open-source Llama 2 70B is pretty competitive!
- Google's models have the highest hallucination rates. Again, this is not surprising, given that the #1 reason Bard is not usable is its high hallucination rate.
Really cool that we are beginning to run these evaluations and capture them in leaderboards!
It's not only about LLMs...
Microsoft Introduces Florence-2, a Breakthrough in Computer Vision!
Microsoft has just unveiled Florence-2, a foundation model designed for a wide range of computer vision and vision-language tasks. The model simplifies things by using one backbone for multiple tasks. Read more about it in the paper and project details below.
Key Highlights:
- Achieves state-of-the-art performance across various tasks
- Employs a unified, prompt-based representation for vision tasks
- Features the FLD-5B dataset, boasting over 5 billion annotations on 126 million images
- Handles detection, captioning, and grounding, all with a single model
- Streamlined with a single, uniform set of parameters governing everything
How big do LLMs need to be to reason? Microsoft released Orca 2 this week, a 13B Llama-based LLM trained on complex tasks and reasoning. Orca's performance comes from its use of synthetically generated data from bigger LLMs. I took a deeper look at the paper and extracted the implementation details and other insights.
Implementation:
1. Constructed a new dataset (Orca 2) with ~817K samples, using prompts from FLAN and GPT-4 to generate reasoning responses with the help of detailed system prompts.
2. Grouped prompts into categories based on similarity to assign tailored system prompts that demonstrate different reasoning techniques.
3. Replaced the original system prompt with a more generic one, so the model learns the underlying reasoning strategy ("prompt erasing").
4. Used progressive learning: first fine-tune Llama on FLAN-v2 (1 epoch), then retrain on 5M ChatGPT samples from Orca 1 (3 epochs), then combine 1M GPT-4 samples from Orca 1 with the 800K new Orca 2 samples for final training (4 epochs).
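Step 3, "prompt erasing," amounts to a simple data-preparation transform: the detailed, strategy-specific system prompt used to generate the teacher response is swapped for a generic one before the example enters the student's training set, so the student must internalize the strategy rather than be told which one to use. An illustrative sketch (not Microsoft's actual pipeline; the prompt text and sample are invented):

```python
# Hypothetical sketch of "prompt erasing" as a dataset transform.
GENERIC_SYSTEM_PROMPT = "You are Orca, an AI assistant. Think carefully and answer."

def erase_prompt(example: dict) -> dict:
    """Replace the tailored system prompt, keeping the question and the reasoning-rich answer."""
    return {
        "system": GENERIC_SYSTEM_PROMPT,
        "user": example["user"],
        "assistant": example["assistant"],  # the teacher response still demonstrates the strategy
    }

sample = {
    "system": "Solve this step by step. First restate the problem, then list the knowns...",
    "user": "If a train travels 60 km in 45 minutes, what is its speed in km/h?",
    "assistant": "Step 1: 45 minutes is 0.75 hours. Step 2: 60 / 0.75 = 80 km/h.",
}
erased = erase_prompt(sample)
```

The student never sees the instruction that produced the step-by-step answer, only the answer itself, which is what forces it to learn the reasoning pattern.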
Insights:
- Imitation learning can improve capabilities with enough data.
- Reasoning and longer generations to reach the correct answer help smaller models compete with bigger LLMs.
- Prompt erasing helped Orca "learn" reasoning.
- Lowest hallucination rates among comparable models on summarization.
- Used packing for training, concatenating multiple examples into one sequence.
- Masked user & system inputs (the prompt) and computed the loss only on the generation.
- Trained on 32 A100 GPUs for 80 hours.
Paper: https://huggingface.co/papers/2311.11045
Model: https://huggingface.co/microsoft/Orca-2-13b
Think of an LLM that can find entities in a given image, describe the image, and answer questions about it, without hallucinating.
Kosmos-2, released by Microsoft, is a very underrated model that can do exactly that. Not only this, but the Hugging Face transformers integration makes it super easy to use!
Colab link:
https://colab.research.google.com/drive/1t25qM_lOM-HQG6Wg3aRiF4LOuQMN5lUF?usp=sharing
Retrieval-Augmented Generation for Large Language Models: A Survey
This paper is a must read.
It covers everything you need to know about the RAG framework and its limitations. It also lists different state-of-the-art techniques to boost its performance in retrieval, augmentation, and generation.
The ultimate goal behind these techniques is to make this framework ready for scalability and production use, especially for use cases and industries where answer quality matters *a lot*.
These are the key ideas the paper discusses to make your RAG more efficient:
- Enhance the quality of indexed data by removing duplicate/redundant information and adding mechanisms to refresh outdated documents
- Optimize the index structure by determining the right chunk size through quantitative evaluation
- Add metadata (e.g., date, chapter, or subsection) to the indexed documents to enable filtering that improves efficiency and relevance
- Align the input query with the documents by indexing chunks of data by the questions they answer
- Mixed retrieval: combine different search techniques, like keyword-based and semantic search
- ReRank: sort the retrieved documents to maximize diversity and optimize the similarity with a "template answer"
- Prompt compression: remove irrelevant context
- HyDE: generate a hypothetical answer to the input question and use it (along with the query) to improve the search
- Query rewriting and expansion to reformulate the user's intent and remove ambiguity
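To make one of these concrete, here is a toy sketch of mixed retrieval, blending a keyword-overlap score with a similarity score. The bag-of-words "semantic" scorer is a deliberately simplified stand-in for dense embeddings, and the 50/50 weighting is an arbitrary choice:

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms appearing in the document (toy stand-in for BM25)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_score(query: str, doc: str) -> float:
    """Bag-of-words cosine similarity (a real system would use dense embeddings)."""
    return cosine(Counter(query.lower().split()), Counter(doc.lower().split()))

def mixed_retrieval(query: str, docs: list[str], alpha: float = 0.5) -> list[tuple[float, str]]:
    """Blend keyword and semantic scores; alpha weights the keyword side."""
    scored = [(alpha * keyword_score(query, d) + (1 - alpha) * semantic_score(query, d), d)
              for d in docs]
    return sorted(scored, reverse=True)
```

In production the two result lists are usually produced by separate engines (e.g., BM25 and a vector index) and fused, for instance with reciprocal rank fusion, rather than scored in one pass like this.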
Link: https://arxiv.org/abs/2312.10997
Good material on Developing AI systems in Medical Imaging: https://aiformedicalimaging.blogspot.com/2023/09/things-to-consider-when-developing-ai.html
Microsoft just casually shared their new Phi-3 LLMs less than a week after the Llama 3 release. Based on the benchmarks in the technical report (https://arxiv.org/abs/2404.14219), even the smallest Phi-3 model beats Llama 3 8B despite being less than half the size.
Phi-3 was trained on roughly 5x fewer tokens than Llama 3 (3.3 trillion instead of 15 trillion).
Phi-3-mini has "only" 3.8 billion parameters, less than half the size of Llama 3 8B.
Despite being small enough to be deployed on a phone (according to the report), it matches the performance of the much larger Mixtral 8x7B and GPT-3.5. (Phi-3-mini can be quantized to 4 bits, so it only requires ≈ 1.8 GB of memory.)
What is the secret sauce? According to the technical report, it's dataset quality over quantity: "heavily filtered web data and synthetic data".
Next to the 4K context-window version, there's also a phi-3-mini-128K model that supports up to 128K tokens.
Fun fact: Phi-3 uses the same tokenizer, with a vocabulary size of 32,064, as Llama 2.
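The ≈ 1.8 GB figure is easy to sanity-check: at 4-bit quantization, each parameter takes half a byte.

```python
# Back-of-the-envelope check of the ~1.8 GB figure for 4-bit Phi-3-mini.
params = 3.8e9           # 3.8 billion parameters
bytes_per_param = 4 / 8  # 4 bits = 0.5 bytes per parameter
total_bytes = params * bytes_per_param
gib = total_bytes / (1024 ** 3)
print(f"{gib:.2f} GiB")  # -> 1.77 GiB for the weights alone
```

That covers only the weights; serving also needs memory for activations and the KV cache on top.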
AI / Computer Vision Bootcamp
Learn AI / Computer Vision from basics to deployment, taught by an IITian and a COEPian.
- Build face recognition
- Build AI object detection
- Build a social-distancing app
- Build an automated invoice reader
- Image classification
- Build computer-vision applications in healthcare, automotive, retail, manufacturing, and security & surveillance
40+ hours of sessions.
12+ weeks.
13+ tools & technologies.
7+ projects.
7+ homework assignments.
5+ case studies.
5+ skills.
5+ domains.
Remote and weekend sessions.
Starting from the basics.
Get a certificate.
Duration: 3 months
Attend the 1st FREE session on 11th May: https://chat.whatsapp.com/BibIwuuUEWrGEWdZHYluNe
For registrations: https://aiindia.ai/cv-bootcamp/
Early results for Gemma 2 on the leaderboard: it matches Llama-3-70B.
- Full data at leaderboard.lmsys.org
- Chat with Gemma 2 at chat.lmsys.org
- Gemma 2 blog goo.gle/3RLQXUa
The matrix calculus for Deep Learning. Very well written. https://explained.ai/matrix-calculus/
A great article on getting started with GenAI:
https://blog.bytebytego.com/p/where-to-get-started-with-genai
How Much GPU Memory Is Needed to Serve an LLM?
This is a common question that consistently comes up in interviews or in discussions with your business stakeholders.
And it's not just a random question: it's a key indicator of how well you understand the deployment and scalability of these powerful models in production.
As a data scientist, understanding and estimating the required GPU memory is essential.
LLMs (Large Language Models) range in size from 7 billion parameters to trillions of parameters. One size certainly doesn't fit all.
Let's dive into the math that will help you estimate the GPU memory needed to deploy these models effectively.
The formula to estimate GPU memory:
General formula: M = ((P × size per parameter) / memory density) × overhead factor
Where:
- M is the GPU memory in gigabytes.
- P is the number of parameters in the model.
- Size per parameter is the number of bytes per model parameter, which is 4 bytes at float32 precision.
- Memory density (32/Q) scales the float32 baseline to the precision Q you actually load the model at, e.g. Q = 16 for half precision or Q = 4 for 4-bit quantization.
- The overhead factor (e.g., 1.2) accounts for memory needed beyond just storing the parameters, such as activations, temporary tensors, and any memory fragmentation or padding.
Simplified formula:
M = ((P * 4B) / (32/Q)) * 1.2
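As a worked example, here is the simplified formula as a small function (the 70B / 16-bit numbers below are just an illustration):

```python
def gpu_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Estimate serving memory in GB: M = ((P * 4 bytes) / (32 / Q)) * overhead."""
    p = params_billion * 1e9
    return (p * 4) / (32 / bits) * overhead / 1e9

# A 70B model served in 16-bit: ((70e9 * 4B) / (32/16)) * 1.2 ≈ 168 GB,
# i.e. at least two 80 GB GPUs.
print(f"{gpu_memory_gb(70, 16):.1f} GB")
```

Note the estimate excludes the KV cache, which grows with batch size and context length, so real deployments should budget above this floor.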
With this formula in hand, I hope you'll feel more confident when discussing GPU memory requirements with your business stakeholders.
#LLM
Uber used RAG and AI agents to build its in-house Text-to-SQL system, saving 140,000 hours annually in query-writing time.
Here's how they built the system end-to-end:
The system is called QueryGPT and is built on top of multiple agents, each handling a part of the pipeline.
1. First, the Intent Agent interprets user intent and figures out which domain workspace is relevant to answering the question (e.g., Mobility, Billing, etc.).
2. The Table Agent then selects suitable tables using an LLM, which users can also review and adjust.
3. Next, the Column Prune Agent filters out any unnecessary columns from large tables using RAG. This helps the schema fit within token limits.
4. Finally, QueryGPT uses few-shot prompting with selected SQL samples and schemas to generate the query.
QueryGPT reduced query authoring time from 10 minutes to 3, saving over 140,000 hours annually!
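The four-step pipeline can be mimicked with plain functions. Everything below (workspaces, schemas, routing rules) is invented for illustration; Uber's real system uses LLM calls and RAG at each step:

```python
# Hypothetical, LLM-free sketch of the QueryGPT agent pipeline.
WORKSPACES = {
    "mobility": {"trips": ["trip_id", "city", "fare", "driver_id", "rider_rating"]},
    "billing": {"invoices": ["invoice_id", "amount", "status", "issued_at"]},
}

def intent_agent(question: str) -> str:
    """Pick the relevant domain workspace (keyword routing as a stand-in for an LLM)."""
    return "billing" if "invoice" in question.lower() else "mobility"

def table_agent(workspace: str) -> list[str]:
    """Select candidate tables from the workspace (here: all of them)."""
    return list(WORKSPACES[workspace])

def column_prune_agent(workspace: str, table: str, question: str) -> list[str]:
    """Drop columns not mentioned in the question to fit token limits (toy relevance test)."""
    q = question.lower()
    cols = WORKSPACES[workspace][table]
    kept = [c for c in cols if any(part in q for part in c.split("_"))]
    return kept or cols[:1]  # always keep at least one key column

def generate_query(question: str) -> str:
    """Chain the agents: intent -> tables -> pruned columns -> SQL."""
    ws = intent_agent(question)
    table = table_agent(ws)[0]
    cols = column_prune_agent(ws, table, question)
    return f"SELECT {', '.join(cols)} FROM {table};"
```

The value of the decomposition is that each stage gets a small, focused context (one workspace, a few tables, a pruned schema) instead of dumping the whole warehouse schema into a single prompt.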
Link to the full article: https://www.uber.com/en-IN/blog/query-gpt/?uclick_id=6cfc9a34-aa3e-4140-9e8e-34e867b80b2b
"Agents are not enough." New Microsoft research argues that for the latest wave of agents, differentiated by GenAI, to succeed, they need to work together with Sims and Assistants (see the diagram on page 3):
Agents are nothing new, evolving from early agents (1950s) to expert systems (1980s), reactive agents (1990s), and more recently multi-agent systems and cognitive architectures.
While frameworks like AutoGen help modern agents tackle complex tasks in narrow domains, challenges like generalization, scalability, and coordination persist.
To help tackle challenges and improve standardization, privacy, personalization, and trust, the research advocates for an ecosystem centered on Agents, Sims, and Assistants.
1. Agents:
- Narrow, purpose-driven modules trained to do a specific task. Each agent can be autonomous, but with the ability to interface with other agents.
2. Sims:
- Representations of the user, built from their profile, preferences, and behaviors, capturing key aspects of who the user is.
- Sims can act on the user's behalf, interacting with agents to accomplish tasks, guided by the user's Assistant.
3. Assistants:
- Programs that interact directly with users, deeply understand them, and can call Sims or Agents to handle tasks reactively or proactively.
- Assistants act as private agents, accessing personal information and fine-tuned to the user, enabling them to perform tasks on the user's behalf.
Interaction
- Agents, Sims, and Assistants work together with a high degree of synergy.
- The Assistant, deeply understanding the user, co-creates and manages Sims with user input, reflecting different facets of the user's life.
- Sims engage specialized Agents to complete tasks effectively, ensuring precision and personalization, which enhances user satisfaction.
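The Assistant → Sim → Agent delegation described above can be sketched as three small classes. All names and methods here are hypothetical, invented for illustration; the paper proposes an ecosystem, not an API:

```python
class Agent:
    """Narrow, purpose-driven module trained for one task."""
    def __init__(self, task: str):
        self.task = task
    def run(self, request: str) -> str:
        return f"[{self.task} agent] handled: {request}"

class Sim:
    """A facet of the user (preferences, behaviors) that can act on their behalf."""
    def __init__(self, facet: str, agents: dict[str, Agent]):
        self.facet = facet
        self.agents = agents
    def delegate(self, task: str, request: str) -> str:
        # The Sim picks the specialized Agent for the task.
        return self.agents[task].run(request)

class Assistant:
    """User-facing program that deeply understands the user and manages their Sims."""
    def __init__(self, sims: dict[str, Sim]):
        self.sims = sims
    def handle(self, facet: str, task: str, request: str) -> str:
        # The Assistant routes the request through the right facet of the user.
        return self.sims[facet].delegate(task, request)

travel_sim = Sim("traveler", {"booking": Agent("booking")})
assistant = Assistant({"traveler": travel_sim})
result = assistant.handle("traveler", "booking", "flight to NYC")
```

The key structural point is the direction of the arrows: users talk only to the Assistant, Sims carry user context, and Agents stay narrow and interchangeable.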
P.S. Paper attached with link dives deeper: https://www.arxiv.org/pdf/2412.16241