ML Research Hub
32.8K subscribers
4.36K photos
267 videos
23 files
4.71K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.

Paper: https://arxiv.org/pdf/2408.01800v1.pdf

Codes:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v

Datasets: Video-MME

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics

https://t.me/DataScienceT ❤️
Please open Telegram to view this post
VIEW IN TELEGRAM
👍3
📚 Become a professional data scientist with these 17 resources!



1️⃣ Python libraries for machine learning

◀️ Introducing the best Python tools and packages for building ML models.



2️⃣ Deep Learning Interactive Book

◀️ Learn deep learning concepts by combining text, math, code, and images.



3️⃣ Anthology of Data Science Learning Resources

◀️ The best courses, books, and tools for learning data science.



4️⃣ Implementing algorithms from scratch

◀️ Coding popular ML algorithms from scratch



5️⃣ Machine Learning Interview Guide

◀️ Fully prepared for job interviews



6️⃣ Real-world machine learning projects

◀️ Learning how to build and deploy models.



7️⃣ Designing machine learning systems

◀️ How to design a scalable and stable ML system.



8️⃣ Machine Learning Mathematics

◀️ Basic mathematical concepts necessary to understand machine learning.



9️⃣ Introduction to Statistical Learning

◀️ Learn algorithms with practical examples.



1️⃣ Machine learning with a probabilistic approach

◀️ Better understanding modeling and uncertainty with a statistical perspective.



1️⃣ UBC Machine Learning

◀️ Deep understanding of machine learning concepts with conceptual teaching from one of the leading professors in the field of ML,



1️⃣ Deep Learning with Andrew Ng

◀️ A strong start in the world of neural networks, CNNs and RNNs.



1️⃣ Linear Algebra with 3Blue1Brown

◀️ Intuitive and visual teaching of linear algebra concepts.



🔴 Machine Learning Course

◀️ A combination of theory and practical training to strengthen ML skills.



1️⃣ Mathematical Optimization with Python

◀️ You will learn the basic concepts of optimization with Python code.



1️⃣ Explainable models in machine learning

◀️ Making complex models understandable.



⚫️ Data Analysis with Python

◀️ Data analysis skills using Pandas and NumPy libraries.


#DataScience #MachineLearning #DeepLearning #Python #AI #MLProjects #DataAnalysis #ExplainableAI #100DaysOfCode #TechEducation #MLInterviewPrep #NeuralNetworks #MathForML #Statistics #Coding #AIForEveryone #PythonForDataScience



⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
Please open Telegram to view this post
VIEW IN TELEGRAM
👍102
Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower as it logs each row deletion and can be rolled back.
TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets table identity.
DROP is a DDL command that removes the entire table, including its structure, data, and indexes.

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
(INNER) JOIN: Returns records that have matching values in both tables.
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.
SELF JOIN: A regular join, but the table is joined with itself.

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.

-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
2