ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🔹 Title: EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25628
• PDF: https://arxiv.org/pdf/2510.25628
• GitHub: https://github.com/MAGIC-AI4Med/EHR-R1

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
✓ https://t.me/DataScienceT
🔹 Title: The Era of Agentic Organization: Learning to Organize with Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26658
• PDF: https://arxiv.org/pdf/2510.26658

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26213
• PDF: https://arxiv.org/pdf/2510.26213

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Exploring Conditions for Diffusion Models in Robotic Control

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15510
• PDF: https://arxiv.org/pdf/2510.15510
• Project Page: https://orca-rc.github.io/
• GitHub: https://orca-rc.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ChartAB: A Benchmark for Chart Grounding & Dense Alignment

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26781
• PDF: https://arxiv.org/pdf/2510.26781
• Project Page: https://huggingface.co/datasets/umd-zhou-lab/ChartAlignBench
• GitHub: https://github.com/tianyi-lab/ChartAlignBench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25897
• PDF: https://arxiv.org/pdf/2510.25897
• Project Page: https://nicolas-dufour.github.io/miro/
• GitHub: https://nicolas-dufour.github.io/miro/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19949
• PDF: https://arxiv.org/pdf/2510.19949

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25364
• PDF: https://arxiv.org/pdf/2510.25364

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/colinglab/CLASS_IT

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: The End of Manual Decoding: Towards Truly End-to-End Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26697
• PDF: https://arxiv.org/pdf/2510.26697
• GitHub: https://github.com/Zacks917/AutoDeco

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25867
• PDF: https://arxiv.org/pdf/2510.25867
• Project Page: https://ucsc-vlaa.github.io/MedVLSynther/
• GitHub: https://ucsc-vlaa.github.io/MedVLSynther/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22282
• PDF: https://arxiv.org/pdf/2510.22282
• GitHub: https://github.com/tsinghua-fib-lab/CityRiSE

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: PORTool: Tool-Use LLM Training with Rewarded Tree

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26020
• PDF: https://arxiv.org/pdf/2510.26020

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

🔹 Publication Date: Published on Oct 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.20976
• PDF: https://arxiv.org/pdf/2510.20976

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Performance Trade-offs of Optimizing Small Language Models for E-Commerce

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21970
• PDF: https://arxiv.org/pdf/2510.21970

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

🔹 Publication Date: Published on Oct 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24992
• PDF: https://arxiv.org/pdf/2510.24992

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

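#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
• DELETE is a DML command that removes the rows matching a WHERE clause (or all rows if no clause is given). It logs each row deletion, so it is slower on large tables, and it can be rolled back inside a transaction.
• TRUNCATE is a DDL command that removes all rows from a table without logging each row, so it is faster. Whether it can be rolled back depends on the database: PostgreSQL and SQL Server allow it inside a transaction, while MySQL and Oracle commit it implicitly. In most databases it also resets identity/auto-increment counters (PostgreSQL only when RESTART IDENTITY is specified).
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes.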
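
The difference between removing rows and removing the table can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in database and a made-up employees table; SQLite has no TRUNCATE statement, so only DELETE and DROP are shown.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
# isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("Ann",), ("Bob",), ("Cy",)],
)

# DELETE inside a transaction can be undone with ROLLBACK.
conn.execute("BEGIN")
conn.execute("DELETE FROM employees WHERE name = 'Bob'")
rows_during = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
conn.execute("ROLLBACK")
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP removes the table definition itself, not just its rows.
conn.execute("DROP TABLE employees")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'employees'"
).fetchall()

print(rows_during, rows_after_rollback, remaining_tables)
```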

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table and the matched records from the right table; right-side columns are NULL where there is no match.
• RIGHT (OUTER) JOIN: Returns all records from the right table and the matched records from the left table; left-side columns are NULL where there is no match.
• FULL (OUTER) JOIN: Returns all records from both tables, pairing rows where they match and filling the unmatched side with NULLs.
• SELF JOIN: A regular join, but the table is joined with itself.
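
A minimal sketch of how INNER and LEFT joins treat an unmatched row, assuming Python's built-in sqlite3 module and two made-up tables (employees and departments, with Bob deliberately left without a department):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Bob has no department.
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'HR');
INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
SELECT e.name, d.name
FROM employees e
JOIN departments d ON e.department_id = d.id
ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every employee; the department is NULL (None) when unmatched.
left_rows = conn.execute("""
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```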

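#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery. With OFFSET, add DISTINCT so that a tie for the highest salary is not returned as the second-highest value.

-- Method 1: Using OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;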

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
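
Ties at the top are the usual trap in this question. The sketch below, which assumes Python's built-in sqlite3 module and a made-up employees table where two people share the highest salary, compares a plain LIMIT 1 OFFSET 1 with the DISTINCT and subquery forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
# Hypothetical data: Ann and Bob tie for the highest salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 300), ("Bob", 300), ("Cy", 200), ("Di", 100)],
)

# Without DISTINCT, the second row is still the tied top salary.
plain_offset = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# DISTINCT collapses the tie before the offset is applied.
distinct_offset = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# The subquery form ignores ties by construction.
subquery = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

print(plain_offset, distinct_offset, subquery)
```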


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
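
Both constraints can be seen rejecting bad rows in a small sketch. It assumes Python's built-in sqlite3 module and made-up tables; try_insert is a hypothetical helper, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
# Hypothetical schema: employees.department_id refers to departments.id.
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER REFERENCES departments(id)
);
INSERT INTO departments VALUES (1, 'Sales');
INSERT INTO employees VALUES (1, 'Ann', 1);
""")


def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return type(exc).__name__


# Primary key: a second department with id 1 is rejected.
duplicate_pk = try_insert("INSERT INTO departments VALUES (1, 'HR')")
# Foreign key: department 99 does not exist, so the row is rejected.
orphan_fk = try_insert("INSERT INTO employees VALUES (2, 'Bob', 99)")
# A row that satisfies both constraints goes through.
valid_row = try_insert("INSERT INTO employees VALUES (2, 'Bob', 1)")

print(duplicate_pk, orphan_fk, valid_row)
```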

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.

WITH DepartmentSales AS (
    SELECT department, SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY department
)
SELECT department, total_sales
FROM DepartmentSales
WHERE total_sales > 100000;

---
#11. Difference between UNION and UNION ALL?
A:
• UNION combines the result sets of two or more SELECT statements and removes duplicate rows.
• UNION ALL also combines result sets but includes all rows, including duplicates. It is faster because it doesn't check for duplicates.
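
A short sketch of the duplicate handling, assuming Python's built-in sqlite3 module and two made-up customer tables that share one email address:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: 'b@x.com' appears in both.
conn.executescript("""
CREATE TABLE online_customers (email TEXT);
CREATE TABLE store_customers (email TEXT);
INSERT INTO online_customers VALUES ('a@x.com'), ('b@x.com');
INSERT INTO store_customers VALUES ('b@x.com'), ('c@x.com');
""")

# UNION removes the duplicate email.
union_rows = conn.execute("""
SELECT email FROM online_customers
UNION
SELECT email FROM store_customers
ORDER BY email
""").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute("""
SELECT email FROM online_customers
UNION ALL
SELECT email FROM store_customers
ORDER BY email
""").fetchall()

print(len(union_rows), len(union_all_rows))
```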

#12. How would you find the total number of employees in each department?
A: Use COUNT() with GROUP BY.

SELECT department, COUNT(employee_id) as number_of_employees
FROM employees
GROUP BY department;


#13. What is the difference between RANK() and DENSE_RANK()?
A:
• RANK() assigns a rank to each row within a partition. If there are ties, it skips the next rank(s). (e.g., 1, 2, 2, 4)
• DENSE_RANK() also assigns ranks, but it does not skip any ranks in case of ties. (e.g., 1, 2, 2, 3)
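
The two rank sequences can be reproduced side by side. This sketch assumes Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) and a made-up employees table where Bob and Cy tie on salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
# Hypothetical data: Bob and Cy share the second-highest salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 500), ("Bob", 400), ("Cy", 400), ("Di", 300)],
)

rows = conn.execute("""
SELECT name,
       salary,
       RANK() OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC, name
""").fetchall()

ranks = [row[2] for row in rows]        # RANK leaves a gap after the tie
dense_ranks = [row[3] for row in rows]  # DENSE_RANK does not

print(ranks, dense_ranks)
```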

#14. Write a query to get the Nth highest salary.
A: Use DENSE_RANK() in a CTE. DISTINCT returns the salary once even when several employees share it.

WITH SalaryRanks AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT DISTINCT salary
FROM SalaryRanks
WHERE rnk = 5; -- For the 5th highest salary


#15. What is COALESCE() used for?
A: The COALESCE() function returns the first non-NULL value in a list of expressions. It's useful for providing default values for nulls.

SELECT name, COALESCE(commission, 0) as commission
FROM employees; -- Replaces NULL commissions with 0

---
#16. How would you select all employees whose name starts with 'A'?
A: Use the LIKE operator with a wildcard (%).

SELECT name
FROM employees
WHERE name LIKE 'A%';


#17. Get the current date and time.
A: The function depends on the SQL dialect.
• Standard SQL: CURRENT_TIMESTAMP
• PostgreSQL/MySQL: NOW()
• SQL Server: GETDATE()

SELECT NOW();


#18. How can you extract the month from a date?
A: Use the EXTRACT function or MONTH().

-- Standard SQL (PostgreSQL needs a typed DATE literal)
SELECT EXTRACT(MONTH FROM DATE '2023-10-27');
-- MySQL and SQL Server
SELECT MONTH('2023-10-27');


#19. What is a subquery? What are the types?
A: A subquery is a query nested inside another query.
• Scalar Subquery: Returns a single value (one row, one column).
• Multi-row Subquery: Returns multiple rows.
• Correlated Subquery: An inner query that depends on the outer query for its values. Logically, it is evaluated once for each row processed by the outer query, although the optimizer may rewrite it.
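
One query per subquery type, as a sketch. It assumes Python's built-in sqlite3 module; the employees and departments tables, the region column, and the names helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data.
conn.executescript("""
CREATE TABLE departments (name TEXT, region TEXT);
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO departments VALUES ('Sales', 'US'), ('HR', 'EU');
INSERT INTO employees VALUES
    ('Ann', 'Sales', 300), ('Bob', 'Sales', 250),
    ('Cy', 'HR', 100), ('Di', 'HR', 150);
""")


def names(sql):
    """Return the first column of each result row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Scalar subquery: compares each salary with one value, the overall average.
scalar = names("""
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY name
""")

# Multi-row subquery: the inner query returns a list used by IN.
multi_row = names("""
SELECT name FROM employees
WHERE department IN (SELECT name FROM departments WHERE region = 'EU')
ORDER BY name
""")

# Correlated subquery: the inner query refers to the outer row's department.
correlated = names("""
SELECT name FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2
                WHERE e2.department = e1.department)
ORDER BY name
""")

print(scalar, multi_row, correlated)
```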

#20. Write a query to find all employees who work in the 'Sales' department.
A: Use a JOIN or a subquery.

-- Using JOIN (preferred)
SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Sales';

---
#21. How would you calculate the month-over-month growth rate of sales?
A: Use the LAG() window function to get the previous month's sales and then apply the growth formula: (current - previous) / previous * 100. The query below uses PostgreSQL syntax; NULLIF avoids division by zero, and multiplying by 100.0 avoids integer division.

WITH MonthlySales AS (
    SELECT
        DATE_TRUNC('month', order_date)::DATE AS sales_month,
        SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY 1
)
SELECT
    sales_month,
    total_sales,
    100.0 * (total_sales - LAG(total_sales, 1) OVER (ORDER BY sales_month))
        / NULLIF(LAG(total_sales, 1) OVER (ORDER BY sales_month), 0) AS growth_rate
FROM MonthlySales
ORDER BY sales_month;
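
The same calculation can be run end to end on a tiny dataset. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer for LAG), swaps DATE_TRUNC for SQLite's strftime, and the sales rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, sale_amount INTEGER)")
# Hypothetical orders: 100 in January, 150 in February, 120 in March.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2023-01-05", 100),
        ("2023-02-10", 100),
        ("2023-02-20", 50),
        ("2023-03-15", 120),
    ],
)

growth = conn.execute("""
WITH monthly_sales AS (
    SELECT strftime('%Y-%m', order_date) AS sales_month,
           SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY sales_month
),
with_prev AS (
    SELECT sales_month,
           total_sales,
           LAG(total_sales) OVER (ORDER BY sales_month) AS prev_sales
    FROM monthly_sales
)
SELECT sales_month,
       total_sales,
       -- NULLIF guards against dividing by a zero-sales month
       ROUND(100.0 * (total_sales - prev_sales) / NULLIF(prev_sales, 0), 2)
           AS growth_rate
FROM with_prev
ORDER BY sales_month
""").fetchall()

for row in growth:
    print(row)
```

The first month has no previous value, so its growth rate is NULL rather than zero.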