ML Research Hub
32.9K subscribers
4.45K photos
273 videos
23 files
4.81K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🦜 Toucan is an open-source TTS model supporting 7,000 languages and dialects

Toucan is a text-to-speech (TTS) model plus a toolkit for teaching, training, and deploying the model.

The model was created at the Institute for Natural Language Processing (IMS) at the University of Stuttgart.

Everything is written in idiomatic Python using PyTorch to make learning and testing as easy as possible.

🖥 GitHub
🤗 Try it on HF
🤗 Dataset on HF

https://t.me/DataScienceT ✅
Web Scraping with Gen AI

During this session, we'll explore the following topics:

1️⃣ Basics of Web Scraping:
Understand the fundamental concepts and techniques of web scraping and its legal and ethical considerations.

2️⃣ Scraping with Gen AI:
Discover how Gen AI revolutionizes the web scraping landscape with real-world examples.

3️⃣ Jina Reader API:
Get acquainted with the Jina Reader API, a powerful tool for obtaining LLM-friendly input from URLs or web searches.

4️⃣ ScrapeGraphAI:
Dive into ScrapeGraphAI, a groundbreaking Python library that combines LLMs and direct graph logic for creating robust scraping pipelines.
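As a taste of topic 3, here is a minimal Python sketch of how Jina Reader is typically called: you prefix the target page's URL with `https://r.jina.ai/` and fetch the result, which comes back as LLM-friendly text. The endpoint follows Jina's public Reader documentation at the time of writing. Rate limits, optional API keys, and response formats may change, so treat it as an assumption to verify. `reader_url` and `fetch_llm_friendly` are hypothetical helper names of our own. The fetch uses the standard-library `urllib`, needs network access, and is untested here.

```python
import urllib.request

# Assumption: Jina Reader's public endpoint, used by prefixing the page URL.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader URL for an absolute http(s) page URL (hypothetical helper)."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an absolute http(s) URL")
    return READER_PREFIX + target_url


def fetch_llm_friendly(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the page through Jina Reader; requires network access."""
    with urllib.request.urlopen(reader_url(target_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(reader_url("https://example.com"))  # https://r.jina.ai/https://example.com
```

In practice you would call `fetch_llm_friendly("https://example.com")` and pass the returned text into an LLM prompt, instead of writing HTML-parsing code by hand.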

Event Details:
🗓 Date: Saturday, 22 June
⏰ Time: 11:00 AM IST
🔗 Register now: https://www.buildfastwithai.com/events/web-scraping-with-gen-ai

Connect with the founder (IIT Delhi):
https://www.linkedin.com/in/satvik-paramkusham/
🟢 Let's start with Day 1 today!

Let's learn linear regression in detail.

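Linear regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (features). The goal is to find the linear equation that best predicts the target from the features. "Best" usually means minimizing the sum of squared differences between actual and predicted values (ordinary least squares), which is what scikit-learn's LinearRegression does.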

The equation of a simple linear regression model is:
\[ y = \beta_0 + \beta_1 x \]
Where:
- \( y \) is the predicted value (target).
- \( \beta_0 \) is the y-intercept.
- \( \beta_1 \) is the slope of the line (coefficient).
- \( x \) is the independent variable (feature).
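For simple linear regression, the least-squares estimates have a closed form: \( \beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \) and \( \beta_0 = \bar{y} - \beta_1 \bar{x} \). The plain-Python sketch below computes them directly. The function name `fit_simple_ols` is our own and does not come from any library. It uses the same house-price data as the example that follows. In that data every price is exactly 200 times the size, so the fit should recover a slope of 200 and an intercept of 0.

```python
def fit_simple_ols(xs, ys):
    """Return (beta_0, beta_1) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Sum of cross-deviations (numerator) and squared x-deviations (denominator)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    beta_1 = s_xy / s_xx
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


sizes = [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400]
prices = [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
beta_0, beta_1 = fit_simple_ols(sizes, prices)
print(beta_0, beta_1)  # 0.0 200.0
```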

✅ Implementation

Let's consider an example using Python and its libraries.

✅ Example
Suppose we have a dataset of house sizes (in square feet) and their corresponding prices.

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Example data
data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}
df = pd.DataFrame(data)

# Independent variable (feature) and dependent variable (target)
X = df[['Size']]
y = df['Price']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Creating and training the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

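print(f"Slope (beta_1): {model.coef_[0]:.2f}")
print(f"Intercept (beta_0): {model.intercept_:.2f}")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.4f}")

# Plotting the results
plt.scatter(df['Size'], y, color='blue', label='Data')  # Original data points
# Draw the fitted line over the full size range, not just the 2 test points
plt.plot(df['Size'], model.predict(X), color='red', linewidth=2, label='Fitted line')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Linear Regression: House Prices vs Size')
plt.legend()
plt.show()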

✅ Explanation of the Code

1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
2. Data Preparation: We create a DataFrame containing the size and price of houses.
3. Feature and Target: We separate the feature (Size) and the target (Price).
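4. Train-Test Split: We split the data into training and testing sets (80/20). With only 10 rows, this leaves just 2 test points, so the resulting metrics are illustrative rather than a reliable estimate of performance.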
5. Model Training: We create a LinearRegression model and train it using the training data.
6. Predictions: We use the trained model to predict house prices for the test set.
7. Evaluation: We evaluate the model using Mean Squared Error (MSE) and R-squared (R²) metrics.
8. Visualization: We plot the original data points and the regression line to visualize the model's performance.

✅ Evaluation Metrics

- Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values, in squared units of the target (here, dollars squared). Lower values indicate better performance.
- R-squared (R²): Represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Values closer to 1 indicate a better fit; on a test set it can even be negative when the model does worse than always predicting the mean.
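To see what the two metrics actually compute, here is a minimal plain-Python sketch on a tiny set of toy values. The helper names `mse` and `r2` are illustrative and are not scikit-learn's API. The sketch also shows why R² needs some spread in the true values. When all targets are equal, \( SS_{tot} = 0 \) and the ratio is undefined, so this version raises an error. With test sets as small as the 2 rows in the example above, R² is correspondingly fragile.

```python
def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot; undefined when all true values are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true has zero variance")
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 7.5]
print(mse(y_true, y_pred))  # about 0.1667 (0.5 / 3)
print(r2(y_true, y_pred))   # 0.9375 (1 - 0.5 / 8)
```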

👇 BEST DATA SCIENCE CHANNELS ON TELEGRAM 👇

https://t.me/addlist/8_rRW2scgfRhOTc0
Forwarded from Eng. Hussein Sheikho
Delegate routine tasks to Artificial Intelligence!

The new assistant for macOS always knows what needs to be done!

💻 The new AI-powered assistant AIDE provides support based on your screen content. Get relevant hints and solutions to work more efficiently and productively without losing focus on your tasks.

Download and start for free now: AIDE AI
πŸ‘2😍1πŸ†1
🌟 Modded-NanoGPT: reach GPT-2 (124M) quality when training on only 5B tokens

Modded-NanoGPT is a modified version of Andrej Karpathy's GPT-2 training code (from llm.c).

Compared with the original, Modded-NanoGPT:
- trains 2 times more efficiently (it needs only 5B tokens instead of 10B to reach the same validation loss)
- has simpler code (446 lines instead of 858)

🖥 GitHub

👌 Microsoft, once again without a big announcement, has just released "Instruction Pre-Training", an interesting new way to train models, together with models and datasets.

When pre-trained from scratch, a 500M model trained on 100B tokens achieves the performance of a 1B model pre-trained on 300B tokens.
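For a rough sense of scale, a common back-of-the-envelope estimate of training compute is \( C \approx 6 N D \), where N is the number of parameters and D the number of training tokens. The sketch below applies that approximation to the two configurations above. This is our own arithmetic, not a figure reported in the paper, and it ignores the cost of synthesizing the instruction data. The result is roughly a 6x difference in pre-training compute.

```python
# Rough training compute with the common C ≈ 6 * N * D approximation
# (N = parameters, D = training tokens). An estimate, not a reported result.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens


small = train_flops(500e6, 100e9)  # 500M model, 100B tokens
large = train_flops(1e9, 300e9)    # 1B model, 300B tokens
print(f"{small:.1e} vs {large:.1e} FLOPs, ratio {large / small:.1f}x")
# 3.0e+20 vs 1.8e+21 FLOPs, ratio 6.0x
```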

Available:
👀 Datasets
🦙 Llama 3 8B continually pre-trained with this method, reported to be comparable to Llama 3 70B
❤️‍🔥 General models + specialized models (medicine/finance)

🟡 Abstract: https://arxiv.org/abs/2406.14491
🔴 Models: https://huggingface.co/instruction-pretrain

¡Hola! 👋
AmigoChat - AI GPT bot. Best friend and assistant:

✅ use GPT-4o (GPT-4 Omni)
✅ generate images
✅ get ideas and hashtags for social media
✅ write SEO texts
✅ rewrite and summarize long reads
✅ choose a promotion plan
✅ chat and ask questions

Everything is FREE because amigos don't take dineros for help! 🤠
👉 https://t.me/Amigoo_Chat_Bot
Consistency Models Made Easy

🖥 GitHub: https://github.com/locuslab/ect

📕 Paper: https://arxiv.org/abs/2406.14548v1

🔥 Dataset: https://paperswithcode.com/dataset/cifar-10
πŸ‘ ExVideo is a tuning technique to improve the ability of models to generate video

ExVideo allows a model to generate 5 times more frames, while requiring only 1.5k GPU training hours on a dataset of 40k videos.

For example, the authors used ExVideo to extend Stable Video Diffusion so that it can generate long videos of up to 128 frames.
The code, paper, and model are available at the links below.

🟡 ExVideo page
🖥 GitHub
🟡 Hugging Face
🟡 arXiv

🌟 MG-LLaVA - multimodal LLM with advanced capabilities for working with visual information

Researchers from Shanghai University recently released MG-LLaVA, a multimodal LLM (MLLM) that improves visual processing by adding extra components dedicated to low-resolution and high-resolution visual inputs.

MG-LLaVA adds a high-resolution visual encoder to capture fine details; these features are then fused with the base (low-resolution) visual features by a Conv-Gate fusion network.
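This post does not describe MG-LLaVA's exact Conv-Gate architecture, so the NumPy sketch below only illustrates the general idea of convolutional gated fusion, under several assumptions. Both feature maps share the shape (C, H, W), with the high-resolution features assumed to have already been resized onto the low-resolution grid. The gate is a single 1x1 convolution over the concatenated channels followed by a sigmoid. The gated high-resolution features are then added to the low-resolution ones. `conv_gate_fuse`, `w_gate`, and `b_gate` are hypothetical names, and MG-LLaVA's real module may differ in layer types and placement.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv_gate_fuse(low, high, w_gate, b_gate):
    """Fuse two (C, H, W) feature maps with a learned 1x1-conv gate.

    w_gate has shape (C, 2C): a 1x1 convolution over the concatenated channels.
    Assumes `high` was already resized to the same (H, W) grid as `low`.
    """
    if low.shape != high.shape:
        raise ValueError("low and high must have the same shape")
    stacked = np.concatenate([low, high], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)  # per-channel, per-pixel weights in (0, 1)
    return low + gate * high  # gated residual addition of fine details


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
low = rng.normal(size=(C, H, W))
high = rng.normal(size=(C, H, W))
w_gate = rng.normal(scale=0.1, size=(C, 2 * C))
b_gate = np.zeros(C)
fused = conv_gate_fuse(low, high, w_gate, b_gate)
print(fused.shape)  # (4, 8, 8)
```

The gate lets the model decide, per location and channel, how much high-resolution detail to inject. A strongly negative bias effectively falls back to the low-resolution features alone.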

Trained exclusively on publicly available multimodal data, MG-LLaVA achieves excellent results.

🟡 MG-LLaVA page
🖥 GitHub

🖥 Unstructured - Python library for raw data preprocessing

- pip install "unstructured[all-docs]"

Unstructured provides components for preprocessing images and text documents; supports many formats: PDF, HTML, Word docs, etc.

Run the library in a container:
docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest
docker exec -it unstructured bash


🖥 GitHub
🟡 Docs

Forwarded from Data Science Premium (Books & Courses)
🟢 We present to you a list of our paid services:

✅ Paid channel (books): includes a huge directory of free channels and a large collection of important and rare books.

💵 Price: $7 (one-time payment)

✅ Paid channel (courses): includes a complete set of courses downloaded from Udemy, Coursera, and other learning platforms.

💵 Price: $20 (one-time payment)

✅ Paid bot 1: a bot containing ten million books and articles in all fields of programming, data science, life sciences, and medicine, with daily updates and additions.

💵 Price: $10 (one-time payment)

✅ Paid bot 2: a bot containing more than 30 million books and 10 million scientific articles, with your own account, your own control panel, and important features.

💵 Price: $17 (one-time payment)

✅ Coursera Scholarship: gives you free access to all Coursera courses.

💵 Price: $30 (one-time payment)

🔥 Special offer: Get all four services for only $50

💠💠 Available payment methods:
PayPal - Payeer - Crypto - USDT
MasterCard - Credit Card

To request a subscription:
t.me/Hussein_Sheikho
🆒 EAGLE is a method that allows you to generate LLM responses faster

Is it possible to generate LLM responses on two RTX 3060s faster than with standard decoding on an A100 (which is 16+ times more expensive)?
Yes, with EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), and the output distribution of the original model is preserved, so response quality does not degrade.

EAGLE drafts upcoming tokens by extrapolating the contextual feature vectors from the LLM's second-to-top layer, then has the original LLM verify the drafts, which greatly improves generation efficiency.
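EAGLE belongs to the speculative-decoding family. A cheap drafter proposes several tokens, and the full model verifies them, keeping only what it would have produced anyway. The sketch below shows only that generic greedy draft-and-verify loop and is not EAGLE's implementation. EAGLE's drafter is a lightweight network working on the target model's second-to-top-layer features. Here, `draft_model` and `target_model` are hypothetical toy functions. Sampling (non-greedy) decoding, which EAGLE handles with speculative sampling, is not covered.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextTokenFn = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextTokenFn,
                     target: NextTokenFn, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to prefix."""
    # 1) The cheap drafter proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target checks every proposed position. A real system scores all
    #    k positions in one batched forward pass; here we loop for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) All drafts accepted: append the target's next token as well (a real
    #    system gets it from the same verification pass; here it is one more call).
    accepted.append(target(ctx))
    return accepted


# Toy stand-ins: the target counts up mod 10; the drafter is wrong after a 4.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


print(speculative_step([0], draft_model, target_model, k=4))  # [1, 2, 3, 4, 5]
print(speculative_step([2], draft_model, target_model, k=4))  # [3, 4, 5]
```

Every step emits at least one token that the target itself would have chosen, so greedy output is identical to running the target alone. The speedup comes from accepting several drafted tokens per target pass whenever the drafter agrees with the target.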

EAGLE is 2x faster than Lookahead (13B), and 1.6x faster than Medusa (13B).
And yes, EAGLE can be combined with other acceleration techniques like vLLM, DeepSpeed, Mamba, FlashAttention, quantization and hardware optimization.

🤗 Hugging Face
💻 GitHub
Forwarded from Data Science Premium (Books & Courses)
Building Transformer Models with Attention (2023)

The number one book for learning transformers, taking you from beginner to professional level. It gives an engaging explanation of transformers. This offer is available to the first twenty people only.

Price: $4

PayPal / Credit Card: https://www.patreon.com/DataScienceBooks/shop/building-transformer-models-with-2023-253971

Crypto Payment: http://t.me/send?start=IVq3O4aPlSWF
πŸ‘2