@Codingdidi
Free learning resources for data analysts, data science, ML, AI, and GenAI, plus job updates, career growth, and tech updates
🚨 Here's an opportunity for you ⚠️

*Webinar highlights:-*

✓ Data acquisition
✓ Data cleaning
✓ Data analysis
✓ Data visualization
✓ Dashboard creation
✓ Story creation

*Tools that will be covered in the webinar:-*

👉🏻 Python
👉🏻 MySQL
👉🏻 Power BI

*Goal of the webinar:-*
✅ A complete data analyst project.
✅ Better decision-making skills.
✅ A practical, real-world project.


*ADD ONS*:-

- Pandas notes 🗒️ worth ₹299/-
- Statistics notes
- Power BI guide
- Data analysis notes
- SQL notes
- Early-bird access to the 7-day live Power BI sessions.

*Highlights*:-
- 3-hour live session.
- Access to the recording for 2 months.


*Here's how you can enroll!!*
- Pay 249/- INR
- Fill out the Google form.

*Date:-* 22nd Sept 2024
*Timing:-* 7 pm - 10 pm IST
Thinking of starting FREE live Python sessions on Zoom in Hindi.

What do you guys think 🤔?
Anonymous Poll
94% Excited
6% No 🚫
Here's the link to the PDF of the *Python handwritten notes*:

https://drive.google.com/file/d/1wBEz2Nt9s3pjIRdRIxZpUrwclX8Lt-hg/view?usp=drivesdk

Don't forget to thank me in the comments.
Alert 🚨 😲

Many people reached out to me saying Telegram may get banned in their countries, so I've decided to create a WhatsApp channel 👇👇

Follow the CODING DIDI channel on WhatsApp:

https://whatsapp.com/channel/0029VaiVMpH2kNFyMWeMDV2Z

Don't worry, guys, your contact number will stay hidden!

ENJOY LEARNING 👍👍
10 commonly asked data science interview questions along with their answers

1️⃣ What is the difference between supervised and unsupervised learning?
Supervised learning involves learning from labeled data to predict outcomes, while unsupervised learning involves finding patterns in unlabeled data.

2️⃣ Explain the bias-variance tradeoff in machine learning.
The bias-variance tradeoff is a key concept in machine learning. Models with high bias are too simple and underfit, while models with high variance are too complex and overfit the training data. The goal is to find the balance between bias and variance that minimizes error on unseen data.

3️⃣ What is the Central Limit Theorem and why is it important in statistics?
The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean is approximately normal regardless of the underlying population distribution, as long as the sample size is sufficiently large and the population has finite variance. It is important because it justifies using normal-based methods, such as hypothesis tests and confidence intervals, on reasonably large samples even when the population itself is not normally distributed.
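
The CLT claim can be checked with a quick simulation. This is only a rough sketch using the standard library: the exponential population, the sample sizes, and the helper names `sample_means` and `skewness` are assumptions chosen for illustration, not part of the original answer. The skewness of the sample means should move toward 0 (the value for a symmetric bell curve) as the sample size grows.

```python
import random
import statistics


def sample_means(sample_size, n_samples=2000, seed=0):
    """Draw n_samples samples from a skewed (exponential) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Skewness of a list of numbers; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / std) ** 3 for v in values)


# The population has mean 1 and is strongly right-skewed.
# As the sample size grows, the distribution of sample means tightens
# around 1 and loses its skew.
for n in (2, 30, 200):
    means = sample_means(n)
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.3f}  "
        f"spread={statistics.pstdev(means):.3f}  skew={skewness(means):.3f}"
    )
```

The spread of the sample means should shrink roughly in proportion to 1/sqrt(n), which is why larger samples give tighter confidence intervals.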

4️⃣ Describe the process of feature selection and why it is important in machine learning.
Feature selection is the process of selecting the most relevant features (variables) from a dataset. This is important because irrelevant or redundant features can lead to overfitting, slower training, and reduced accuracy.

5️⃣ What is the difference between overfitting and underfitting in machine learning? How do you address them?
Overfitting occurs when a model is too complex and fits the training data too closely, including its noise, resulting in poor performance on unseen data. Underfitting occurs when a model is too simple to capture the patterns in the training data, resulting in poor performance on both training and unseen data. Techniques to address overfitting include regularization, early stopping, and gathering more training data, while techniques to address underfitting include using a more complex model, adding more informative features, or reducing regularization.

6️⃣ What is regularization and why is it used in machine learning?
Regularization is a technique used to reduce overfitting in machine learning. It adds a penalty term to the loss function that limits the complexity of the model by shrinking its weights, which reduces the influence of less useful features.
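
To make the penalty term concrete, here is a minimal sketch of L2 (ridge) regularization for a model with a single weight and no intercept. The toy data and the function name `fit_ridge_1d` are made up for illustration; real projects would normally use a library implementation.

```python
def fit_ridge_1d(xs, ys, penalty):
    """Minimise sum((y - w * x) ** 2) + penalty * w ** 2 for a single weight w.

    Setting the derivative with respect to w to zero gives
    w = sum(x * y) / (sum(x * x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Illustrative data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# penalty=0 is ordinary least squares; larger penalties shrink w toward 0.
for penalty in (0.0, 1.0, 10.0, 100.0):
    print(f"penalty={penalty:>5}  w={fit_ridge_1d(xs, ys, penalty):.3f}")
```

The stronger the penalty, the smaller the fitted weight, which is the "limited complexity" the answer refers to.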

7️⃣ How do you handle missing data in a dataset?
Missing data can be handled by deleting the affected rows or columns, imputing the missing values (for example, with the mean, median, or most frequent value), or using models that can handle missing values directly.

8️⃣ What is the difference between classification and regression in machine learning?
Classification is a type of supervised learning where the goal is to predict a categorical or discrete outcome, while regression is a type of supervised learning where the goal is to predict a continuous or numerical outcome.

9️⃣ Explain the concept of cross-validation and why it is used.
Cross-validation is a technique for evaluating how well a machine learning model generalizes. It involves splitting the data into training and validation sets in several different ways (for example, k folds), then training and evaluating the model on each split and averaging the results. This gives a more reliable estimate of generalization ability than a single split and helps detect overfitting.
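
The splitting and averaging described above can be sketched with the standard library alone. The "model" here is a deliberately trivial baseline that predicts the mean of the training targets, and the helper names and toy values are assumptions for illustration; in practice a library such as scikit-learn would usually handle the splitting.

```python
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(ys, k=5):
    """Score a 'predict the training mean' baseline with k-fold CV.

    Returns the mean absolute error on each held-out fold.
    """
    scores = []
    for held_out in k_fold_indices(len(ys), k):
        held_out_set = set(held_out)
        train = [y for i, y in enumerate(ys) if i not in held_out_set]
        prediction = statistics.fmean(train)  # "fit" on the training folds only
        errors = [abs(ys[i] - prediction) for i in held_out]
        scores.append(statistics.fmean(errors))  # evaluate on the held-out fold
    return scores


# Illustrative target values.
ys = [3.1, 2.9, 3.4, 5.0, 2.7, 3.3, 3.0, 4.1, 2.8, 3.6]
scores = cross_validate(ys, k=5)
print("per-fold MAE:", [round(s, 3) for s in scores])
print("mean MAE:", round(statistics.fmean(scores), 3))
```

Each row is held out exactly once, so the averaged score uses all of the data for evaluation without ever scoring a row the model was fitted on.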

🔟 What evaluation metrics would you use to evaluate a binary classification model?
Some commonly used evaluation metrics for binary classification models are accuracy, precision, recall, F1 score, and ROC-AUC. The choice of metric depends on the specific requirements of the problem.
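
As a rough illustration of why the choice of metric matters, the sketch below computes accuracy, precision, recall, and F1 from scratch (ROC-AUC is left out because it needs predicted scores rather than hard labels). The function name `binary_metrics` and the label lists are made up for the example.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced toy labels: only 2 positives out of 10.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# A useless model that always predicts the majority class still
# looks good on accuracy, but recall and F1 expose the problem.
always_negative = [0] * 10
print(binary_metrics(y_true, always_negative))
```

On imbalanced data like this, precision, recall, and F1 are usually more informative than accuracy alone.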


Like if you need similar content 😄👍

Hope this helps you 😊
Company name: Intent Sourcer
Job role: Data Analyst Trainee
Job type: Internship, entry level
Job location: Nashik Division
Qualifications:

Bachelor's degree or equivalent experience
Expertise with SPSS, Excel, and PowerPoint
Previous quantitative and qualitative research experience
Fresher: Less than 1 year
₹15K - ₹20K (per month)
https://www.linkedin.com/company/intent-sourcer/
Check out the LinkedIn 🔗 profile link
Those who are interested, ping me on WhatsApp at +91-9910986344

Limited seats in this batch ✅
Python project-based interview questions for a data analyst role, along with tips and sample answers [Part-1]

1. Data Cleaning and Preprocessing
- Question: Can you walk me through the data cleaning process you followed in a Python-based project?
- Answer: In my project, I used Pandas for data manipulation. First, I handled missing values by imputing them with the median for numerical columns and the most frequent value for categorical columns using fillna(). I also removed outliers by setting a threshold based on the interquartile range (IQR). Additionally, I standardized numerical columns using StandardScaler from Scikit-learn and performed one-hot encoding for categorical variables using Pandas' get_dummies() function.
- Tip: Mention specific functions you used, like dropna(), fillna(), apply(), or replace(), and explain your rationale for selecting each method.
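
The cleaning steps in the sample answer could look roughly like the sketch below. It assumes pandas is installed; the toy DataFrame and its column names (`age`, `income`, `city`) are invented for illustration. Standardization is done by hand here with a z-score, which mirrors what scikit-learn's StandardScaler computes, so the example stays self-contained.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age":    [25, 32, None, 41, 29, 38, 27],
    "income": [30_000, 42_000, 39_000, None, 35_000, 1_000_000, 33_000],
    "city":   ["Pune", "Delhi", None, "Delhi", "Pune", "Delhi", "Mumbai"],
})
num_cols = ["age", "income"]

# 1. Impute: median for numeric columns, most frequent value for categoricals.
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# 2. Remove outliers: keep incomes within 1.5 * IQR of the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range].copy()

# 3. Standardise numeric columns to mean 0 and standard deviation 1
#    (population standard deviation, ddof=0).
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

# 4. One-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"])
print(df)
```

In an interview, the order is worth explaining too: imputing before the IQR filter keeps rows that would otherwise be dropped for a single missing value, and scaling after outlier removal stops one extreme value from dominating the mean and standard deviation.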

2. Exploratory Data Analysis (EDA)
- Question: How did you perform EDA in a Python project? What tools did you use?
- Answer: I used Pandas for data exploration, generating summary statistics with describe() and checking for correlations with corr(). For visualization, I used Matplotlib and Seaborn to create histograms, scatter plots, and box plots. For instance, I used sns.pairplot() to visually assess relationships between numerical features, which helped me detect potential multicollinearity. Additionally, I applied pivot tables to analyze key metrics by different categorical variables.
- Tip: Focus on how you used visualization tools like Matplotlib, Seaborn, or Plotly, and mention any specific insights you gained from EDA (e.g., data distributions, relationships, outliers).
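
The non-plotting part of that EDA workflow might be sketched as follows, assuming pandas is installed. The sales data and column names are invented for illustration; the plotting calls mentioned in the answer (such as sns.pairplot()) are left out so the example runs without a display.

```python
import pandas as pd

# Hypothetical sales data; column names are illustrative only.
df = pd.DataFrame({
    "region":  ["North", "South", "North", "East", "South", "East"],
    "channel": ["Online", "Store", "Store", "Online", "Online", "Store"],
    "units":   [10, 4, 7, 12, 5, 9],
    "revenue": [200.0, 90.0, 150.0, 260.0, 110.0, 180.0],
})

# Summary statistics (count, mean, std, quartiles) for numeric columns.
summary = df.describe()

# Pairwise Pearson correlation between the numeric columns.
correlations = df[["units", "revenue"]].corr()

# Pivot table: total revenue per region, broken down by channel.
pivot = df.pivot_table(
    index="region", columns="channel", values="revenue",
    aggfunc="sum", fill_value=0,
)

print(summary)
print(correlations)
print(pivot)
```

A very high correlation between two features, as between `units` and `revenue` here, is the kind of signal that would prompt a closer look at multicollinearity.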

3. Pandas Operations
- Question: Can you explain a situation where you had to manipulate a large dataset in Python using Pandas?
- Answer: In a project, I worked with a dataset containing over a million rows. I relied on vectorized operations instead of Python loops. For example, I computed derived columns with column arithmetic and built-in Pandas methods rather than apply() with a lambda function, which calls a Python function once per row, and I used groupby() to aggregate data by multiple dimensions efficiently. I also leveraged merge() to join datasets on common keys.
- Tip: Emphasize your understanding of efficient data manipulation with Pandas, mentioning functions like groupby(), merge(), concat(), or pivot().
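
To show the difference between row-wise apply() and a vectorized operation, along with groupby() and merge(), here is a small sketch assuming pandas is installed. The `orders` and `customers` tables and their columns are invented for illustration.

```python
import pandas as pd

# Hypothetical tables; names and values are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "region":      ["North", "South", "North", "East", "South", "North"],
    "price":       [100.0, 250.0, 80.0, 40.0, 120.0, 60.0],
    "quantity":    [2, 1, 5, 3, 2, 1],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment":     ["Retail", "Wholesale", "Retail"],
})

# Row-wise apply: runs a Python function once per row (slow on large frames).
slow_total = orders.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Vectorised: one operation over whole columns, same result.
orders["total"] = orders["price"] * orders["quantity"]

# Aggregate by multiple dimensions.
by_region = orders.groupby(["region", "customer_id"], as_index=False)["total"].sum()

# Join on a common key, then aggregate by an attribute from the other table.
enriched = orders.merge(customers, on="customer_id", how="left")
by_segment = enriched.groupby("segment")["total"].sum()

print(by_region)
print(by_segment)
```

Both approaches give the same totals on six rows; the gap only shows up at scale, which is why it is worth mentioning when the dataset has a million rows.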

4. Data Visualization
- Question: How do you create visualizations in Python to communicate insights from data?
- Answer: I primarily use Matplotlib and Seaborn for static plots and Plotly for interactive dashboards. For example, in one project, I used sns.heatmap() to visualize the correlation matrix and sns.barplot() for comparing categorical data. For time-series data, I used Matplotlib to create line plots that displayed trends over time. When presenting the results, I tailored visualizations to the audience, ensuring clarity and simplicity.
- Tip: Mention the specific plots you created and how you customized them (e.g., adding labels, titles, adjusting axis scales). Highlight the importance of clear communication through visualization.

Like this post if you want the next part of this interview series 👍❤️


Hope it helps :)
Ping me on WhatsApp at +91-9910986344
Data Analytics with Python.

Starting date:- 10th Oct 2024
Here are the 25 most common Deep Learning interview questions for ML research positions:

Fundamentals:
- What is deep learning, and how does it differ from traditional machine learning?
- What is an activation function, and why is it important? Explain three types of activation functions.
- You are using a deep neural network for prediction, but it overfits the training data. What can you do to reduce overfitting?
- What is the vanishing gradient problem in neural networks, and how can it be fixed?
- Explain the process of backpropagation.

Neural Network Architectures:
- Describe the architecture of a typical Convolutional Neural Network (CNN).
- What are Autoencoders, and what are three practical uses of them?
- What is a transformer architecture, and how is it used in NLP tasks?
- What is the role of pooling layers in CNNs?
- What are Recurrent Neural Networks (RNNs), and where are they used?

Training and Optimization:
- How does L1/L2 regularization affect a neural network?
- Why should we use Batch Normalization?
- How do you know if your model is suffering from exploding gradients?
- What is the purpose of dropout in neural networks, and how does it affect training?
- What are some hyperparameters used in training neural networks?

Advanced Topics:
- What are the main gates in LSTM networks, and what are their tasks?
- Explain how self-attention works in transformers.
- Can CNNs be used to classify 1D signals?
- What is transfer learning, and when is it recommended or not?
- How do depthwise separable convolutions improve CNNs?

Practical Implementation:
- Describe the process of pre-training and fine-tuning in transformers.
- What are the main challenges when training a deep learning model with limited data?
- How do you handle class imbalance in deep learning?
- What are the challenges of deploying deep learning models in production?
- How would you modify a pre-trained model from classification to regression?

Like ❤️ for more posts.
Top 10 important data science concepts

1. Data Cleaning: Data cleaning is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in a dataset. It is a crucial step in the data science pipeline as it ensures the quality and reliability of the data.

2. Exploratory Data Analysis (EDA): EDA is the process of analyzing and visualizing data to gain insights and understand the underlying patterns and relationships. It involves techniques such as summary statistics, data visualization, and correlation analysis.

3. Feature Engineering: Feature engineering is the process of creating new features or transforming existing features in a dataset to improve the performance of machine learning models. It involves techniques such as encoding categorical variables, scaling numerical variables, and creating interaction terms.

4. Machine Learning Algorithms: Machine learning algorithms are mathematical models that learn patterns and relationships from data to make predictions or decisions. Some important machine learning algorithms include linear regression, logistic regression, decision trees, random forests, support vector machines, and neural networks.

5. Model Evaluation and Validation: Model evaluation and validation involve assessing the performance of machine learning models on unseen data. It includes techniques such as cross-validation, confusion matrix, precision, recall, F1 score, and ROC curve analysis.

6. Feature Selection: Feature selection is the process of selecting the most relevant features from a dataset to improve model performance and reduce overfitting. It involves techniques such as correlation analysis, backward elimination, forward selection, and regularization methods.

7. Dimensionality Reduction: Dimensionality reduction techniques are used to reduce the number of features in a dataset while preserving the most important information. Principal Component Analysis (PCA) and t-SNE (t-Distributed Stochastic Neighbor Embedding) are common dimensionality reduction techniques.

8. Model Optimization: Model optimization involves fine-tuning the parameters and hyperparameters of machine learning models to achieve the best performance. Techniques such as grid search, random search, and Bayesian optimization are used for model optimization.

9. Data Visualization: Data visualization is the graphical representation of data to communicate insights and patterns effectively. It involves using charts, graphs, and plots to present data in a visually appealing and understandable manner.

10. Big Data Analytics: Big data analytics refers to the process of analyzing large and complex datasets that cannot be processed using traditional data processing techniques. It involves technologies such as Hadoop, Spark, and distributed computing to extract insights from massive amounts of data.


Like if you need similar content 😄👍

Hope this helps you 😊
Data science interview questions 👇

SQL
- How do you write a query to fetch the top 5 highest salaries in each department?
- What's the difference between the HAVING and WHERE clauses in SQL?
- How do you handle NULL values in SQL, and how do they affect aggregate functions?
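
One possible way to explore the three SQL questions above is with Python's built-in sqlite3 module. This is a sketch, not a model answer: the `employees` table and its rows are invented, the top-N query uses N = 2 to fit the tiny dataset (use 5 for the question as asked), and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Asha',  'Sales', 50000), ('Ravi', 'Sales', 65000),
        ('Meera', 'Sales', NULL),  ('John', 'IT',    90000),
        ('Sara',  'IT',    70000), ('Vik',  'IT',    85000);
""")

# Top N salaries per department: rank within each department, then filter.
top_n = conn.execute("""
    SELECT department, name, salary
    FROM (
        SELECT department, name, salary,
               DENSE_RANK() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rnk
        FROM employees
        WHERE salary IS NOT NULL
    )
    WHERE rnk <= 2
    ORDER BY department, salary DESC
""").fetchall()

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
where_vs_having = conn.execute("""
    SELECT department, COUNT(*) AS n
    FROM employees
    WHERE salary > 60000
    GROUP BY department
    HAVING COUNT(*) >= 2
""").fetchall()

# Aggregates skip NULLs: COUNT(*) counts rows, COUNT(salary) and AVG(salary)
# ignore the row whose salary is NULL.
null_handling = conn.execute("""
    SELECT COUNT(*), COUNT(salary), AVG(salary)
    FROM employees
    WHERE department = 'Sales'
""").fetchone()

print(top_n)
print(where_vs_having)
print(null_handling)
conn.close()
```

DENSE_RANK() keeps ties together, so two employees on the same salary share a rank; ROW_NUMBER() would be the alternative if exactly N rows per department are required.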

Python
- How do you handle large datasets in Python, and which libraries would you use for performance?
- What are context managers in Python, and how do they help with resource management?
- How do you manage and log errors in Python-based ETL pipelines?

Machine Learning
- Explain the difference between bias and variance in a machine learning model. How do you balance them?
- What is cross-validation, and how does it give a more reliable estimate of a machine learning model's performance?
- How do you deal with class imbalance in classification tasks, and what techniques would you apply?

Deep Learning
- What is the vanishing gradient problem in deep learning, and how can it be mitigated?
- Explain how a convolutional neural network (CNN) works and when you would use it.
- What is dropout in neural networks, and how does it help prevent overfitting?

Data Wrangling
- How would you handle outliers in a dataset, and when is it appropriate to remove or keep them?
- Explain how to merge two datasets in Python, and how would you handle duplicate or missing entries in the merged data?
- What is data normalization, and when should you apply it to your dataset?

Data Visualization - Tableau
- How do you create a dual-axis chart in Tableau, and when would you use it?
- How would you filter data in Tableau to create a dynamic dashboard that updates based on user input?
- What are calculated fields in Tableau, and how would you use them to create a custom metric?

#datascience #interview
Genpact is hiring!
Position: Business Analyst/ Data Analyst!
Qualification: Bachelor's/Master's Degree
Salary: 5.9 - 8.6 LPA (Expected)
Experience: Freshers/Experienced
Location: Bangalore/Hyderabad/Gurugram

📌 Apply Now: https://genpact.taleo.net/careersection/sgy_external_career_section/jobdetail.ftl?job=COR029438

All the best 👍👍