DATA SCIENCE INTERVIEW QUESTIONS
[PART-15]
Q1. How do you deal with unbalanced binary classification?
Ans. Techniques to handle unbalanced data:
1. Use the right evaluation metrics
2. Use K-fold Cross-Validation in the right way
3. Ensemble different resampled datasets
4. Resample with different ratios
5. Design your own models
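The resampling idea in points 3 and 4 can be sketched with a simple random oversampler. This is a minimal pure-Python sketch under my own assumptions; `oversample_minority` is a hypothetical helper, not a library function:

```python
import random
from collections import Counter

def oversample_minority(X, y, seed=42):
    """Randomly duplicate minority-class rows until every class
    matches the majority-class count."""
    random.seed(seed)
    counts = Counter(y)
    majority_class, majority_n = counts.most_common(1)[0]
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        if cls == majority_class:
            continue
        minority_rows = [x for x, label in zip(X, y) if label == cls]
        for _ in range(majority_n - n):
            X_out.append(random.choice(minority_rows))
            y_out.append(cls)
    return X_out, y_out

X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]        # 8:2 imbalance
X_bal, y_bal = oversample_minority(X, y)
print(Counter(y_bal))                      # both classes now have 8 rows
```

In practice you would more often reach for class weights or a library such as imbalanced-learn, but the principle is the same: change the class ratio the model sees during training.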
Q2. What is an activation function?
Ans. Activation functions are mathematical equations that determine the output of a neural network node. An activation function is a non-linear transformation applied to the input before passing it to the next layer of neurons or producing the final output.
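A few common activation functions, written out as plain functions (a minimal stdlib-only sketch):

```python
import math

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # passes positive inputs through unchanged, zeroes out negatives
    return max(0.0, x)

def tanh(x):
    # squashes input into (-1, 1); zero-centred, unlike sigmoid
    return math.tanh(x)

print(sigmoid(0))   # 0.5
print(relu(-3.0))   # 0.0
print(tanh(0))      # 0.0
```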
Q3. What is dimensionality reduction?
Ans. Dimensionality reduction is used to reduce the feature space under consideration to a smaller set of principal features.
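PCA is the classic technique here; as a simpler, stdlib-only illustration, here is the feature-selection flavour of dimensionality reduction, dropping near-constant columns. `drop_low_variance` is a hypothetical helper of my own, not a library API:

```python
from statistics import pvariance

def drop_low_variance(rows, threshold=0.01):
    """Drop feature columns whose variance is below `threshold`.
    Near-constant features carry almost no information, so a crude
    way to shrink the feature space is to discard them."""
    n_features = len(rows[0])
    keep = [j for j in range(n_features)
            if pvariance([row[j] for row in rows]) > threshold]
    return [[row[j] for j in keep] for row in rows], keep

data = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.0],
    [3.0, 5.0, 0.0],   # columns 1 and 2 are constant
]
reduced, kept = drop_low_variance(data)
print(kept)            # [0] -> only the first feature survives
```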
Q4. Why is mean squared error a bad measure of model performance?
Ans. Mean Squared Error (MSE) gives a relatively high weight to large errors; as a result, MSE tends to put too much emphasis on large deviations, so a few outliers can dominate the metric.
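A quick numeric check of this: MAE treats four small errors and one big error the same, while MSE punishes the single big miss much harder (a self-contained sketch):

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 1.0, 1.0, 1.0]
good   = [1.5, 1.5, 1.5, 1.5]   # four small errors of 0.5 each
bad    = [1.0, 1.0, 1.0, 3.0]   # one large error of 2.0

print(mae(y_true, good), mae(y_true, bad))   # 0.5 vs 0.5  -> MAE sees no difference
print(mse(y_true, good), mse(y_true, bad))   # 0.25 vs 1.0 -> MSE punishes the big miss
```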
Q5. How do you remove multicollinearity?
Ans. To remove multicollinearity, we can do one of two things:
1. Create new features (for example, combine the correlated variables into one).
2. Remove one of the correlated features from the data.
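Detecting which features are collinear usually starts with a correlation check. A minimal sketch with hypothetical data (features a and b are near copies, so one of them is a candidate to drop or merge):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.1, 4.0, 6.2, 8.1, 9.9]   # roughly 2*a -> multicollinear with a
c = [5.0, 1.0, 4.0, 2.0, 3.0]   # unrelated

r_ab = pearson(a, b)
r_ac = pearson(a, c)
print(round(r_ab, 3))   # close to 1.0 -> drop a or b, or combine them
print(round(r_ac, 3))   # far from 1  -> keep both
```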
Q6. What is a long-tailed distribution?
Ans. A long-tailed distribution is one in which many occurrences lie far from the "head", or central part, of the distribution. Most occurrences are concentrated at the low end of the x-axis, while a long tail of rare, large values stretches out to the right.
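A quick way to see this: Pareto draws are a classic long-tailed sample, so the mean gets pulled well above the median by the tail (a minimal sketch using the standard library's Pareto generator):

```python
import random
from statistics import mean, median

random.seed(0)
# Most Pareto(1.5) draws are small, but a few are very large.
sample = [random.paretovariate(1.5) for _ in range(10_000)]

print(round(median(sample), 2))  # typical value, near the "head"
print(round(mean(sample), 2))    # pulled above the median by the tail
print(round(max(sample), 2))     # a rare, huge tail value
```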
Q7. What is an outlier? How do you deal with it?
Ans. An outlier is an observation that deviates significantly from the rest of the objects. Outliers can be caused by measurement or execution errors.
Removing outliers is legitimate only for specific reasons, because outliers can be very informative about the subject area and the data collection process. If an outlier does not change the results but does affect assumptions, you may drop it. Alternatively, trim the dataset: replace outliers with the nearest "good" values rather than truncating them completely.
Q8. Give an example where the median is a better measure than the mean.
Ans. If your data contains outliers, you would typically prefer the median, because otherwise the mean would be dominated by the outliers rather than by the typical values. In short: if you are considering the mean, check your data for outliers first; if there are any, the median is the better choice.
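A concrete example with made-up salary data: one extreme value drags the mean far above what anyone typically earns, while the median stays put:

```python
from statistics import mean, median

salaries = [40_000, 42_000, 45_000, 47_000, 50_000, 1_000_000]  # one CEO outlier

print(mean(salaries))    # 204000 -> dragged up by the single outlier
print(median(salaries))  # 46000  -> still reflects a typical salary
```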
ENJOY LEARNING 👍👍
Which of the following methods can be used to handle missing values?
Anonymous Quiz
- Mean Substitution (16%)
- Pairwise deletion (6%)
- Regression imputation (11%)
- All of the above (66%)
Which of the following is not a feature selection technique?
Anonymous Quiz
- Information Gain (21%)
- Forward Selection (13%)
- Regularisation (23%)
- K-means clustering (44%)
Data Science Interview Questions
[PART-16]
Q. How can outlier values be treated?
A. An outlier is an observation in a dataset that differs significantly from the rest of the data. This signifies that an outlier is much larger or smaller than the rest of the data.
Some of the methods of treating outliers are: trimming or removing the outlier, quantile-based flooring and capping, and mean/median imputation.
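The quantile-based flooring and capping idea can be sketched as follows (a stdlib-only sketch; `cap_outliers` and the interpolating `quantile` helper are my own illustrative functions, not library APIs):

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile; assumes the list is sorted."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def cap_outliers(values, lower_q=0.05, upper_q=0.95):
    """Floor values below the 5th percentile and cap values above the 95th."""
    s = sorted(values)
    lo, hi = quantile(s, lower_q), quantile(s, upper_q)
    return [min(max(v, lo), hi) for v in values]

data = [12, 14, 15, 13, 14, 15, 12, 13, 400]  # 400 is an obvious outlier
capped = cap_outliers(data)
print(max(capped))  # capped at the 95th-percentile value, well below 400
```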
Q. What is root cause analysis?
A. A root cause is a factor that contributed to a nonconformance and should be eradicated permanently through process improvement. The root cause is the most fundamental problem, the most fundamental reason, that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term that refers to a variety of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches are more directed toward uncovering true root causes than others; some are general problem-solving procedures, and others simply provide support for the core root cause analysis activity.
Q. What is bias and variance in Data Science?
A. Bias arises from the simplifying assumptions a model makes about the target function to make it easier to estimate; in its most basic form, bias is the difference between the predicted value and the expected value. Variance refers to how much the estimate of the target function will fluctuate given different training data. In contrast to bias, variance occurs when the model fits the fluctuations, or noise, in the data.
Q. What is a confusion matrix?
A. A confusion matrix is a method of summarising a classification algorithm's performance. Calculating a confusion matrix helps you understand what your classification model is getting right and where it is going wrong. It gives us the following counts: "true positive" for event values that were correctly predicted; "false positive" for event values that were incorrectly predicted; "true negative" for no-event values that were correctly predicted; and "false negative" for no-event values that were incorrectly predicted.
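The four counts can be computed directly from the label lists (a minimal sketch for the binary case, where 1 is the event class):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count TP/FP/TN/FN for a binary classifier (1 = event, 0 = no event)."""
    c = Counter()
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            c["TP"] += 1       # event correctly predicted
        elif t == 0 and p == 1:
            c["FP"] += 1       # event predicted, but none occurred
        elif t == 0 and p == 0:
            c["TN"] += 1       # no-event correctly predicted
        else:
            c["FN"] += 1       # event occurred, but none predicted
    return c

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)   # TP=3, TN=3, FP=1, FN=1
```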
ENJOY LEARNING 👍👍
Which of the following is not a Python library?
Anonymous Quiz
- Pandas (3%)
- Numpy (2%)
- Matplotlib (3%)
- Scikit-learn (10%)
- Array (83%)
Which of the following is not a machine learning algorithm?
Anonymous Quiz
- Linear Regression (5%)
- Random Forest (9%)
- Standard Scaler (77%)
- Decision Tree (6%)
- Logistic Regression (4%)
Which of the following is not a supervised algorithm?
Anonymous Quiz
- Linear Regression (11%)
- Logistic Regression (9%)
- Clustering (64%)
- Decision Tree (16%)
Which of the following tools can be used for data visualization?
Anonymous Quiz
- Tableau (9%)
- Matplotlib (11%)
- Power BI (7%)
- All of the above (74%)
Data Science & Machine Learning
Do you want a daily quiz to enhance your knowledge?
That's an amazing response from you guys ❤️🔥
Which of the following cannot give 10 as an answer?
Anonymous Quiz
- 5*2 (8%)
- 2+5*2-2 (7%)
- 2+5*(2-2) (69%)
- 3*2+9//2 (16%)
Which of the following cannot give 10 as an answer?
Well done, guys!!
Explanation for those who marked the wrong answer:
Read the question again.
The answer to 9//2 is 4, not 4.5, because // is floor (integer) division.
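To see the difference between the two division operators directly:

```python
print(9 / 2)            # 4.5 -> true division always returns a float
print(9 // 2)           # 4   -> floor division discards the fraction
print(3 * 2 + 9 // 2)   # 6 + 4 = 10, since * and // bind tighter than +
print(-9 // 2)          # -5  -> floors toward negative infinity, not toward zero
```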
Mathematics for Machine Learning
Published by Cambridge University Press, April 2020
https://mml-book.com
PDF: https://mml-book.github.io/book/mml-book.pdf
Neural Networks and Learning Machines Third Edition
https://cours.etsmtl.ca/sys843/REFS/Books/ebook_Haykin09.pdf
Which of the following is not an unsupervised algorithm?
Anonymous Quiz
- K-means clustering (13%)
- Hierarchical Clustering (14%)
- Anomaly detection (21%)
- Logistic Regression (52%)
How can a fresher get a job as a data scientist?
The Indian job market is highly resistant to hiring data scientists as freshers. Everyone out there asks for at least 2 years of experience, but then the question is: where will we get those two years of experience from?
The important thing here is to build a portfolio. As a fresher, I would assume you have learnt data science through online courses. They only teach you the basics; the analytical skills required to clean data and apply machine learning algorithms come only from practice.
Do some real-world data science projects and participate in Kaggle competitions. Kaggle provides datasets for practice as well. Whatever projects you do, create a GitHub repository for them. Place all your projects there so that when a recruiter looks at your profile, they know you have hands-on practice and know the basics. This will take you a long way.
Most of the major data science jobs for freshers will only be available through off-campus interviews.
Some companies that hire data scientists are:
Siemens
Accenture
IBM
Cerner
Creating a technical portfolio showcases the knowledge you have already gained, and that is essential when you go out there as a fresher and try to find a data scientist job.
Forwarded from Data Science & Machine Learning
7 Steps of the Machine Learning Process
Data Collection: The process of extracting raw datasets for the machine learning task. This data can come from a variety of places, ranging from open-source online resources to paid crowdsourcing. The first step of the machine learning process is arguably the most important. If the data you collect is poor quality or irrelevant, then the model you train will be poor quality as well.
Data Processing and Preparation: Once youโve gathered the relevant data, you need to process it and make sure that it is in a usable format for training a machine learning model. This includes handling missing data, dealing with outliers, etc.
Feature Engineering: Once youโve collected and processed your dataset, you will likely need to transform some of the features (and sometimes even drop some features) in order to optimize how well a model can be trained on the data.
Model Selection: Based on the dataset, you will choose which model architecture to use. This is one of the main tasks of industry engineers. Rather than attempting to come up with a completely novel model architecture, most tasks can be thoroughly performed with an existing architecture (or combination of model architectures).
Model Training and Data Pipeline: After selecting the model architecture, you will create a data pipeline for training the model. This means creating a continuous stream of batched data observations to efficiently train the model. Since training can take a long time, you want your data pipeline to be as efficient as possible.
Model Validation: After training the model for a sufficient amount of time, you will need to validate the modelโs performance on a held-out portion of the overall dataset. This data needs to come from the same underlying distribution as the training dataset, but needs to be different data that the model has not seen before.
Model Persistence: Finally, after training and validating the modelโs performance, you need to be able to properly save the model weights and possibly push the model to production. This means setting up a process with which new users can easily use your pre-trained model to make predictions.
5_6339144778529113396.pdf (11.1 MB): Machine learning notes in 15 pages