Questions to expect as a beginner in the industry
Q : What two parameters define a normal distribution?
A : The mean, which sets the center of the distribution, and the standard deviation, which sets its width (spread).
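To see how the two parameters shape the curve, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The heights example and its numbers (mean 170, standard deviation 10 or 20) are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability mass within one standard deviation of the mean (roughly 68%).
within_one_sd = heights.cdf(180) - heights.cdf(160)

# Same mean, larger standard deviation: the curve is wider,
# so less of the mass falls inside the same 160-180 range.
wider = NormalDist(mu=170, sigma=20)
within_same_range = wider.cdf(180) - wider.cdf(160)

print(round(within_one_sd, 3))
print(round(within_same_range, 3))
```

Changing `mu` would shift the whole curve left or right without changing its shape.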
Q : What is One Hot Encoding?
A : The process of transforming a categorical variable into numerical values so that each category becomes its own column, and a 1 or 0 in that column marks whether a row belongs to the category. It is inefficient when there are many categories, because it adds one column per category.
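A small pure-Python sketch of the idea follows. The helper name `one_hot_encode` and the color data are hypothetical; in practice a library routine would usually do this for you.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a 1 in its own category's column."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # column order
for row in encoded:
    print(row)
```

With three distinct colors the result has three columns; a feature with thousands of distinct values would produce thousands of columns, which is where the inefficiency comes from.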
Q : What is a residual?
A : It’s the difference between the observed value and the predicted value of the target variable.
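As a quick illustration, residuals can be computed row by row as observed minus predicted. The numbers below are invented for the example.

```python
# Hypothetical observed targets and model predictions for three rows.
observed = [3.0, 5.0, 7.5]
predicted = [2.5, 5.0, 8.0]

# Residual = observed - predicted.
# Positive means the model under-predicted, negative means it over-predicted.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

print(residuals)
```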
Q : What is the basic concept behind a random forest?
A : An ensemble approach that trains many decision trees on the training data and combines their predictions: a majority vote for classification, an average for regression.
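The combining step can be sketched in a few lines. The three "trees" below are hypothetical hand-written rules standing in for trained decision trees, and the feature names `links` and `caps_ratio` are assumptions made up for the example.

```python
from collections import Counter


# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
def tree_a(sample):
    return "spam" if sample["links"] > 2 else "ham"


def tree_b(sample):
    return "spam" if sample["caps_ratio"] > 0.5 else "ham"


def tree_c(sample):
    return "spam" if sample["links"] > 5 else "ham"


def forest_predict(trees, sample):
    """Return the label that most trees vote for."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


sample = {"links": 4, "caps_ratio": 0.7}
prediction = forest_predict([tree_a, tree_b, tree_c], sample)
print(prediction)
```

Here two of the three trees vote "spam", so the forest predicts "spam" even though one tree disagrees.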
Q : What is Dimensionality Reduction?
A : It’s the process of reducing the number of variables under consideration while keeping as much of the information as possible, for example by obtaining a set of principal components (PCA).
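To make this concrete, here is a rough sketch that reduces two variables to one by projecting onto the first principal component. It uses plain power iteration on the 2x2 covariance matrix rather than a library PCA routine, and the function name and data points are hypothetical.

```python
def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    # This converges to the eigenvector with the largest eigenvalue.
    vx, vy = 1.0, 1.0
    for _ in range(iterations):
        vx, vy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = (vx * vx + vy * vy) ** 0.5
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered


# Hypothetical data where y is roughly twice x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direction, centered = first_principal_component(points)

# Project each 2-D point onto the component: two variables become one.
projected = [x * direction[0] + y * direction[1] for x, y in centered]
print(direction)
print(projected)
```

Because the two variables are strongly correlated, the single projected value keeps almost all of the variation in the original pair.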
Q : What does “random” mean in the term Random Forest?
A : The “random” part refers to two things: each decision tree is trained on a bootstrap sample of the rows (drawn with replacement), and each split considers only a random subset of the features.
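The two sources of randomness usually associated with random forests, bootstrap sampling of rows and random feature subsets, can be sketched with the standard `random` module. The helper names, row ids, and feature names below are all hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some repeat and some are left out."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(features, k, rng):
    """Pick k distinct features to consider when growing a tree."""
    return rng.sample(features, k)


rng = random.Random(0)  # seeded only so the sketch is repeatable
rows = list(range(10))  # stand-in for ten training rows
features = ["age", "income", "tenure", "region"]

sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(features, 2, rng)

print(sample)
print(subset)
```

Each tree in the forest would get its own sample and its own feature subsets, which is what makes the trees differ from one another.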
Q : What is unsupervised learning?
A : Unsupervised learning aims to detect patterns in data where no labels are given.
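Clustering is a common example. The sketch below is a bare-bones one-dimensional k-means written for illustration; the function name, the data, and the starting centers are assumptions, and no labels are ever passed in.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around the given initial centers without using labels."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assign each value to its nearest center.
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements that visibly fall into two groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])

print(centers)
print(clusters)
```

The algorithm discovers the two groups purely from how the values are spread out, which is the pattern-detection idea behind unsupervised learning.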
That’s it! Stay tuned for more.
Questions taken from:
[Top 10 Junior Data Science Interview Questions & Answers](https://santiviquez.medium.com/top-10-junior-data-science-interview-questions-answers-52674a0cb20d)