Andrew Ng’s Machine Learning Simplified — Part 8 | Overfitting and Regularization

In Part 8 of the series, we discuss the problem of overfitting and solve it using methods like Regularization.

Link to Part 7:

[Andrew Ng’s Machine Learning Simplified — Part 7 | Logistic Regression Model Parameters](https://medium.com/@aakriti.sharma18/andrew-ngs-machine-learning-simplified-part-7-logistic-regression-model-parameters-6bdd2df46f5c)

The whole point of all these algorithms, math, and code is for our model to learn from the data, right? But what if it learns too well? That shouldn’t be a problem, right? Ah well, it is *sighs*

The problem is that if the model learns the training data too well, it memorizes noise instead of the true relationship between input and output. It then shows good accuracy on the training data but poor validation accuracy (results on unseen data). This is called overfitting, and it is a very common problem in Machine Learning.

Overfitting generally occurs when the data is imbalanced or when there are too many features. It can be addressed by:

i) Class balancing:

  • Class imbalance arises when some classes have too few samples; it misleads the model into predicting the majority classes almost all the time.
  • For example: the Credit Card Fraud Detection dataset has very few fraudulent transactions because fraud is rare compared to legitimate transactions. Suppose 99% are legitimate and 1% are fraudulent; a model that predicts every transaction as legitimate still achieves 99% accuracy, but it completely fails the purpose of detection.
  • This can be solved by balancing techniques like oversampling and undersampling (see the sketch after this list).
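For a quick intuition, here is a minimal NumPy sketch of random oversampling on a made-up fraud-style dataset (the arrays and the 99/1 class ratio below are purely illustrative, not from the course):

```python
import numpy as np

# Toy, illustrative data: 990 "legit" samples (class 0) and 10 "fraud" samples (class 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 990 + [1] * 10)

majority_idx = np.where(y == 0)[0]
minority_idx = np.where(y == 1)[0]

# Randomly resample the minority class (with replacement) until it matches the majority size.
resampled_idx = rng.choice(minority_idx, size=majority_idx.size, replace=True)
X_balanced = np.vstack([X[majority_idx], X[resampled_idx]])
y_balanced = np.concatenate([y[majority_idx], y[resampled_idx]])

print(np.bincount(y_balanced))  # [990 990] -> both classes are now equally represented
```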

ii) Reduce the number of features :

  • Sometimes the model is too complex for the data, so it overfits.
  • This can be solved by selecting fewer, more important features, either manually or with a feature-selection algorithm (see the sketch after this list).
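As one example of automatic selection, here is a short scikit-learn sketch using SelectKBest (the synthetic dataset and the choice of k below are assumptions made just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic dataset: 20 features, of which only 5 are actually informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Keep only the 5 features with the highest univariate F-test scores.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)  # (500, 20) -> (500, 5)
```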

iii) Regularization :

  • Keep all the features, but reduce the magnitude of the parameters θ.
  • Regularization works well when we have a lot of slightly useful features.

And its opposite, Underfitting, is a problem as well :)

Underfitting occurs when the model is too simple and fails to capture the relationship at all. It thus gives poor accuracy on both the training and validation data. It can be solved by adding more parameters, i.e., making the model more expressive.

For now, let’s discuss one of the solutions: Regularization.

To reduce the influence of some of the parameters without actually getting rid of their features or changing the form of our hypothesis, we can instead modify our cost function. This is done by penalizing the model, i.e., increasing the cost, as follows:

Each parameter whose effect is to be decreased is squared, multiplied by λ, and added to the cost. The λ, or lambda, is the regularization parameter; it determines how much the costs of our theta parameters are inflated.
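The original post shows this as an equation image; here is the same regularized cost for linear regression as a small NumPy sketch (assuming X has a leading column of ones, so theta[0] is the bias, which by convention is not penalized):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [ sum of squared errors + lam * sum(theta_j^2, j >= 1) ]."""
    m = y.size
    errors = X @ theta - y                  # h_theta(x) - y for every training example
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 (the bias) is not penalized
    return (np.sum(errors ** 2) + penalty) / (2 * m)
```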

Similarly, Gradient Descent for Linear Regression is updated with the derivative of this new cost function:
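Again, the update rule appears as an image in the original; a sketch of one regularized gradient-descent step, under the same assumptions as above, looks like this:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """theta_j := theta_j - alpha * [ (1/m) * sum(errors * x_j) + (lam/m) * theta_j ] for j >= 1;
    theta_0 is updated without the regularization term."""
    m = y.size
    errors = X @ theta - y
    grad = (X.T @ errors) / m           # plain gradient of the squared-error cost
    grad[1:] += (lam / m) * theta[1:]   # extra penalty term for every theta_j except theta_0
    return theta - alpha * grad
```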

Regularization can also be used for Logistic Regression by updating its cost function in the same way:
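For completeness, a sketch of the regularized logistic regression cost (the usual cross-entropy plus the same λ penalty, with theta[0] again left unpenalized):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """J(theta) = cross-entropy + (lam / 2m) * sum(theta_j^2, j >= 1)."""
    m = y.size
    h = sigmoid(X @ theta)                                  # predicted probabilities
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)      # bias term not penalized
    return cross_entropy + penalty
```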

And that’s a wrap to week 3! Next week we start with the super fun Neural Networks. Stay tuned!

Tweet thread form of the series: