Andrew Ng’s Machine Learning Simplified — Part 7 | Logistic Regression Model Parameters
In Part 7 of the series, we discuss the parameters involved in a Logistic Regression model and how they are learned.
Link to Part 6: [Andrew Ng’s Machine Learning Simplified — Part 6 | Logistic Regression](https://medium.com/@aakriti.sharma18/andrew-ngs-machine-learning-simplified-part-6-logistic-regression-2dd3c63d1ebd), where we learned how to perform the task of Classification using Logistic Regression.
To evaluate any model, we use a Cost Function, which essentially measures the difference between the predicted and actual values. But here, since the hypothesis passes the linear output through the sigmoid function to squash it between 0 and 1, plugging it into the usual squared-error cost produces a wavy, non-convex graph with many local minima that can deceive the gradient descent algorithm.
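Recall from Part 6 that the hypothesis squashes the linear combination of the inputs through the sigmoid:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}$$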
To solve this problem, we counteract the effect of the exponential inside the sigmoid by taking its inverse, the log, of the hypothesis output before computing the cost.
The cost function is thus given as :
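$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(h_\theta(x^{(i)})\big) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]$$

For a single example this reduces to $-\log\big(h_\theta(x)\big)$ when $y = 1$ and $-\log\big(1 - h_\theta(x)\big)$ when $y = 0$, so a confident but wrong prediction is penalised very heavily.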
Now the cost, when plotted, forms a convex graph, thereby assuring convergence for gradient descent.
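In the course’s notation, the gradient descent update rule (which turns out to be identical in form to the one used for linear regression) is:

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$

applied simultaneously to every parameter $\theta_j$ until convergence, with $\alpha$ the learning rate.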
“Conjugate gradient”, “BFGS”, and “L-BFGS” are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent.
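As a rough sketch of how such an optimizer can be driven from Python (the helper names and toy data below are illustrative, not from the course), SciPy’s `minimize` can be handed the cost together with its gradient and asked to run an L-BFGS variant:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Cross-entropy cost J(theta) and its gradient for logistic regression."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    cost = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    grad = X.T @ (h - y) / m
    return cost, grad

# toy data: a column of ones for the intercept plus one feature (illustrative only)
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# jac=True tells minimize that cost_and_grad returns (cost, gradient)
result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
                  args=(X, y), method="L-BFGS-B", jac=True)
print(result.x)  # the optimized theta
```

The nice part, as the lecture notes point out, is that we only supply the cost and gradient; the library handles picking the step size and deciding when to stop.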
Logistic Regression can be implemented in Python using the sklearn module, whose classifier can be imported as :
from sklearn.linear_model import LogisticRegression
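A minimal usage sketch on made-up toy data (just to show the fit/predict calls, not part of the course material):

```python
from sklearn.linear_model import LogisticRegression

# toy data: two features per training example (illustrative values only)
X = [[0.5, 1.0], [1.0, 1.5], [2.5, 0.5], [3.0, 2.0]]
y = [0, 0, 1, 1]

clf = LogisticRegression()  # sklearn picks a solver (e.g. lbfgs) and fits theta for us
clf.fit(X, y)               # learned parameters end up in clf.intercept_ and clf.coef_

print(clf.predict([[2.0, 1.0]]))        # predicted class label
print(clf.predict_proba([[2.0, 1.0]]))  # predicted probability for each class
```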
So far we have covered binary classification. Now we’ll move on to multiclass classification.
Multiclass Classification
As the name suggests, there are multiple classes among which the classifier has to choose when labelling a given input. Instead of y = {0,1} for binary classifiers, where only 2 classes were involved, we expand our definition so that y = {0,1,…,n}.
One way to approach this type of problem, known as one-vs-all, is to treat it as multiple binary classification problems: choose one class and lump all the others into a single second class.
We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.
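A minimal sketch of this one-vs-all strategy, assuming sklearn and some illustrative three-class toy data (sklearn also offers the same idea prepackaged as `sklearn.multiclass.OneVsRestClassifier`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data with three classes (illustrative values only)
X = np.array([[0.2, 1.0], [0.5, 1.2], [2.0, 0.3], [2.2, 0.1], [1.0, 3.0], [1.2, 3.3]])
y = np.array([0, 0, 1, 1, 2, 2])

classes = np.unique(y)
classifiers = []
for c in classes:
    # treat class c as "1" and lump every other class into "0"
    binary_y = (y == c).astype(int)
    classifiers.append(LogisticRegression().fit(X, binary_y))

def predict(x_new):
    # pick the class whose binary classifier returns the highest probability
    scores = [clf.predict_proba([x_new])[0, 1] for clf in classifiers]
    return classes[int(np.argmax(scores))]

print(predict([2.1, 0.2]))  # should land in class 1, whose examples it sits closest to
```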
In the next part, we will diagnose the overfitting problem and look at ways to avoid it while finishing Week 3. Stay tuned!
Tweet thread form of the series :