Thanks Meng for clarifying the question.

Let me try to give an intuitive, easy-to-understand answer for why L1 regularization can have multiple solutions while L2 regularization has a single solution.

Let’s first take L1 Regularization

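L1 regularization = λ |θ|, with the constraint λ |θ| ≤ C, where C is a constant. With several weights, |θ| stands for the sum of absolute values |θ₁| + |θ₂| + … + |θₙ|.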

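On the boundary of the constraint we can rewrite this as λ |θ| - C = 0. Because of the absolute value, even a single weight gives two solutions, θ = C/λ and θ = -C/λ, and with several weights many different combinations satisfy |θ₁| + |θ₂| + … + |θₙ| = C/λ. More precisely, the L1 penalty is convex but not strictly convex, so the regularized loss can have more than one minimizer. This happens, for example, when two input features are identical: the weight can be split between them in any proportion (with the same sign) without changing either the loss or the penalty.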
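To make the non-uniqueness concrete, here is a minimal sketch in pure Python. The toy data, the duplicated feature, the value of λ and the name `l1_objective` are all assumptions chosen for illustration. It evaluates a squared-error loss plus an L1 penalty for a model with two copies of the same feature, and checks that three different ways of splitting the same total weight give the same objective value.

```python
import math

def l1_objective(w1, w2, xs, ys, lam):
    """Squared-error loss plus L1 penalty when the same feature appears twice."""
    loss = 0.5 * sum((y - (w1 + w2) * x) ** 2 for x, y in zip(xs, ys))
    return loss + lam * (abs(w1) + abs(w2))

# Hypothetical toy data: y = 2 * x, with the feature x duplicated.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lam = 1.4

# Best total weight s = w1 + w2 for non-negative weights,
# obtained by setting the derivative of the objective in s to zero.
s = (sum(x * y for x, y in zip(xs, ys)) - lam) / sum(x * x for x in xs)

# Three different splits of the same total weight.
splits = [(s, 0.0), (s / 2, s / 2), (0.0, s)]
l1_values = [l1_objective(w1, w2, xs, ys, lam) for w1, w2 in splits]
print(l1_values)  # the three values agree up to rounding
```

All three weight vectors reach the same minimum, so under these assumptions the L1-regularized problem has infinitely many solutions along the line w1 + w2 = s with w1, w2 ≥ 0.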

This also helps with feature selection. The corners of the L1 constraint region lie on the axes, so input features that contribute little to the target variable tend to end up with a weight of exactly zero, which gives a sparse solution.
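The zeroing effect can be sketched for a single weight. Assuming a simplified one-dimensional problem, minimizing 0.5 (θ - z)² + λ |θ| where z is the unregularized weight, the minimizer is the soft-thresholding rule below. The function name and the sample values are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (theta - z)**2 + lam * abs(theta)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small weights are set to exactly zero

# Hypothetical unregularized weights for four features.
unregularized = [3.0, -0.2, 0.05, -1.5]
lam = 0.5

l1_weights = [soft_threshold(z, lam) for z in unregularized]
print(l1_weights)  # [2.5, 0.0, 0.0, -1.0]
```

The two weak features are dropped entirely, while the strong ones are kept and shrunk by λ.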

L2 Regularization

L2 regularization = λ |θ|², with the constraint λ |θ|² ≤ C², where C is a constant. With several weights, |θ|² stands for the sum of squares θ₁² + θ₂² + … + θₙ².

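On the boundary of the constraint we can rewrite this as λ θ² - C² = 0. Note that this quadratic equation by itself still has two roots, θ = C/√λ and θ = -C/√λ (its discriminant is 4λC², which is not zero), so counting roots is not what makes the L2 solution unique. The single solution comes from the fact that the squared penalty is strictly convex: adding it to a convex loss such as squared error gives a strictly convex objective, and a strictly convex objective has exactly one minimizer for any λ > 0.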
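A minimal sketch of how the squared penalty breaks ties, again in pure Python with assumed toy data, a duplicated feature and a hypothetical `l2_objective` helper. For a fixed total weight the data-fit term is the same for every split, but the L2 penalty is smallest only at the equal split, so the objective singles out one weight vector.

```python
import math

def l2_objective(w1, w2, xs, ys, lam):
    """Squared-error loss plus L2 penalty when the same feature appears twice."""
    loss = 0.5 * sum((y - (w1 + w2) * x) ** 2 for x, y in zip(xs, ys))
    return loss + lam * (w1 ** 2 + w2 ** 2)

# Hypothetical toy data: y = 2 * x, with the feature x duplicated.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
lam = 1.4
s = 1.8  # an arbitrary fixed total weight w1 + w2

# Same three splits of the total weight: the loss term is identical,
# only the penalty differs.
splits = [(s, 0.0), (s / 2, s / 2), (0.0, s)]
l2_values = [l2_objective(w1, w2, xs, ys, lam) for w1, w2 in splits]
print(l2_values)  # the middle (equal) split has the lowest value
```

Because the equal split is strictly better than the one-sided splits, the tie between equivalent weight vectors disappears under the L2 penalty.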

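L2 regularization is often used when the input features are correlated, for example when housing prices depend on both the area of the house and the number of rooms. In that scenario L2 shrinks the weights of the correlated features toward zero and spreads the weight across them, but it generally does not make any weight exactly zero. Hence L2 does not perform feature selection and gives a non-sparse solution.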
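The shrink-but-not-zero behavior can be sketched for a single weight. Assuming a simplified one-dimensional problem, minimizing 0.5 (θ - z)² + λ θ² where z is the unregularized weight, setting the derivative to zero gives θ = z / (1 + 2λ). The function name and sample values are hypothetical.

```python
def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (theta - z)**2 + lam * theta**2."""
    return z / (1.0 + 2.0 * lam)

# Hypothetical unregularized weights for four features.
unregularized = [3.0, -0.2, 0.05, -1.5]
lam = 0.5

l2_weights = [ridge_shrink(z, lam) for z in unregularized]
print(l2_weights)  # [1.5, -0.1, 0.025, -0.75]
```

Every weight is scaled down by the same factor, so a nonzero weight stays nonzero no matter how small it is.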

Please let me know if the explanation helps.
