What is Ridge Regression? [2024]

Contributed by: Prashanth Ashok

Ridge regression is a model-tuning method used to analyze data that suffers from multicollinearity. It performs L2 regularization. When multicollinearity occurs, least-squares estimates are unbiased, but their variances are large, so predicted values can be far from the actual values.

The cost function for ridge regression:

$$\min_{\theta}\left(\|Y - X\theta\|_2^2 + \lambda\|\theta\|_2^2\right)$$

Here, λ is the penalty (regularization) parameter. In scikit-learn's Ridge estimator it is called alpha. By changing alpha, we control the strength of the penalty: the higher the value of alpha, the bigger the penalty, and therefore the smaller the magnitude of the coefficients.
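To see the effect of alpha described above, the sketch below fits scikit-learn's `Ridge` with increasing alpha and prints the L2 norm of the coefficient vector. The data come from `make_regression` and are an assumption for illustration only, not the article's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data (assumption for illustration; not the food dataset)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficients is exactly what the penalty shrinks
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[-1]:.3f}")
```

As alpha grows, the coefficient norm shrinks towards zero; with a very large alpha the model predicts little more than the mean of y.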

It shrinks the parameters, so it is used to reduce the impact of multicollinearity.

It reduces model complexity through coefficient shrinkage.
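A minimal sketch of the multicollinearity point, assuming two hypothetical predictors where `x2` is an almost exact copy of `x1`. Ordinary least squares can split the effect between them with large, offsetting coefficients, while ridge keeps the coefficients small and shares the effect between the correlated columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=1.0, size=n)  # hypothetical true effect of 3 on x1

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The ridge coefficient vector always has a smaller norm than the least-squares one for any positive alpha, which is why its estimates are more stable when predictors are highly correlated.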


Ridge Regression Models

For any regression model, the standard linear regression equation forms the basis. It is written as:

Y = XB + e

where Y is the dependent variable, X represents the independent variables, B is the vector of regression coefficients to be estimated, and e represents the errors or residuals.

Once we add the λ penalty to the least-squares fit of this equation, the model also controls the variance of the coefficient estimates, which the ordinary least-squares model leaves unconstrained. After the data is prepared and identified as a candidate for L2 regularization, there are several steps one can undertake.
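Adding the penalty changes the least-squares normal equations from XᵀX θ = XᵀY to (XᵀX + λI) θ = XᵀY. The sketch below solves that system with NumPy and checks it against scikit-learn's `Ridge` with `fit_intercept=False`, whose objective is the same ||Y − Xθ||² + λ||θ||². The helper name `ridge_closed_form` and the random data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_closed_form(X, y, lam):
    """Hypothetical helper: solve min ||y - X theta||^2 + lam * ||theta||^2 (no intercept)."""
    n_features = X.shape[1]
    # Normal equations with lam added to the diagonal: (X^T X + lam I) theta = X^T y
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=30)

lam = 5.0
theta = ridge_closed_form(X, y, lam)
sk_model = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(theta)
print(sk_model.coef_)
```

With lam = 0 the same formula reduces to the ordinary least-squares solution, which shows that the penalty term is the only difference between the two models.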

Standardization

In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations. This causes a challenge in notation since we must somehow indicate whether the variables in a particular formula are standardized or not. As far as standardization is concerned, all ridge regression calculations are based on standardized variables. When the final regression coefficients are displayed, they are adjusted back into their original scale. However, the ridge trace is on a standardized scale.
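A sketch of the standardize-then-rescale workflow, assuming two hypothetical predictors on very different scales. For brevity it standardizes only the predictors, not the target. The model is fitted on z-scores, and the coefficients are then mapped back to the original units by dividing by each column's standard deviation and adjusting the intercept.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical predictors with very different scales
X = np.column_stack([rng.normal(50, 10, 200), rng.normal(0.5, 0.1, 200)])
y = 2.0 * X[:, 0] + 30.0 * X[:, 1] + rng.normal(0, 1, 200)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # z-scores: (x - mean) / std
model = Ridge(alpha=1.0).fit(X_std, y)

# Coefficients on the standardized scale (the scale a ridge trace uses)
coef_std = model.coef_
# Back-transform to the original units of each predictor
coef_orig = coef_std / scaler.scale_
intercept_orig = model.intercept_ - np.sum(coef_std * scaler.mean_ / scaler.scale_)

print("standardized:", coef_std)
print("original scale:", coef_orig, intercept_orig)
```

Because the back-transform is just algebra on the fitted line, predictions from the original-scale coefficients match the standardized model exactly.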


Bias and variance trade-off

The bias-variance trade-off is generally complicated when building ridge regression models on a real dataset. However, the general trend to remember is:

The bias increases as λ increases.

The variance decreases as λ increases.
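The two trends above can be checked with a small simulation. The sketch below assumes a hypothetical "true" coefficient vector and a fixed design matrix, refits ridge (closed form, no intercept) on many noisy responses, and estimates the squared bias and total variance of the coefficient estimates for several values of λ.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (no intercept): (X^T X + lam*I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])  # assumed "true" coefficients
X = rng.normal(size=(40, 3))             # fixed design matrix
n_reps = 2000                            # number of simulated datasets

results = {}
for lam in [0.0, 10.0, 100.0]:
    # Refit on many noisy responses drawn from the same linear model
    estimates = np.array([
        ridge_fit(X, X @ true_theta + rng.normal(scale=2.0, size=40), lam)
        for _ in range(n_reps)
    ])
    bias_sq = np.sum((estimates.mean(axis=0) - true_theta) ** 2)
    variance = np.sum(estimates.var(axis=0))
    results[lam] = (bias_sq, variance)
    print(f"lambda={lam:6.1f}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

At λ = 0 (ordinary least squares) the bias is essentially zero and the variance is highest; increasing λ trades variance for bias.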

Assumptions of Ridge Regression

The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, because ridge regression does not provide confidence limits, the errors need not be assumed to be normally distributed.

Now, let's take an example of a linear regression problem and see how ridge regression, when implemented, can help reduce the error.

We will consider a dataset from food restaurants that are trying to find the best combination of food items to improve their sales in a particular region.

Import Required Libraries

import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

plt.style.use('classic')
warnings.filterwarnings("ignore")

# Reading .xlsx files with pandas requires the openpyxl package
df = pd.read_excel("food.xlsx")

After completing the EDA and treating missing values, we create dummy variables, because the regression models used here cannot take categorical variables directly.

df = pd.get_dummies(df, columns=cat, drop_first=True)

Here, cat is the list of all the categorical columns in the dataset.
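A small illustration of what `drop_first=True` does, using a hypothetical toy frame (the labels and prices are made up). One level per categorical variable is dropped; keeping all levels alongside an intercept would make the dummy columns sum to one, which is perfect multicollinearity (the "dummy variable trap").

```python
import pandas as pd

# Hypothetical toy data; category labels and prices are made up for illustration
toy = pd.DataFrame({"cuisine": ["Indian", "Italian", "Thai", "Indian"],
                    "price": [250, 400, 320, 180]})

# The first level in sorted order ("Indian") becomes the baseline and is dropped
dummies = pd.get_dummies(toy, columns=["cuisine"], drop_first=True)
print(dummies.columns.tolist())  # ['price', 'cuisine_Italian', 'cuisine_Thai']
```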

Next, we standardize the continuous variables so that their coefficients are on comparable scales.

Scaling the Variables (Continuous Variables Have Different Scales)

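# Scale the continuous columns; StandardScaler returns the z-score of every value
from sklearn.preprocessing import StandardScaler

cont_cols = ['week', 'final_price', 'area_range']
std_scale = StandardScaler()

# StandardScaler scales each column independently, so one call handles all three.
# Note: fitting on the full dataset before the train-test split lets test-set
# statistics leak into training; fitting on the training rows only is stricter.
df[cont_cols] = std_scale.fit_transform(df[cont_cols])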

Train-Test Split

# Copy all the predictor variables into the X dataframe
X = df.drop('orders', axis=1)

# Copy the target into the y dataframe; the target is log-transformed
y = np.log(df[['orders']])

# Split X and y into training and test sets in a 75:25 ratio
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

Linear Regression Model


# Invoke the LinearRegression function and find the best-fit model on the training data
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)

# Explore the coefficient for each of the independent attributes
# (coef_ has shape (1, n_features) because y_train is a one-column DataFrame)
for idx, col_name in enumerate(X_train.columns):
    print("The coefficient for {} is {}".format(col_name, regression_model.coef_[0][idx]))



The coefficient for week is -0.0041068045722690814
The coefficient for final_price is -0.40354286519747384
The coefficient for area_range is 0.16906454326841025
The coefficient for website_homepage_mention_1.0 is 0.44689072858872664
The coefficient for food_category_Biryani is -0.10369818094671146
The coefficient for food_category_Desert is 0.5722054451619581
The coefficient for food_category_Extras is -0.22769824296095417
The coefficient for food_category_Other Snacks is -0.44682163212660775
The coefficient for food_category_Pasta is -0.7352610382529601
The coefficient for food_category_Pizza is 0.499963614474803
The coefficient for food_category_Rice Bowl is 1.640603292571774
The coefficient for food_category_Salad is 0.22723622749570868
The coefficient for food_category_Sandwich is 0.3733070983152591
The coefficient for food_category_Seafood is -0.07845778484039663
The coefficient for food_category_Soup is -1.0586633401722432
The coefficient for food_category_Starters is -0.3782239478810047
The coefficient for cuisine_Indian is -1.1335822602848094
The coefficient for cuisine_Italian is -0.03927567006223066
The coefficient for center_type_Gurgaon is -0.16528108967295807
The coefficient for center_type_Noida is 0.0501474731039986
The coefficient for home_delivery_1.0 is 1.026400462237632
The coefficient for night_service_1 is 0.0038398863634691582

# Check the magnitude of the coefficients
predictors = X_train.columns
coef = pd.Series(regression_model.coef_.flatten(), predictors).sort_values()

plt.figure(figsize=(10, 8))
coef.plot(kind='bar', title="Model Coefficients")
plt.show()

The variables with the largest positive coefficients are food_category_Rice Bowl, home_delivery_1.0, food_category_Desert, food_category_Pizza, website_homepage_mention_1.0, food_category_Sandwich, food_category_Salad, and area_range. These factors have the strongest positive influence on the model's predicted (log) orders.
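The walkthrough above stops at the linear regression model. As a hedged sketch of the next step, the hypothetical helper below fits both `LinearRegression` and `Ridge` on the same split and reports test-set mean squared error. It is demonstrated on synthetic `make_regression` data; the article's own `X_train`, `X_test`, `y_train`, `y_test` could be passed in instead. Whether ridge actually lowers the error depends on the data and on alpha, so no particular outcome is implied.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def compare_ols_ridge(X_train, X_test, y_train, y_test, alpha=1.0):
    """Hypothetical helper: test-set MSE for ordinary least squares and ridge."""
    results = {}
    for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=alpha))]:
        model.fit(X_train, y_train)
        results[name] = mean_squared_error(y_test, model.predict(X_test))
    return results

# Synthetic stand-in data (assumption; not the food dataset)
X, y = make_regression(n_samples=120, n_features=20, n_informative=5,
                       noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

scores = compare_ols_ridge(X_train, X_test, y_train, y_test, alpha=10.0)
print(scores)
```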

Ridge Regression versus Lasso Regression: Understanding the Key Differences

Ridge and Lasso Regression are two fundamental techniques for linear regression models, both designed to improve prediction accuracy and interpretability, particularly with complex, high-dimensional data. The core difference between them lies in their approach to regularization, a method that prevents overfitting by adding a penalty to the loss function. Ridge Regression, also known as Tikhonov regularization, adds a penalty proportional to the sum of the squared coefficients. This shrinks the coefficients towards zero but never exactly to zero, which reduces model complexity and the impact of multicollinearity. In contrast, Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds a penalty proportional to the sum of the absolute values of the coefficients.

This L1 penalty not only shrinks coefficients but can also set some of them exactly to zero, effectively performing feature selection and leading to simpler, more easily interpretable models.
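A minimal sketch of the difference in practice, on synthetic data where only 3 of 10 features matter (an assumption for illustration). Ridge keeps every coefficient nonzero, while Lasso typically sets the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 informative (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks, does not zero out
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can set coefficients exactly to 0

print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
```

Note that scikit-learn scales the two objectives differently (Lasso divides the squared error by 2n), so the same alpha value does not mean the same penalty strength in both models.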

The decision to use Ridge or Lasso Regression depends on the specific characteristics of the dataset and the problem at hand. Ridge Regression is preferred when all features are considered relevant or when dealing with multicollinearity, as it handles correlated inputs by distributing the coefficients among them. Lasso Regression, on the other hand, is beneficial when simplicity is key, particularly in high-dimensional datasets where feature selection is crucial. However, with a group of highly correlated features, Lasso tends to keep one and drop the others somewhat arbitrarily. Therefore, the choice between Ridge and Lasso should be based on the data, the desired model complexity, and the analysis goals, and is often made through cross-validation and model performance evaluation.
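A sketch of choosing between the two with cross-validation, as suggested above. It uses synthetic data and a fixed alpha of 1.0 for both models, both of which are assumptions; in practice each model's alpha would be tuned first.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption for illustration)
X, y = make_regression(n_samples=150, n_features=15, n_informative=5,
                       noise=10.0, random_state=2)

scores = {}
for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=1.0))]:
    # 5-fold CV; scikit-learn reports negated MSE so that higher is better
    cv = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    scores[name] = -cv.mean()  # mean cross-validated MSE (lower is better)

best = min(scores, key=scores.get)
print(scores, "->", best)
```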

Ridge Regression is essential in machine learning for creating robust models in scenarios prone to overfitting and multicollinearity. By introducing a penalty term based on the square of coefficients, Ridge Regression modifies standard linear regression to effectively address highly correlated independent variables. This method reduces overfitting, manages multicollinearity, and improves model generalization, enhancing performance on unseen data.

Implementing Ridge Regression involves selecting an appropriate value for the regularization parameter λ, which is crucial for balancing bias and variance. Ridge Regression is widely supported in machine learning libraries such as Python's scikit-learn and is used in areas such as finance and healthcare analytics to build robust predictive models. Its ability to improve accuracy on complex datasets makes it an important tool in machine learning.

In Ridge Regression, the alpha hyperparameter is not learned by the model during fitting; it must be set beforehand. Optimal alpha values can be found with techniques such as GridSearchCV. With the right alpha value, the model can be fine-tuned so that its coefficients reflect the significant variables influencing the business problem.
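A minimal sketch of tuning alpha with `GridSearchCV`, on synthetic data (an assumption). Putting `StandardScaler` and `Ridge` in one pipeline means the scaler is refitted inside each cross-validation fold, so no statistics from a validation fold leak into training. `make_pipeline` names the ridge step `ridge`, hence the `ridge__alpha` parameter key.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=3)

pipe = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": np.logspace(-3, 3, 13)}  # 0.001 ... 1000 on a log grid
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best alpha:", search.best_params_["ridge__alpha"])
print("best CV MSE:", -search.best_score_)
```

scikit-learn's `RidgeCV` offers a built-in alternative that searches a list of alphas efficiently for this specific model.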

