Linear Regression Model - Diamonds

Objectives

We will use the diamonds dataset to train a regression model that predicts the price of a diamond.

Here is a list of Tasks we’ll be doing:

Use Pandas to load data sets.
Identify the target and features.
Use Linear Regression to build a model to predict diamond prices.
Use metrics to evaluate the model.
Make predictions using a trained model.

The Jupyter notebook is available at: Building_and_training_a_model_using_Linear_Regression_1.ipynb

Setup

We will be using the following libraries:

pandas for managing the data.
sklearn for machine learning and machine-learning-pipeline related functions.

Install Libraries

# All libraries required for this lab are listed below. They are already installed, so this code block is commented out.
# pip install pandas==1.3.4
# pip install scikit-learn==1.0.2
# pip install numpy==1.21.6

Suppress Warnings

To suppress warnings generated by our code, we’ll use this code block:

# To suppress warnings generated by the code
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

Import Libraries

import pandas as pd
from sklearn.linear_model import LinearRegression

Data - Task 1

Load

URL2 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0231EN-SkillsNetwork/datasets/diamonds.csv"

df2 = pd.read_csv(URL2)

Five sample rows from the loaded DataFrame (the s column is a 1-based row number):

	s	carat	cut	color	clarity	depth	table	price	x	y	z
11756	11757	1.03	Premium	H	VS2	62.4	58.0	5078	6.47	6.39	4.01
53799	53800	0.66	Ideal	F	VVS2	61.2	57.0	2732	5.63	5.58	3.43
15163	15164	1.24	Ideal	H	SI1	62.5	56.0	6095	6.92	6.88	4.31
46678	46679	0.59	Ideal	E	SI1	62.9	57.0	1789	5.36	5.33	3.36
10855	10856	1.13	Premium	J	VS1	61.1	59.0	4873	6.74	6.67	4.10

df2.shape

(53940, 11)
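
Before choosing features, it can help to check the column types and look for missing values. The following is a minimal sketch, not part of the original notebook. It assumes a small hand-built DataFrame (the name sample_df is illustrative) holding a few columns of the five sample rows shown above, so it runs without downloading the full file.

```python
import pandas as pd

# Five rows copied from the sample output above (not the full dataset)
sample_df = pd.DataFrame({
    "carat": [1.03, 0.66, 1.24, 0.59, 1.13],
    "cut": ["Premium", "Ideal", "Ideal", "Ideal", "Premium"],
    "depth": [62.4, 61.2, 62.5, 62.9, 61.1],
    "price": [5078, 2732, 6095, 1789, 4873],
})

# Column data types: numeric columns can be passed to LinearRegression as-is,
# while text columns such as cut would need encoding first
print(sample_df.dtypes)

# Number of missing values in each column
missing = sample_df.isna().sum()
print(missing)

# Summary statistics for the numeric columns
print(sample_df.describe())
```

The same three calls can be run on df2 once it has been loaded.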

Define Targets/Features - Task 2

Target

In a linear regression (LR) model, we aim to predict the target value from the input data.

In this example, the target is the price column of our table.

Features

The features are the data columns we provide to the model as input, from which it predicts the target column.

In our example, let’s provide the model with the following features and see how accurately it predicts the price:

carat
depth

target = df2['price']
features = df2[['carat','depth']]
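
The difference between single and double brackets matters here: scikit-learn expects the features as a 2-D table of shape (n_samples, n_features) and the target as a 1-D sequence. A small sketch of this, assuming an illustrative DataFrame (sample_df) built from the five sample rows shown earlier rather than the full dataset:

```python
import pandas as pd

# Five rows copied from the sample output earlier (not the full dataset)
sample_df = pd.DataFrame({
    "carat": [1.03, 0.66, 1.24, 0.59, 1.13],
    "depth": [62.4, 61.2, 62.5, 62.9, 61.1],
    "price": [5078, 2732, 6095, 1789, 4873],
})

# Single brackets select one column as a 1-D Series
target = sample_df["price"]

# Double brackets select a list of columns as a 2-D DataFrame
features = sample_df[["carat", "depth"]]

print(type(target).__name__, target.shape)      # Series (5,)
print(type(features).__name__, features.shape)  # DataFrame (5, 2)
```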

Build LR Model - Task 3

Define LR Model

lr2 = LinearRegression()

Train/Fit LR Model

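Let’s train it. The fit call returns the fitted estimator, which the notebook displays as LinearRegression(). Older scikit-learn versions print the full parameter list instead:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)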

lr2.fit(features,target)

LinearRegression()
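
Fitting estimates one coefficient per feature plus an intercept, which scikit-learn stores in the coef_ and intercept_ attributes. The sketch below is an illustration on synthetic data, not the diamonds dataset: the values are generated from y = 1 + 2*x1 + 3*x2 with no noise, so the fitted parameters should recover 2, 3 and 1. The names X, y and model are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 1 + 2*x1 + 3*x2 exactly
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

# One coefficient per feature, in column order
print(model.coef_)       # approximately [2. 3.]

# The constant term of the fitted line
print(model.intercept_)  # approximately 1.0
```

On the diamonds model, lr2.coef_ and lr2.intercept_ can be read the same way after fit has run.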


Evaluate Model & Predict - Task 4

Now that the model has been trained on the data/features provided above, let’s evaluate it

Score

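For LinearRegression, score returns the coefficient of determination (R²). The best possible value is 1.0, so higher is better. Note that the model is scored here on the same data it was trained on.

# The higher the score, the better the fit.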
lr2.score(features,target)

0.8506754571636563
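
The score above is computed on the training data, which tends to be optimistic. A common alternative is to hold out part of the data for evaluation. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses synthetic data instead of the diamonds dataset, splits it with train_test_split, and also recomputes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): a linear signal plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Keep 25% of the rows aside; the model never sees them during fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R² on the held-out rows
test_r2 = model.score(X_test, y_test)

# The same quantity by hand: 1 - SS_res / SS_tot
residual_ss = np.sum((y_test - model.predict(X_test)) ** 2)
total_ss = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1.0 - residual_ss / total_ss

print(test_r2, manual_r2)
```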

Predict

Let’s make some predictions:

What is the predicted price for a diamond with carat=0.3 and depth=60?
As the output below shows, the model predicts a price of about $244.96 for a diamond with carat=0.3 and depth=60.

lr2.predict([[0.3,60]])

array([244.95605225])
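
A linear regression prediction is just the intercept plus each feature value multiplied by its coefficient. The sketch below checks this on a tiny model fitted to the five sample rows shown earlier, so its numbers will not match the 244.96 above. The names sample_df, model and new_diamond are illustrative. It also passes the new data as a DataFrame with the same column names used for fitting, which avoids the feature-name warning that scikit-learn 1.0 and later raise when a model fitted on a DataFrame is given a plain list.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Five rows copied from the sample output earlier (not the full dataset)
sample_df = pd.DataFrame({
    "carat": [1.03, 0.66, 1.24, 0.59, 1.13],
    "depth": [62.4, 61.2, 62.5, 62.9, 61.1],
    "price": [5078, 2732, 6095, 1789, 4873],
})

model = LinearRegression().fit(sample_df[["carat", "depth"]], sample_df["price"])

# New input with the same column names and order as the training features
new_diamond = pd.DataFrame({"carat": [0.3], "depth": [60.0]})
predicted = model.predict(new_diamond)

# The same prediction by hand: intercept + sum(coefficient * feature value)
manual = model.intercept_ + new_diamond.to_numpy() @ model.coef_

print(predicted, manual)
```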