# All Libraries required for this lab are listed below. Already installed so this code block is commented out
# pip install pandas==1.3.4
# pip install scikit-learn==1.0.2
Classifier for Flower Species
Objectives
We will use the Iris dataset to build a classifier that predicts the species of a flower.
Here is a list of the tasks we’ll be doing:
- Use Pandas to load data sets.
- Identify the target and features.
- Use Logistic Regression to build a classifier.
- Use metrics to evaluate the model.
- Make predictions using a trained model.
Setup
We will be using the following libraries:
- pandas for managing the data.
- sklearn for machine learning and machine-learning-pipeline related functions.
Install Libraries
Suppress Warnings
To suppress warnings generated by our code, we’ll use this code block
# To suppress warnings generated by the code
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')
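The block above silences every warning for the rest of the session. Where that is too broad, one narrower option is to suppress warnings only inside a with block, using warnings.catch_warnings from the standard library. The sketch below is an illustration and not part of the lab; noisy_step is a hypothetical stand-in for any call that emits a warning.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this step is noisy", UserWarning)
    return 42


# record=True collects every warning that passes the filters into a list.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")  # let every warning through
    noisy_step()

with warnings.catch_warnings(record=True) as hidden:
    warnings.simplefilter("ignore")  # drop warnings inside this block only
    result = noisy_step()

print(len(shown))   # 1
print(len(hidden))  # 0
print(result)       # 42
```

The filters set inside each with block are restored when the block exits, so warnings raised elsewhere in the notebook remain visible.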
Import Libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
Data - Task 1
- Modified version of the Iris dataset. The original dataset is available at https://archive.ics.uci.edu/ml/datasets/Iris
Load
# the data set is available at the url below.
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0231EN-SkillsNetwork/datasets/iris.csv"
# using the read_csv function in the pandas library, we load the data into a dataframe
df = pd.read_csv(URL)
# you can sample the data with df.sample(5)
df.sample(5)
| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 66 | 5.6 | 3.0 | 4.5 | 1.5 | Iris-versicolor |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | Iris-versicolor |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor |
| 126 | 6.2 | 2.8 | 4.8 | 1.8 | Iris-virginica |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa |
df.shape
(150, 5)
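The read_csv function accepts a file path or a file-like object as well as a URL, so the loading step can be tried without network access. A minimal sketch, assuming a three-row CSV built from rows that appear in the sample output above; the names csv_text and toy are illustrative only.

```python
import io

import pandas as pd

# Three rows copied from the sample output (original indices 49, 75 and 126).
csv_text = """SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
5.0,3.3,1.4,0.2,Iris-setosa
6.6,3.0,4.4,1.4,Iris-versicolor
6.2,2.8,4.8,1.8,Iris-virginica
"""

# io.StringIO wraps the string so read_csv can treat it like an open file.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)  # (3, 5)
print(toy.dtypes)  # the four measurement columns are numeric, Species is text
```

As with the full data set, shape reports rows first and columns second.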
Plot Data
- You can see that there are 3 species and 50 flowers for each type of species.
df.Species.value_counts().plot.bar()
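The plotting call works in two steps: value_counts() returns one count per distinct species, sorted from most to least frequent, and plot.bar() draws one bar per entry. The sketch below shows only the counting step on a small, made-up column (the toy frame is hypothetical and deliberately unbalanced so the ordering is visible).

```python
import pandas as pd

# Hypothetical miniature stand-in for the Species column of the lab's dataframe.
toy = pd.DataFrame({
    "Species": [
        "Iris-setosa", "Iris-virginica", "Iris-setosa",
        "Iris-versicolor", "Iris-virginica", "Iris-setosa",
    ]
})

# One count per distinct value, most frequent first.
counts = toy["Species"].value_counts()

print(counts.index.tolist())  # ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
print(counts.tolist())        # [3, 2, 1]
```

On the lab's data the same call yields three entries of 50 each, which is why the bars in the chart have equal height.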
Define Targets/Features - Task 2
Target
In classification models, the target is the value the machine learning model learns to predict.
In this example, the target is the species of the flower.
Features
Features are the data columns we provide to the model as input; the model learns from them.
In our example, let’s provide the model with these features and see how accurately it predicts the species:
- SepalLengthCm
- SepalWidthCm
- PetalLengthCm
- PetalWidthCm
target = df["Species"]
features = df[["SepalLengthCm","SepalWidthCm","PetalLengthCm","PetalWidthCm"]]
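The two selections differ in shape: single brackets return a one-dimensional Series, while a list of column names in double brackets returns a two-dimensional DataFrame, which is the layout scikit-learn expects for features. A small sketch, assuming a hypothetical three-row frame with the same column names as the lab's data:

```python
import pandas as pd

# Hypothetical three-row frame with the same columns as the lab's dataframe.
toy = pd.DataFrame({
    "SepalLengthCm": [5.0, 6.6, 6.2],
    "SepalWidthCm": [3.3, 3.0, 2.8],
    "PetalLengthCm": [1.4, 4.4, 4.8],
    "PetalWidthCm": [0.2, 1.4, 1.8],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

feature_columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

toy_target = toy["Species"]          # single brackets: a 1-D Series
toy_features = toy[feature_columns]  # list of names: a 2-D DataFrame

print(toy_target.shape)    # (3,)
print(toy_features.shape)  # (3, 4)
```

Both objects keep the same row order, so row i of the features lines up with entry i of the target.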
Build & Train Classifier - Task 3
Logistic Regression Model
Create a Logistic Regression model
classifier = LogisticRegression()
Train Logistic Regression Model
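The response depends on the scikit-learn version. Recent versions print only the parameters that differ from their defaults; older versions print the full parameter list, for example:
- LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False)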
classifier.fit(features,target)
LogisticRegression()
Evaluate Model - Task 4
Now that the model has been trained on the data/features provided above, let’s evaluate it
Score
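The score method returns the mean accuracy of the model on the given features and target, so the higher the better. Note that here the model is scored on the same data it was trained on.
# The higher the score, the better the model.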
classifier.score(features,target)
0.9733333333333334
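A score measured on the training rows tends to be optimistic, so a common alternative is to hold out part of the data for evaluation. The sketch below is not part of the lab. It assumes scikit-learn's bundled copy of the Iris data (load_iris) as a stand-in for the lab's CSV, an 80/20 split, and max_iter=1000 to give the solver room to converge. It also checks that score is simply the fraction of correct predictions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for the lab's CSV file.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; stratify keeps the three species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# score() is the mean accuracy: the share of predictions that match the labels.
manual_accuracy = (model.predict(X_test) == y_test).mean()

print(X_test.shape)  # (30, 4)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

The held-out figure is the better guide to how the classifier behaves on flowers it has not seen.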
Predict
Let’s make some predictions:
- Let’s predict the species of a flower with:
  - SepalLengthCm = 5.4
  - SepalWidthCm = 2.6
  - PetalLengthCm = 4.1
  - PetalWidthCm = 1.3
classifier.predict([[5.4,2.6,4.1,1.3]])
array(['Iris-versicolor'], dtype=object)
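The predict method returns only the most likely class. To see how confident the model is, predict_proba returns one probability per class, in the order given by the classes_ attribute. The sketch below is an illustration under assumptions: it trains on scikit-learn's bundled Iris data rather than the lab's CSV (so the labels read setosa, versicolor and virginica, without the Iris- prefix) and sets max_iter=1000 to help the solver converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the bundled Iris data stands in for the lab's CSV file.
iris = load_iris()
labels = iris.target_names[iris.target]  # map integer codes to species names

model = LogisticRegression(max_iter=1000)
model.fit(iris.data, labels)

# Same measurements as above: sepal length, sepal width, petal length, petal width.
flower = [[5.4, 2.6, 4.1, 1.3]]

probabilities = model.predict_proba(flower)[0]  # one probability per class
predicted = model.predict(flower)[0]

for name, probability in zip(model.classes_, probabilities):
    print(f"{name}: {probability:.3f}")
print("predicted:", predicted)
```

The predicted label is the class with the highest probability, and the probabilities sum to 1.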