This project illustrates how to build a classification model for Iris flowers.
First, we load the data.
from sklearn.datasets import load_iris
dataSet = load_iris()
features = dataSet.data
labels = dataSet.target
labelsNames = list(dataSet.target_names)
featuresNames = dataSet.feature_names
print([labelsNames[i] for i in labels[47:52]])
print(featuresNames)
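To make the structure of the loaded data concrete, here is a minimal sketch that checks its size and class balance. The variable names `iris` and `counts` are illustrative and not part of the original code. Because the samples are stored class by class, the slice `labels[47:52]` used above straddles the boundary between the first and second species.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers, each described by 4 measurements
print(iris.data.shape)    # (150, 4)
print(iris.target.shape)  # (150,)

# Number of samples per class (labels are the integers 0, 1, 2)
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```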
We use pandas to analyze the loaded data.
import pandas as pd
print(type(features))
featuresDF = pd.DataFrame(features)
featuresDF.columns = featuresNames
print(type(featuresDF))
print(featuresDF.describe())
featuresDF.info()  # info() prints its summary itself and returns None
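The summary above treats all flowers as one group. As a possible next step, the sketch below attaches the species name to each row and computes per-species averages with `groupby`. The column name `species` and the variables `df` and `per_class_mean` are assumptions introduced for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map each integer label to its species name
df["species"] = [str(iris.target_names[i]) for i in iris.target]

# Mean of every measurement, computed separately for each species
per_class_mean = df.groupby("species").mean()
print(per_class_mean)
```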
Then we visualize the loaded data.
featuresDF.plot(x="sepal length (cm)", y="sepal width (cm)", kind="scatter")
or
featuresDF.plot(kind= "bar")
More plotting options are documented at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
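A scatter plot becomes more informative for classification when the points are colored by class. The following is a sketch under a few assumptions: the `label` column, the `viridis` colormap, and the output file name `iris_scatter.png` are all illustrative choices, not part of the original project.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["label"] = iris.target  # integer class per row, used only for coloring

# c="label" colors each point by the value in the "label" column
ax = df.plot(
    kind="scatter",
    x="sepal length (cm)",
    y="sepal width (cm)",
    c="label",
    colormap="viridis",
)

# Save to a file so the plot is also available outside a notebook
ax.figure.savefig("iris_scatter.png")
```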
We chose KNeighborsClassifier as our model (documented at https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) and created it with our chosen parameters.
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=8)
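The value `n_neighbors=8` is one choice among many. As a rough sketch, and not something the original workflow does, cross-validation can compare a few candidate values. The candidate list and `cv=5` are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k
scores_by_k = {}
for k in (1, 3, 5, 8, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    scores_by_k[k] = cross_val_score(model, X, y, cv=5).mean()
    print(k, round(scores_by_k[k], 3))
```

Small values of k follow the training points closely, while larger values smooth the decision boundary, so comparing several is a reasonable sanity check.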
After choosing our model, we split the data into training and test sets with train_test_split.
from sklearn.model_selection import train_test_split
X = features
y = labels
# Hold out 33% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
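A plain random split does not guarantee that each species is equally represented in both parts. As an optional variation, `train_test_split` accepts a `stratify` argument that keeps class proportions roughly equal. This sketch uses the same `test_size` and `random_state` as above; using `stratify=y` is an addition, not what the original code does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (100, 4) (50, 4)

# Per-class sample counts in each part
print(np.bincount(y_train), np.bincount(y_test))
```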
Now we have training and test data with their matching labels. We fit the model on the training data and check its accuracy there.
clf.fit(X_train, y_train)
accuracy = clf.score(X_train, y_train)
# score() returns a fraction between 0 and 1; the % format type converts it to a percentage
print("accuracy on train data {:.2%}".format(accuracy))
This shows how well the model fits the data it was trained on.
Next, we evaluate it on the test data.
accuracy = clf.score(X_test, y_test)
print("accuracy on test data {:.2%}".format(accuracy))
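A single accuracy number hides which species get confused with each other. One way to look closer, sketched here as an optional extra, is a confusion matrix built with `sklearn.metrics.confusion_matrix`. The names `y_pred` and `cm` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42
)

clf = KNeighborsClassifier(n_neighbors=8)
clf.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal holds the correctly classified samples, so the diagonal sum divided by the total equals the accuracy reported by `score`.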
We saved the model with joblib: dump writes the trained model to a file, and load reads the same model back into a new variable.
from joblib import dump, load
filename = "myFirstSavedModel.joblib"
dump(clf, filename)
clfUploaded = load(filename)
We test the reloaded model again on the same test set.
accuracy = clfUploaded.score(X_test, y_test)
print("accuracy on test data {:.2%}".format(accuracy))
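Matching accuracy suggests the round trip worked. A stricter check, sketched below, compares the predictions of the original and the reloaded model sample by sample. The temporary directory and the file name `knn_iris.joblib` are assumptions used to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
clf = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)

# Save and reload inside a temporary directory that is cleaned up afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "knn_iris.joblib")
    dump(clf, path)
    restored = load(path)

# The reloaded model should give exactly the same predictions
same = np.array_equal(clf.predict(X_test), restored.predict(X_test))
print(same)
```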