Supervised Learning Classification Checkpoint
In this checkpoint, we work on the Titanic data set to predict whether a passenger survived, using several supervised classification algorithms. We will start with logistic regression, then KNN and a decision tree, and finish with a random forest.
Preprocess the data using pandas.

In:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.preprocessing import LabelEncoder

dataset = pd.read_csv("titanic-passengers.csv")
dataset.head()

Out[4]: [first rows of the data set, with columns PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked]
In:
def preprocess_data(new_data):
    # Fill missing ages with the mean age
    new_data['Age'].fillna(new_data['Age'].mean(), inplace=True)
    # Encode Sex as 1 (male) / 0 (female)
    new_data.replace({'Sex': {'male': 1, 'female': 0}}, inplace=True)
    # Fill missing cabins with a default value
    new_data['Cabin'] = new_data.Cabin.fillna('G6')
    # Encode the target as 1 (Yes) / 0 (No)
    new_data.replace({'Survived': {'Yes': 1, 'No': 0}}, inplace=True)
    # Label-encode the port of embarkation
    label_encoder = LabelEncoder()
    new_data['Embarked'] = label_encoder.fit_transform(new_data['Embarked'])
    return new_data

data = preprocess_data(dataset)
data

Out[5]: [the preprocessed data set]
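As a quick sanity check, the same preprocessing steps can be tried on a tiny hand-made frame (the toy data below is illustrative, not taken from the Titanic file; the Cabin step is omitted since the toy frame has no such column):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature data set with the same column types
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'male'],
    'Age': [22.0, None, 30.0],
    'Survived': ['Yes', 'No', 'Yes'],
    'Embarked': ['S', 'C', 'S'],
})

# Same transformations as preprocess_data
toy['Age'] = toy['Age'].fillna(toy['Age'].mean())   # mean of 22 and 30 -> 26
toy = toy.replace({'Sex': {'male': 1, 'female': 0},
                   'Survived': {'Yes': 1, 'No': 0}})
# LabelEncoder assigns integers in sorted order of the classes: 'C' -> 0, 'S' -> 1
toy['Embarked'] = LabelEncoder().fit_transform(toy['Embarked'])

print(toy)
```

Note that `replace` with a nested dict maps values column by column, and `LabelEncoder` imposes an arbitrary (alphabetical) ordering on the ports, which tree models tolerate better than linear ones.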
In:
X = data.drop(['Survived', 'Name', 'Ticket', 'PassengerId', 'Cabin'], axis=1)
y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X

Out[8]: [features kept: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked — rows x 7 columns]

Part 1: Logistic Regression

Apply logistic regression.
Use a confusion matrix to validate your model. Another validation metric for classification is ROC/AUC; do your research on them, explain them, and apply them to our case.

In:
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
y_pred

site-packages\sklearn\linear_model\_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(

Out[9]: array([...], dtype=int64)

In:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
[precision / recall / f1-score / support for each class, plus accuracy, macro avg and weighted avg rows]

In:
import seaborn as sns
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sns.heatmap(confusion_matrix, annot=True)

Out[11]: [heatmap of the confusion matrix]
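The ROC curve plots the true-positive rate against the false-positive rate as the decision threshold varies, and AUC summarizes the curve in one number (1.0 = perfect separation, 0.5 = random guessing). A minimal, self-contained sketch with made-up labels and scores — on the checkpoint data you would pass y_test and logreg.predict_proba(X_test)[:, 1] instead:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # AUC = 0.75 for this toy example
```

Use the probabilities from predict_proba rather than the hard 0/1 output of predict: with only two distinct score values the curve degenerates to a single point, and the AUC is far less informative.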