Kaggle Titanic Python Competition: Getting Started

This tutorial explains how to get started with your first Kaggle competition, Titanic: Machine Learning from Disaster. Along the way, you’ll get familiar with the basics of machine learning in Python and with the main features of the Kaggle platform.

About the challenge – Titanic: ML from Disaster is a simple, beginner-level challenge in which you build a machine learning model that predicts which passengers survived the Titanic disaster. Using passenger data from the Titanic dataset (e.g. name, age, gender, socio-economic class), we will build a predictive model that answers the question “what sorts of people were more likely to survive?”

Let’s get started with Machine Learning Competitions on Kaggle – A world for data scientists.

Note – Make sure you have signed up for Kaggle.com and are signed in. For this competition, we will be using the Python programming language.

Part 1: Get started

In this part, you’ll get familiar with the challenge on Kaggle and make your first pre-generated submission. 

Join Competition

Join the Titanic competition by going to the competition page, clicking the “Join Competition” button, and accepting the rules.

The data

To take a look at the competition data, click on the Data tab where you will find the list of files.

There are three files in the data: (1) train.csv, (2) test.csv, and (3) gender_submission.csv

[ 1 ] train.csv

The train.csv file contains details for a subset of the passengers, along with a Survived column. In the Survived column, “1” means the person survived and “0” means the person died. As the name suggests, this file is used for training machine learning models.

[ 2 ] test.csv

The test.csv file contains details for a separate list of passengers, without a Survived column, so the machine learning model never sees the outcomes for these passengers. As the name suggests, it is used for testing the model: your task is to predict the missing Survived values.

[ 3 ] gender_submission.csv

gender_submission.csv is a sample file that shows how your submission file should be structured. It predicts that all female passengers survived, and all male passengers died.
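
The rule behind the sample file can be sketched in a few lines of pandas. This is only an illustration: the helper name gender_baseline and the four toy rows are made up here, and the real test.csv has more columns and many more rows.

```python
import pandas as pd

# Toy stand-in for test.csv (hypothetical rows, only the columns the rule needs).
toy_test = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895],
    "Sex": ["male", "female", "male", "female"],
})

def gender_baseline(passengers: pd.DataFrame) -> pd.DataFrame:
    """Predict 1 (survived) for every female passenger and 0 for every male."""
    survived = (passengers["Sex"] == "female").astype(int)
    return pd.DataFrame({"PassengerId": passengers["PassengerId"], "Survived": survived})

baseline = gender_baseline(toy_test)
print(baseline)
```

The result has the same two columns as the sample file, PassengerId and Survived, which is the structure any submission needs to follow.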

Let’s Submit Your First Submission

We already have a sample submission file, gender_submission.csv. Download it, upload it as your submission, and then click the blue “Make Submission” button.

Kaggle will score your submission, and you will see your score on the leaderboard.
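
To get a feel for what the score means, here is a minimal sketch that assumes the score is plain accuracy, i.e. the fraction of passengers whose outcome was predicted correctly (check the competition's evaluation page to confirm). The function name and the labels are made up for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for five passengers (1 = survived, 0 = died).
true_labels = [1, 0, 0, 1, 1]
predicted_labels = [1, 0, 1, 1, 0]

score = accuracy(predicted_labels, true_labels)
print(score)  # 3 of the 5 predictions match, so 0.6
```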

Part 2: Set up your coding environment

In this part, you’ll create a notebook for training your machine learning model.

Notebook

Let’s create a notebook by clicking the Notebooks tab and then New Notebook. Kaggle will automatically create a notebook for you, with some default code already in it.

Load Data

Let’s load the Titanic data into the notebook. We will load both the training and the test data. In a new code cell, type the code below.

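import pandas as pd  # normally already imported by the notebook's default code

train_data = pd.read_csv("/kaggle/input/titanic/train.csv")
test_data = pd.read_csv("/kaggle/input/titanic/test.csv")

# A cell only shows its last expression automatically, so display both tables explicitly
display(train_data.head())
display(test_data.head())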

The above code will load and display the first 5 rows of train.csv and test.csv. 
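
Beyond the first rows, it often helps to check the table size and look for missing values before modelling. The sketch below uses three made-up rows in a similar CSV layout so that it runs anywhere; in the notebook you would apply the same calls to train_data instead.

```python
import io
import pandas as pd

# Three made-up rows standing in for train.csv (hypothetical data).
toy_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,\n"
    "3,1,3,female,26\n"
)
toy_train = pd.read_csv(toy_csv)

print(toy_train.shape)            # (number of rows, number of columns)
missing = toy_train.isna().sum()  # count of missing values in each column
print(missing)
survival_rate = toy_train["Survived"].mean()  # share of passengers who survived
print(survival_rate)
```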

Part 3: Writing Machine Learning Model

In this part, you will write your first machine learning model.

Machine Learning Model

We will build our model using a random forest. A random forest is made up of many decision trees: each tree votes on whether a passenger survived, and the outcome with the most votes wins.
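
The “most votes win” idea can be sketched in plain Python. This is a simplified illustration with made-up votes, not how the library works internally: scikit-learn's RandomForestClassifier combines the trees' class probabilities rather than counting hard votes, although the two usually agree.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical votes from five trees for one passenger (1 = survived, 0 = died).
votes = [1, 0, 1, 1, 0]
winner = majority_vote(votes)
print(winner)  # three trees voted 1 and two voted 0, so the result is 1
```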

Copy this code into your notebook and run it in a new code cell. The code imports RandomForestClassifier from sklearn.ensemble, stores the Survived column in the variable y, and selects the feature columns. It then creates a RandomForestClassifier model, trains it on the training data, and uses it to predict survival for the passengers in the test data. Finally, the predictions are written to my_submission.csv.

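from sklearn.ensemble import RandomForestClassifier

# Target: 1 if the passenger survived, 0 otherwise
y = train_data["Survived"]

# Input columns; get_dummies turns the text column "Sex" into numeric indicator columns
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])

# 100 trees, each at most 5 levels deep; random_state makes the run reproducible
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X, y)
predictions = model.predict(X_test)

# Save the predictions with the two columns a submission needs
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('my_submission.csv', index=False)
print("Your submission was successfully saved!")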

It will print the output: Your submission was successfully saved!
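
The pd.get_dummies call in the code above is what makes the text column Sex usable by the model. The sketch below shows its effect on two made-up tables, and also a common pitfall: if a category is absent from one table, the columns no longer match. The reindex step is one possible way to align them; it is not needed for the Titanic data, where both values of Sex appear in both files.

```python
import pandas as pd

# Made-up rows; the toy test table deliberately contains no "male" passengers.
toy_train = pd.DataFrame({"Pclass": [3, 1, 2], "Sex": ["male", "female", "male"]})
toy_test = pd.DataFrame({"Pclass": [1, 3], "Sex": ["female", "female"]})

X_toy = pd.get_dummies(toy_train)
X_toy_test = pd.get_dummies(toy_test)
print(list(X_toy.columns))       # ['Pclass', 'Sex_female', 'Sex_male']
print(list(X_toy_test.columns))  # ['Pclass', 'Sex_female']

# Align the test columns with the training columns, filling absent indicators with 0.
X_toy_test = X_toy_test.reindex(columns=X_toy.columns, fill_value=0)
print(list(X_toy_test.columns))  # ['Pclass', 'Sex_female', 'Sex_male']
```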

Once everything is working, click the “Commit” button in the top right corner of your notebook.

In the window that opens, click the Output tab. Then click the “Submit to Competition” button to submit your results.

Once your file is successfully submitted, you should receive a message saying that you’ve moved up the leaderboard. Great work!

Hurray, that’s it! You have finished your first Titanic competition submission. You can also try training other models, such as a Support Vector Machine (SVM) or regression.
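
As a rough sketch of what swapping the model could look like, the code below fits an SVM and a logistic regression (assuming that is the kind of regression meant here) on a few made-up, already-encoded rows. The toy data and variable names are illustrative only; in the notebook you would fit on X and y and predict on X_test instead.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Made-up, already-encoded features standing in for X, y and X_test.
X_toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex_female": [1, 1, 1, 0, 0, 0],
    "Sex_male": [0, 0, 0, 1, 1, 1],
})
y_toy = pd.Series([1, 1, 1, 0, 0, 0])
X_toy_test = X_toy.iloc[[0, 5]]

candidates = {
    "svm": SVC(random_state=1),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

toy_predictions = {}
for name, candidate in candidates.items():
    candidate.fit(X_toy, y_toy)  # same fit/predict interface as the random forest
    toy_predictions[name] = candidate.predict(X_toy_test)
    print(name, toy_predictions[name])
```

Because scikit-learn classifiers share the same fit and predict methods, the rest of the submission code can stay unchanged when you try a different model.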
