Why Do We Split Datasets?

Train-Test Split Explained…

3 min read · Mar 17, 2021

This article is a detailed explanation for the interview question below.

Why is it important to create a separate evaluation split of a dataset when performing model/algorithm tuning in supervised learning?

Predictive Models

We build predictive models to predict the outcome for data the model has not seen. To measure how a model performs on new instances, we keep part of the data “unseen” by the model during training.

What are train and test splits?

We randomly separate the dataset into two parts: train data and test data. We use the train split for actual training and the test split to measure the model performance.
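As a minimal sketch of the random split described above, the snippet below uses only the Python standard library. The function name `split_train_test`, the 20% test fraction, and the fixed seed are assumptions made for illustration and are not from the original article; in practice, a library helper such as scikit-learn's `train_test_split` is commonly used instead.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists.

    Hypothetical helper: the name, default fraction, and seed are
    illustrative assumptions, not part of the original article.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows become the test split, the rest the train split.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))                      # toy dataset of 10 instances
train, test = split_train_test(data)
print(len(train), len(test))
```

Every instance lands in exactly one of the two splits, so nothing the model is later evaluated on was available to it during training.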

Train Data

The data used to train the machine learning model is called train data. We give the train data to the model, and the model learns patterns from it that it later uses to make predictions. The model is built only from what it finds in the training dataset.
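To make "the model is built from the train data" concrete, here is a small sketch that fits a straight line to a handful of training points and then predicts for an input that was not among them. The toy data, the choice of a simple least-squares line, and the names `fit_line` and `predict` are all assumptions chosen for illustration; the article does not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; only the training points
    passed in here influence the fitted parameters.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a single input value."""
    slope, intercept = model
    return slope * x + intercept


# Toy train data (assumed): the points follow y = 2x + 1 exactly.
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = fit_line(train_x, train_y)   # the model sees only the train data
new_prediction = predict(model, 10)  # x = 10 was not in the train data
print(model, new_prediction)
```

The fitted parameters depend entirely on the training points, which is why a separate held-out set is needed to judge how the model behaves on anything else.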

Written by Seyma Tas