
How To Select The Right Variables From A Large Dataset?

A Guide To Feature Selection

Seyma Tas
3 min read · Dec 30, 2020

What Is Feature Selection?

Imagine that we have a dataset with 1,000 columns and millions of rows. How do we decide which columns are more important than the others when we develop a prediction model? The answer is feature selection. By applying methods that range from simple to sophisticated, we find the necessary input variables and throw away the noise in the data.
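As a minimal sketch of one of the simpler methods, here is univariate feature selection with scikit-learn's `SelectKBest` on a synthetic dataset (the dataset, the scoring function, and the choice of `k` are all illustrative assumptions, not from the article):

```python
# Univariate feature selection: score each column against the target
# and keep only the top-scoring ones.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 1000 rows, 20 columns, only 5 of which are informative.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# Keep the 5 columns with the highest ANOVA F-scores against the target.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (1000, 5)
print(selector.get_support(indices=True))  # indices of the kept columns
```

The same pattern scales to the 1,000-column case: only `k` changes, while the scoring step remains a cheap per-column computation.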

Why do we select features?

You may think that more data results in a better machine learning model. That is true for the number of rows (instances) but not for the number of columns (features). If the dataset contains redundant features, the model will not perform as we expect.
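One common way to spot redundant features is to look for pairs of highly correlated columns and drop one of each pair. The sketch below assumes a small synthetic DataFrame and an illustrative correlation threshold of 0.95; neither comes from the article:

```python
# Drop one column from each pair of nearly duplicate (highly correlated) columns.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=500)
df = pd.DataFrame({
    "a": a,
    "b": a * 2 + rng.normal(scale=0.01, size=500),  # nearly a duplicate of "a"
    "c": rng.normal(size=500),                      # independent column
})

# Absolute correlation matrix, upper triangle only (avoid double-counting pairs).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flag any column correlated above the threshold with an earlier column.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]

print(to_drop)                                    # ['b']
print(df.drop(columns=to_drop).columns.tolist())  # ['a', 'c']
```

This is a filter-style method: it never trains a model, so it is cheap even on wide datasets, but it only catches pairwise linear redundancy.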

Before starting to discover how to do feature selection, we should understand the motivation for feature selection.

1. Reduce the number of dimensions

High-dimensional inputs can be problematic: they are difficult to sample from, and they introduce many other challenges.

Dimensionality reduction

  • reduces storage space
  • reduces the computational cost and enables the machine learning…
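To make the storage and compute savings concrete, here is a brief sketch using PCA, a classic dimensionality-reduction technique (note that PCA transforms features rather than selecting them; the array sizes and component count are illustrative assumptions):

```python
# Dimensionality reduction with PCA: project 50 raw features onto
# 10 principal components, shrinking storage by 5x.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))  # 1000 rows, 50 raw features

pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X.nbytes, "->", X_reduced.nbytes)  # 400000 -> 80000 bytes
print(X_reduced.shape)                   # (1000, 10)
```

Every downstream model now trains on a 10-column matrix instead of a 50-column one, which directly lowers both memory use and computational cost.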
