Image by Pete Linforth from Pixabay


How To Select The Right Variables From A Large Dataset?

A Guide To Feature Selection

Seyma Tas


What Is Feature Selection?

Imagine that we have a dataset with 1,000 columns and millions of rows. How do we decide which columns matter more than the others when we develop a prediction model? The answer is feature selection. By applying methods that range from simple to sophisticated, we keep the necessary input variables and throw away the noise in the data.
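
To make this concrete, here is a minimal sketch of a simple filter-style selection using scikit-learn's SelectKBest on a synthetic dataset (the synthetic data, the ANOVA F-test, and k=20 are illustrative assumptions, not choices made in this article):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a wide dataset: 1,000 candidate features
X, y = make_classification(n_samples=5000, n_features=1000,
                           n_informative=20, random_state=0)

# Keep the 20 features with the strongest univariate ANOVA F-scores
selector = SelectKBest(score_func=f_classif, k=20)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (5000, 1000) -> (5000, 20)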

Why do we select features?

You may think that more data always results in a better machine learning model. That is true for the number of rows (instances) but not necessarily for the number of columns (features). If the dataset contains redundant features, the model does not perform as well as we expect.
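
As one hedged illustration of dealing with redundancy (a common filter heuristic, not necessarily the method the author describes later), near-duplicate columns can be dropped from a pandas DataFrame based on pairwise correlation:

import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    # Absolute pairwise correlations; keep only the upper triangle
    # so each pair of columns is inspected once
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop any column that is nearly a copy of an earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Usage (df is a hypothetical feature DataFrame):
# df_reduced = drop_highly_correlated(df, threshold=0.95)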

Before exploring how to do feature selection, we should understand the motivation behind it.

1. Reduce the number of dimensions

High-dimensional inputs can be problematic: they are difficult to sample densely enough, and they introduce many modeling challenges, often called the curse of dimensionality (a short dimensionality-reduction sketch follows the list below).

Dimensionality reduction

  • reduces storage space
  • reduces the computational cost and enables the machine learning…
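
As a sketch of one widely used dimensionality-reduction technique, here is PCA with scikit-learn; note that PCA builds new components rather than selecting original columns, and the 95% explained-variance target and synthetic data are assumptions for illustration:

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic wide dataset, standing in for real data
X, _ = make_classification(n_samples=5000, n_features=1000,
                           n_informative=20, random_state=0)

# Standardize first so no single feature dominates the components
X_scaled = StandardScaler().fit_transform(X)

# Keep just enough principal components to explain ~95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)

Fewer columns mean less storage and cheaper computation, which is exactly the motivation listed above.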
