How To Select The Right Variables From A Large Dataset?
What Is Feature Selection?
Imagine that we have a dataset with 1,000 columns and millions of rows. How do we decide which columns matter more than others when we develop a prediction model? The answer is feature selection. Using methods that range from simple to complex, we identify the input variables the model needs and discard the noise in the data.
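To make the idea concrete, here is a minimal sketch of one simple filter-style approach: score each column by the absolute value of its Pearson correlation with the target and keep the top k. This is only an illustration, not a recommended method for every dataset. The data is synthetic, and the names `pearson`, `select_top_k`, `signal_a`, `signal_b`, and `noise_*` are hypothetical, chosen for this example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Return the names of the k columns with the highest |correlation| to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic data (an assumption for illustration): three noise columns
# and two columns that actually drive the target.
rng = random.Random(0)
n_rows = 500
columns = {f"noise_{i}": [rng.gauss(0, 1) for _ in range(n_rows)] for i in range(3)}
columns["signal_a"] = [rng.gauss(0, 1) for _ in range(n_rows)]
columns["signal_b"] = [rng.gauss(0, 1) for _ in range(n_rows)]
target = [
    2 * a - 3 * b + rng.gauss(0, 0.5)
    for a, b in zip(columns["signal_a"], columns["signal_b"])
]

selected = select_top_k(columns, target, k=2)
print(selected)
```

On this synthetic data the two signal columns should score far above the noise columns. Note that a correlation filter only captures linear, one-variable-at-a-time relationships, so it is a starting point rather than a complete feature selection strategy.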
Why do we select features?
You may think that more data always results in a better machine learning model. That is generally true for the number of rows (instances), but not for the number of columns (features). If the dataset contains redundant or irrelevant features, the model will not perform as we expect.
Before looking at how to do feature selection, we should understand the motivation behind it.
1. Reduce the number of dimensions
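High-dimensional inputs can be problematic: they are difficult to sample from, because the amount of data needed to cover the input space grows rapidly with the number of dimensions, and they introduce many other challenges.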
Dimensionality reduction
- reduces storage space
- reduces the computational cost and enables the machine learning…