Imagine that you are organizing a data science conference. You are making a list of attendees. Later you want to look up a name in this attendee list. How much time does it take to find a name if you store the data as a list, and as a dictionary? If 100 people are attending your conference, you don’t have to think about lookup speed. You can keep your data in lists or dictionaries. You can even build an Excel table and use INDEX and MATCH keys to find the names you want.
What if you are storing billions of…
You are at the right place if you have these questions while learning Python:
Let’s see if I can answer these questions and some more in this article:
I am starting with the fundamentals.
We don’t declare the type of a variable when we assign a value to the variable in Python. It states the kind of variable in the runtime of the program. Other languages like C, C++, Java, etc…
In statistics, Kolmogorov-Smirnov(K-S) test is a non-parametric test of the equality of the continuous, one-dimensional (univariate) probability distributions.
K-S test compares the two cumulative distributions and returns the maximum difference between them.
One-sample K-S test or goodness of fit test was developed by Andrey Nikolayevich Kolmogorov in 1933. Its purpose is to compare the overall shapes of two sample distributions.
Two-sample K-S test was developed by Nikolai Smirnov in 1939. Its purpose is to compare one sample to a known statistical distribution.
We can separate the statistical tests into two: Parametric and non-parametric tests.
Parametric tests are suitable for normally…
In last week’s article, I wrote about train-test splits. However, there is a problem with separating the data into only two splits. Since we create random samples of data, the test and train performances can be very different depending on our train-test split. We must validate our model more than one time. We use K-Fold Cross Validation technique to deal with this issue.
We separate the dataset into k slices of equal size and train-test the model k times with k different partitions. 1 slice is the test set and k-1 slice is the train set for each training period.
This article is a detailed explanation for the interview question below.
Why is it important to create a separate evaluation split of a dataset when performing model/algorithm tuning in supervised learning?
We create predictive models to be able to guess the outcome for the unseen data. In order to measure how a model performs on new instances, we keep some part of the data “unseen” by the model.
We randomly separate the dataset into two parts: train data and test data. We use the train split for actual training and the test split to measure the model performance.
What is bias-variance tradeoff?
This question is frequently asked in machine learning interviews. Although it is an entry-level question, you can demonstrate your understanding of machine learning by explaining the answer beautifully. Because you can not write a code or draw bullseye diagrams during a virtual video call, you need to explain the fundamentals of the bias-variance tradeoff in simple sentences.
There are two sources of error that prevent a machine learning algorithm from generalizing: Bias and variance. The bias-variance tradeoff is the problem of minimizing two sources of error at the same time.
Reinforcement learning(RL) is a type of deep learning that has been receiving a lot of attention in the past few years. It is useful for the situations we want to train AI for certain skills we don’t fully understand.
RL has an agent that takes actions in an uncertain environment with the goal of maximizing the cumulative reward. The agent learns from its mistakes and its decision-making algorithm improves.
When I read about the RL concepts, I thought it was very similar to how animals learn and make decisions.
Below is a very famous video by Matthias Wandel. He is…
As you might know, creating a fully functional class in an object-oriented programming language is time-consuming because real classes perform a lot of complex tasks.
In Python, you can get the features you want from an existing class(parent) to create a new class(child). This Python feature is called inheritance.
By inheritance, you can
Since you are using a pre-used, tested class, you don’t have to put quite as much effort into your new class. …
In this blog, you are going to find answers to the below questions.
ANOVA is a statistical test that is used to evaluate the difference among the means of three or more groups.
ANOVA can only be applied to data that are normally distributed so we need to run a normality test. However, ANOVA test is robust to the assumption of normality. …
We are living in the data age. Each second tremendous amounts of data are created, stored, and used in the world. However, the world is rich in data but poor in knowledge. The reason is digging the data to find knowledge is as hard as digging rock to find gold.
When huge data was created from multiple resources, the data mining concept was born.
Data mining can be simply defined as obtaining valuable knowledge from data. This knowledge can be anomalies, patterns, correlations, and can be used to increase sales, decrease costs, improve customer loyalty, etc.
Transforming data into organized…