Data Scientist, Data Educator, Blogger
Photo by Marten Newhall on Unsplash

Comparison of dictionaries and lists

Imagine that you are organizing a data science conference. You are making a list of attendees. Later you want to look up a name in this attendee list. How much time does it take to find a name if you store the data as a list versus as a dictionary? If 100 people are attending your conference, you don’t have to think about lookup speed. You can keep your data in lists or dictionaries. You can even build an Excel table and use the INDEX and MATCH functions to find the names you want.

What if you are storing billions of…
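A rough way to see the difference is to time a membership test on both structures. The sketch below is only an illustration: the attendee names and the collection size are made up, and the measured times will vary from machine to machine.

```python
import timeit

# Hypothetical attendee names, generated only for this illustration.
attendees = [f"attendee_{i}" for i in range(100_000)]

attendee_list = attendees
attendee_dict = {name: True for name in attendees}

# Worst case for the list: the name sits at the very end.
target = "attendee_99999"

# A list is scanned element by element; a dictionary uses a hash lookup.
list_time = timeit.timeit(lambda: target in attendee_list, number=100)
dict_time = timeit.timeit(lambda: target in attendee_dict, number=100)

print(f"list lookup: {list_time:.6f} s for 100 lookups")
print(f"dict lookup: {dict_time:.6f} s for 100 lookups")
```

On a large collection the dictionary lookup is typically much faster, because its cost does not grow with the number of stored names the way a list scan does.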

Photo by Bernard Hermant on Unsplash

Reference Counting and Generational Garbage Collection

You are at the right place if you have these questions while learning Python:

  • How is memory managed in Python?
  • What is garbage collection?
  • Which algorithms are used for memory management?
  • What is a cyclical reference?
  • How are Python objects stored in memory?

Let’s see if I can answer these questions, and a few more, in this article.

I will start with the fundamentals.

Python Is a Dynamically Typed Language.

We don’t declare the type of a variable when we assign a value to it in Python. The interpreter determines the type of the value at runtime. Other languages like C, C++, Java, etc…
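A minimal sketch of dynamic typing: the same name can be bound to values of different types, and the type is only known when the program runs. The variable names here are chosen just for the example.

```python
# The name "value" carries no declared type; it simply refers to an object.
value = 42
first_type = type(value).__name__

# Rebinding the same name to a string is perfectly legal.
value = "forty-two"
second_type = type(value).__name__

# And again to a list.
value = [4, 2]
third_type = type(value).__name__

print(first_type, second_type, third_type)  # int str list
```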

Image by Christo Anestev from Pixabay

Comparing Distributions

Kolmogorov-Smirnov(K-S) test

In statistics, the Kolmogorov-Smirnov (K-S) test is a non-parametric test of the equality of continuous, one-dimensional (univariate) probability distributions.

The K-S test compares two cumulative distributions and returns the maximum difference between them.

The one-sample K-S test, or goodness-of-fit test, was developed by Andrey Nikolayevich Kolmogorov in 1933. Its purpose is to compare one sample to a known statistical distribution.

The two-sample K-S test was developed by Nikolai Smirnov in 1939. Its purpose is to compare the overall shapes of two sample distributions.
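The K-S statistic is the largest vertical gap between two cumulative distribution functions. Here is a minimal pure-Python sketch of that statistic for two samples, using their empirical CDFs. The function name `ks_statistic` is made up for this illustration, and it computes only the statistic, not a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_diff = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed point is enough to find the largest gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff

print(ks_statistic([1, 2, 3], [1, 2, 3]))        # 0.0
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0
```

Identical samples give 0, and samples that do not overlap at all give 1.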

Parametric and non-parametric statistics

We can separate the statistical tests into two: Parametric and non-parametric tests.

Parametric tests are suitable for normally…

Boise, Idaho, Photo by Alden Skeie on Unsplash

K-Fold Cross Validation Explained…

In last week’s article, I wrote about train-test splits. However, there is a problem with separating the data into only two splits. Since we create random samples of data, the train and test performances can be very different depending on the particular split. We must validate our model more than once. We use the K-Fold Cross Validation technique to deal with this issue.

K-Fold Cross Validation

We separate the dataset into k slices of equal size and train and test the model k times, each time with a different partition. In each round, 1 slice is the test set and the remaining k-1 slices form the train set.
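The partitioning can be sketched in plain Python without any library. The helper `k_fold_indices` is a made-up name for this illustration, and it assumes the number of samples is divisible by k (any leftover samples would always stay in the train set).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k  # assumes n_samples is divisible by k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        test = indices[start:stop]                # 1 slice for testing
        train = indices[:start] + indices[stop:]  # k-1 slices for training
        yield train, test

folds = list(k_fold_indices(6, 3))
for round_number, (train, test) in enumerate(folds):
    print(round_number, "train:", train, "test:", test)
# 0 train: [2, 3, 4, 5] test: [0, 1]
# 1 train: [0, 1, 4, 5] test: [2, 3]
# 2 train: [0, 1, 2, 3] test: [4, 5]
```

Every sample ends up in the test set exactly once across the k rounds.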

K-Fold Cross Validation in Scikit-Learn

Image by Manfred Richter from Pixabay

Train-Test Split Explained…

This article is a detailed explanation of the interview question below.

Why is it important to create a separate evaluation split of a dataset when performing model/algorithm tuning in supervised learning?

Predictive Models

We create predictive models to be able to predict the outcome for unseen data. In order to measure how a model performs on new instances, we keep some part of the data “unseen” by the model.

What are train and test splits?

We randomly separate the dataset into two parts: train data and test data. We use the train split for actual training and the test split to measure the model performance.
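A random split can be sketched with the standard library alone. The function `split_train_test`, the 20% test fraction, and the fixed seed are assumptions made for this illustration, not a prescribed recipe.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and cut them into a train part and a test part."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]    # held back, never shown during training
    train = shuffled[n_test:]   # used for the actual training
    return train, test

train, test = split_train_test(range(10), test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

The two parts never share a row, so the test score reflects performance on data the model has not seen.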

Image by Myriams-Fotos from Pixabay

Bias-Variance Dilemma

What is bias-variance tradeoff?

This question is frequently asked in machine learning interviews. Although it is an entry-level question, you can demonstrate your understanding of machine learning by explaining the answer clearly. Because you cannot write code or draw bullseye diagrams during a virtual video call, you need to explain the fundamentals of the bias-variance tradeoff in simple sentences.

Error In Machine Learning Models

There are two sources of error that prevent a machine learning algorithm from generalizing: Bias and variance. The bias-variance tradeoff is the problem of minimizing two sources of error at the same time.
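For squared-error loss, the two sources of error can be written down explicitly. The notation below is an assumption made for this illustration: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible usually lowers the bias term and raises the variance term, which is why the two cannot be minimized independently.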

Image by Alexandra ❤️A life without animals is not worth living❤️ from Pixabay

How Similarly Do Animals And Computers Learn?

Reinforcement Learning

Reinforcement learning (RL) is a type of machine learning that has been receiving a lot of attention in the past few years. It is useful in situations where we want to train an AI for skills we don’t fully understand.

RL has an agent that takes actions in an uncertain environment with the goal of maximizing the cumulative reward. The agent learns from its mistakes and its decision-making algorithm improves.
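To make the agent, action, and reward loop concrete, here is a small sketch of an epsilon-greedy agent on a two-armed bandit. The environment, its reward probabilities, and the function name `run_bandit` are all invented for this illustration. It is one of the simplest RL settings, not a general RL algorithm.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    estimates = [0.0] * n_arms  # the agent's current guess of each arm's value
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # Uncertain environment: the reward is 1 only with some probability.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total_reward = run_bandit([0.2, 0.8])
print("estimated arm values:", estimates)
print("cumulative reward:", total_reward)
```

The agent starts with no knowledge of the arms and, through trial and error, learns that the second arm pays off more often.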

When I read about the RL concepts, I thought it was very similar to how animals learn and make decisions.

Mouse Maze Experiment:

Below is a very famous video by Matthias Wandel. He is…

Photo by vivek kumar on Unsplash

Inheritance: Extending Classes To Make New Classes

As you might know, creating a fully functional class in an object-oriented programming language is time-consuming because real classes perform a lot of complex tasks.

In Python, you can take the features you want from an existing class (the parent) to create a new class (the child). This Python feature is called inheritance.

By inheritance, you can

  • obtain the features of a parent class,
  • change the features that you don’t need,
  • add new features to your child class (also called a derived class or subclass).

Since you are reusing an existing, tested class, you don’t have to put quite as much effort into your new class. …
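The three points above can be sketched with a pair of toy classes. `Animal` and `Dog` are hypothetical names used only for this example.

```python
class Animal:
    """Parent class with features that children can reuse."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

    def describe(self):
        return f"{self.name} is an animal"


class Dog(Animal):
    """Child class: inherits from Animal, changes one feature, adds another."""

    def speak(self):
        # Change a feature: override the parent's method.
        return f"{self.name} barks"

    def fetch(self):
        # Add a new feature that the parent does not have.
        return f"{self.name} fetches the ball"


dog = Dog("Rex")
print(dog.describe())  # Rex is an animal (inherited unchanged)
print(dog.speak())     # Rex barks (overridden)
print(dog.fetch())     # Rex fetches the ball (new in the child)
```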

Image by Free-Photos from Pixabay

Analysis Of Variance(ANOVA)

In this post, you will find answers to the questions below.

  • What is ANOVA?
  • What are the assumptions of ANOVA?
  • In which situations is the ANOVA used?
  • What is the difference between Student’s T-test and ANOVA?

What Is ANOVA?

ANOVA is a statistical test that is used to evaluate the difference among the means of three or more groups.
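For a one-way ANOVA, the comparison of means boils down to the F statistic: the variation between group means divided by the variation within groups. Below is a minimal pure-Python sketch. The function name `one_way_anova_f` and the toy data are assumptions for this illustration, and no p-value is computed.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)       # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)   # N - k degrees of freedom
    return ms_between / ms_within

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat)  # 3.0
```

A large F value suggests that at least one group mean differs from the others more than the within-group noise would explain.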

Assumptions of ANOVA

  • Populations are normally distributed.

ANOVA assumes that the data are normally distributed, so we need to run a normality test. However, the ANOVA test is fairly robust to moderate violations of the normality assumption. …

Image by Cherie Vilneff from Pixabay

Issues In Knowledge Mining From Data

We are living in the data age. Every second, tremendous amounts of data are created, stored, and used in the world. However, the world is rich in data but poor in knowledge. The reason is that digging through data to find knowledge is as hard as digging through rock to find gold.

As huge volumes of data began to be generated from multiple sources, the concept of data mining was born.

What Is Data Mining?

Data mining can be simply defined as obtaining valuable knowledge from data. This knowledge can take the form of anomalies, patterns, or correlations, and it can be used to increase sales, decrease costs, improve customer loyalty, etc.

Major Issues In Data Mining

Transforming data into organized…

Seyma Tas
