The 4 most useful discrete distributions in Data Science

Anastasia Karavdina
May 13, 2024
3 min read

You have most probably heard that probability distributions come into play in all aspects of life to determine the likelihood of events. For many people, however, they are quite intimidating. Unfortunately, many courses on probability overwhelm you with formulas and don't give you an intuition for where these distributions show up. Let's fix that now and go through the most common discrete distributions you will see.

1. Discrete uniform distribution: All outcomes are equally likely

"Uniform distribution" is used to describe a type of statistical distribution where every outcome has the same likelihood of occurring. Take the example of tossing a six-sided die. Each of the six faces — numbered 1 through 6 — has an equal chance of landing face up, with the probability for each being 1/6. Thus a graph representing the uniform distribution features bars of identical height, each reflecting the equal probability of different outcomes. In the case of the six-sided die, each bar on the graph would have a height corresponding to a probability of 1/6, or approximately 0.166667.
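The die example can be sketched in a few lines of Python. This is only an illustration: the helper name `discrete_uniform_pmf` is made up for this post, and exact fractions are used so that 1/6 is not blurred by rounding.

```python
from fractions import Fraction


def discrete_uniform_pmf(outcomes):
    """Map each outcome to the same probability 1/n (hypothetical helper)."""
    outcomes = list(outcomes)
    n = len(outcomes)
    return {outcome: Fraction(1, n) for outcome in outcomes}


# A fair six-sided die: faces 1 through 6, each with probability 1/6.
die_pmf = discrete_uniform_pmf(range(1, 7))

for face, prob in die_pmf.items():
    # Print the exact fraction and its decimal approximation.
    print(face, prob, round(float(prob), 6))
```

Every line of the output shows the same probability, which is exactly what the equal-height bars in the graph express.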

2. Bernoulli: Single-trial with two possible outcomes

The Bernoulli distribution is one of the simplest probability distributions to grasp and serves as a foundational building block for more complex distributions. It applies to any scenario involving just one trial with only two possible outcomes. Examples include flipping a coin or selecting True or False on a quiz.

In each of these cases the experiment consists of a single trial. Flipping a coin once, for example, yields one of two possible results: heads or tails.

A Bernoulli distribution is described by the probability (p) of one of the outcomes. From (p), we can deduce the probability of the other outcome by subtracting it from the total probability (1), which gives (1-p). Here is an example of the distribution when both outcomes have equal probability 0.5:
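A minimal sketch of the same idea in Python, assuming we encode one outcome as 1 (say, heads) and the other as 0 (tails). The function names `bernoulli_pmf` and `bernoulli_sample` are illustrative, not taken from any library.

```python
import random


def bernoulli_pmf(k, p):
    """Probability of outcome k (1 or 0) when outcome 1 has probability p."""
    if k not in (0, 1):
        raise ValueError("a Bernoulli trial has only two outcomes: 0 and 1")
    return p if k == 1 else 1 - p


def bernoulli_sample(p, rng):
    """Simulate one trial: return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


p = 0.5  # fair coin
print("P(heads) =", bernoulli_pmf(1, p))
print("P(tails) =", bernoulli_pmf(0, p))

# Simulate a few flips; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(42)
flips = [bernoulli_sample(p, rng) for _ in range(10)]
print(flips)
```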

3. Binomial: A sequence of Bernoulli events

The Binomial distribution is essentially the aggregate of several Bernoulli trials, each of which has only two possible outcomes. It applies to repeated trials with binary outcomes where the probability of success remains constant across trials. An example of this is flipping a coin multiple times and counting how many times it lands heads.

Binomial vs Bernoulli distribution.

Suppose you’re taking a quiz that contains 10 True/False questions. Answering a single T/F question is a Bernoulli trial, whereas the number of correct answers over the entire quiz of 10 T/F questions follows a Binomial distribution. The main characteristics of the Binomial distribution are:

The trials are independent of each other. That is, the outcome of one trial doesn’t affect another one.
Each trial can lead to just two possible results (e.g., winning or losing), with probabilities p and (1 – p).

For a quiz with 10 questions, if the probability of answering each question correctly is 0.5, here is what the distribution of the number of correct answers looks like:
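The quiz example can be reproduced numerically with the standard Binomial formula, P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), where n is the number of trials and k the number of successes. The sketch below assumes nothing beyond the Python standard library; `binomial_pmf` is just an illustrative name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_questions = 10
p_correct = 0.5

# Probability of getting exactly k answers right, for k = 0..10.
quiz_pmf = [binomial_pmf(k, n_questions, p_correct) for k in range(n_questions + 1)]

for k, prob in enumerate(quiz_pmf):
    print(k, round(prob, 4))
```

The most likely result is 5 correct answers, with probability 252/1024 (about 0.246), and the probabilities fall off symmetrically on both sides because p = 0.5.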

4. Poisson: Given number of events occurring in a fixed interval of time

The Poisson distribution is used to model the number of times an event happens in a specific interval of time or space. It's especially useful when you want to understand the likelihood of something happening several times over a fixed period or in a fixed area when these events are relatively rare and can be counted.

Here's how to think about it in simpler terms:

Imagine you're running a bookstore and want to know how often you might expect customers to enter the store within an hour. If, historically, around 10 customers tend to visit every hour, you could use the Poisson distribution to calculate the probability of seeing, for example, exactly 13 customers in the next hour.
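To make the bookstore example concrete, here is a small sketch using the standard Poisson formula, P(X = k) = e^(-λ) * λ^k / k!, where λ is the average number of events per interval (10 customers per hour here). The function name `poisson_pmf` is only illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average per interval is lam."""
    return exp(-lam) * lam**k / factorial(k)


average_per_hour = 10  # historical average from the bookstore example

# Probability of seeing exactly 13 customers in the next hour.
p_13 = poisson_pmf(13, average_per_hour)
print(round(p_13, 4))  # about 0.0729
```

So, under these assumptions, an hour with exactly 13 customers happens roughly 7% of the time.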

In this way, the Poisson distribution helps predict the number of visitors to a website, the number of buses arriving at a bus stop per hour, the demand for a particular product, etc.

The main characteristics of a Poisson process are:

The events are independent of each other.
An event can occur any number of times (within the defined period).
Two events can’t take place simultaneously.

The Poisson distribution graph displays the number of times an event occurs within a specific time interval and the probability associated with each occurrence:

Why should you care?

Knowing which probability distribution your data follows can be very helpful. In particular, it lets you:

Select an appropriate model. Different machine learning models assume that the underlying data follow specific statistical distributions.
Detect anomalies in your data. In tasks like fraud detection or data quality checks, knowing the expected distribution of data helps identify values that deviate significantly from this expectation, which can be potential indicators of anomalous behavior.
Improve model generalization. Correct assumptions about data distributions help in designing models that not only fit the current data well but also generalize effectively to new, unseen data. This is because the assumptions guide the learning algorithm to focus on patterns that are statistically significant rather than noise.
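As a rough illustration of the anomaly-detection point, the sketch below reuses the bookstore setting and assumes hourly customer counts follow a Poisson distribution with an average of 10. The cutoff of 0.001 and the function names are arbitrary choices made for this example, not a recommended rule.

```python
from math import exp, factorial


def poisson_tail(k, lam):
    """P(X >= k) for a Poisson variable with average lam."""
    below_k = sum(exp(-lam) * lam**i / factorial(i) for i in range(k))
    return 1.0 - below_k


def is_anomalous(count, lam, threshold=0.001):
    """Flag a count that would be very unlikely under the assumed distribution.

    The threshold is an assumption of this sketch, not a universal constant.
    """
    return poisson_tail(count, lam) < threshold


average_per_hour = 10
for observed in (12, 25):
    print(observed, is_anomalous(observed, average_per_hour))
```

An hour with 12 customers is unremarkable when the average is 10, while an hour with 25 is rare enough under this assumption to deserve a closer look.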
