Introduction to the Normal (or Gaussian) Distribution
Zenva
ACCESS the FULL COURSE here: https://academy.zenva.com/product/data-science-mini-degree/?zva_src=youtube-datascience-md
TRANSCRIPT
In this video we are going to talk about the most important distribution in all of statistics, and probably in many other fields too: the Normal Distribution. It's the most important continuous distribution that you'll ever encounter. It's used in so many different fields: biology, sociology, finance, engineering, medicine. It's so ubiquitous that I think a good understanding of it is very transferable. Sometimes you'll also hear it called the Gaussian Distribution. Those are just two names for the same distribution.

Another reason why it's so important is that many real-world random variables approximately follow a Normal Distribution. If you look at things like the IQs of people, the heights of people, or measurement errors from different kinds of instruments, these can all be modelled really nicely with a Normal Distribution. And we know a lot about the Normal Distribution, both statistically and mathematically.

So here's a picture of what it looks like at the bottom there. Sometimes you'll also hear it called a bell curve, because it looks like the outline of a bell. It's parametrized by two things. The first is the mean, which is the lowercase Greek letter mu (μ), in other words the average or expected value. It denotes where the peak is: every normal distribution has a single peak, and the mean tells you where that peak sits. The second is the standard deviation, which is the lowercase sigma (σ). Sometimes you'll also see sigma squared (σ²) instead, which is called the variance. The standard deviation tells us the spread of the data away from the mean. Basically, it tells you how peak-y the curve is: a small standard deviation gives a tall, narrow peak, and a large one gives a flat, wide curve.

And here is the probability density function. It looks kind of complicated there.
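For reference, the standard form of the normal density is f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²)). The sketch below evaluates it directly in Python using only the standard library. The function name normal_pdf and the example values are illustrative choices, not something taken from the video.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical example values: curves centred at 0 with two different spreads.
peak_height = normal_pdf(0.0, mu=0.0, sigma=1.0)  # height at the peak, about 0.3989
wide_peak = normal_pdf(0.0, mu=0.0, sigma=2.0)    # larger sigma, so a lower, flatter peak
print(peak_height, wide_peak)
```

Doubling sigma halves the height of the peak, which is the "flat versus peak-y" behaviour described for the standard deviation.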
But if you were to run that through a graphing program, given a mean and a standard deviation, it would produce this graph. Fortunately, there are libraries that compute this for us, so we don't have to look into it too much.

Another notation point I should mention: capital letters, like capital X, usually denote a random variable, and lowercase letters denote an actual, specific value. That's just a notation point.

So let's talk a little bit about some of the properties of the Normal Distribution. The mean, median, and mode are all equal, and they're all at the peak. So the peak is the mean, as we've said, and it gives the location. Another really nice property is that it's perfectly symmetric about the mean. And that' ... https://www.youtube.com/watch?v=UxltfTUl6iA
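The properties mentioned above (mean, median, and mode all equal, and symmetry about the mean) can be checked with a library rather than by hand. Here is a minimal sketch using NormalDist from Python's standard statistics module (Python 3.8 or later). The mean of 100 and standard deviation of 15 are assumed values, loosely modelled on an IQ scale and chosen only for illustration.

```python
from statistics import NormalDist

# Assumed, illustrative parameters (not from the video): mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Mean, median, and mode all sit at the peak.
print(dist.mean, dist.median, dist.mode)

# Symmetry: points the same distance either side of the mean have the same density.
left = dist.pdf(100 - 20)
right = dist.pdf(100 + 20)
print(left, right)

# Symmetry also means half of the probability lies below the mean.
below_mean = dist.cdf(100)
print(below_mean)
```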