How to Compute Probability in Pandas
Zenva
ACCESS the FULL COURSE here: https://academy.zenva.com/product/data-science-mini-degree/?zva_src=youtube-datascience-md
TRANSCRIPT
So, we're gonna get started doing some probability computations on our dataset. The first thing you'll need to do is go download the source code and unzip it. Make sure that you unzip it, and then put it somewhere. If you go inside, there are a ton of CSV files, and the main dataset that we're gonna be working with is flights.csv. It has a little over half a million US domestic flights from the year 2017. It has all kinds of information about the flights: the origin city, origin state, destination city, destination state, the flight ID, the airline, the distance of the flight, the expected arrival/departure time, the actual arrival/departure time, the time in the air, and all other kinds of information.

In fact, you can read all about the information in this ReadMe.csv. It shows you what all of the different columns are as well as what they actually mean. Just for your curiosity, I have all of the information like airport codes, airline codes, and codes for weekdays, and there's a CSV sheet of terminology as well in case you're unfamiliar with flight terminology. I'm certainly not too familiar with it, so I like to read through this as well. So, please use all of these CSV files to your advantage so you get a better understanding of the dataset.

So let's get started. We're gonna need to make sure we have the right environment and then launch an instance of Spyder. We actually have one running. I'm just gonna save this as probability.py inside of the same folder that houses the CSV files. All right, so now we can get started and do some basic probability computations. I'm gonna import pandas first; we're gonna be needing that. Now I'll just load in the data using pandas.
Flight equals pd.read_csv of flights.csv, index_col equals False, and then immediately we're just gonna drop values that we're not going to need. We're gonna drop any null values or not-a-number values. There are some, but we just don't wanna deal with them in our probability computation. All right, I'm gonna write in comments the probabilities that we're gonna be computing so that we get a better idea of how we can actually compute them. So let's start with this one: with no other information, you just pick a random flight from the year 2017; what is the probability that the flight started in California? I'll just use the full code. What is the probability that the flight started in California? Well if y ... https://www.youtube.com/watch?v=Nm_spDDef_w
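The loading step and the California question described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the course's actual code: the column name ORIGIN_STATE_ABR, the helper functions, and the tiny inline sample that stands in for flights.csv are all assumptions made for illustration. The probability is taken as the number of flights that started in California divided by the total number of flights left after dropping rows with missing values.

```python
import io

import pandas as pd

# Hypothetical stand-in for flights.csv so the sketch runs on its own.
# The column names here are assumptions, not taken from the real dataset.
SAMPLE_CSV = """ORIGIN_STATE_ABR,DEST_STATE_ABR,DISTANCE
CA,NY,2475
CA,TX,1235
TX,CA,1235
NY,,2475
WA,CA,954
"""


def load_flights(source):
    """Read the CSV and drop every row that contains a missing value."""
    flights = pd.read_csv(source, index_col=False)
    return flights.dropna()


def prob_origin_state(flights, state, column="ORIGIN_STATE_ABR"):
    """P(flight started in `state`) = matching flights / all flights."""
    # The mean of a boolean Series is the fraction of True values.
    return (flights[column] == state).mean()


# With the real data this would be load_flights("flights.csv").
flights = load_flights(io.StringIO(SAMPLE_CSV))
p_ca = prob_origin_state(flights, "CA")
print(f"P(origin state = CA) = {p_ca:.2f}")
```

In the sample, one row has a missing destination state and is dropped, so 2 of the remaining 4 flights start in California and the sketch prints 0.50. Pointing load_flights at the real flights.csv and adjusting the column name to match ReadMe.csv should give the corresponding figure for the full dataset.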