How to Clean Data - Part 1
Zenva
ACCESS the FULL COURSE here: https://academy.zenva.com/product/data-science-mini-degree/?zva_src=youtube-datascience-md
TRANSCRIPT
Hello everybody! In this video we're gonna get started with data analysis by first discussing data cleaning, which is an integral part of any kind of data analysis. Basically, data cleaning is all about making sure that your data is ready for any further analysis. This might involve looking at the columns of a data frame and saying, "Well, these column values aren't quite right." It might involve removing or renaming columns in a data frame, or it might mean deciding how to handle null values, NaN (not a number) values, or missing data. We're gonna be discussing all of those topics, but the first thing we have to do is download the source code and unzip it somewhere.

When you download the source code from the website and unzip it, there'll be some CSV files. The main CSV file is flights.csv, and if you need a refresher on what kind of data is in there, you can look at the README. The README tells you what all the columns are and what they mean, as well as the values and formats they might take on and their units. If you're unfamiliar with some flight terms, you can always look through terms.csv, and there's some other information as well, such as codes for the weekdays, airport IDs, and airline IDs. All of this information is critical, so I suggest you take a quick look through it, especially the README and the terms file, so that you're familiar with the columns.
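The cleaning tasks mentioned above (removing or renaming columns and handling null or missing values) map onto standard pandas DataFrame methods. Here is a minimal sketch on a small, made-up DataFrame. The column names and values are hypothetical and are not taken from flights.csv, and filling missing delays with 0 is just one illustrative choice, not necessarily what the course does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for flights.csv; these column names
# are illustrative only and do not come from the dataset's README.
raw = pd.DataFrame({
    "carrier": ["AA", "DL", "UA", "AA"],
    "dep_delay": [5.0, np.nan, -3.0, 12.0],
    "unused_notes": ["", "", "", ""],
})

# Remove a column we don't need for the analysis.
cleaned = raw.drop(columns=["unused_notes"])

# Rename a column to something more descriptive (units in the name).
cleaned = cleaned.rename(columns={"dep_delay": "departure_delay_min"})

# Count missing (NaN) values per column before deciding how to handle them.
print(cleaned.isna().sum())

# Option 1: drop every row that contains a missing value.
dropped = cleaned.dropna()

# Option 2: fill missing delays with a placeholder value instead.
filled = cleaned.fillna({"departure_delay_min": 0.0})

print(dropped)
print(filled)
```

Dropping rows keeps only complete records but loses data; filling keeps every row but introduces a value that was never observed, so which one fits depends on what the column means.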
Okay, so after we've done that, and after you've downloaded the environment file and loaded it, we can get started with data cleaning. We want to make sure that we're in the right environment, and then launch Spyder. I already have an instance of Spyder running here. I have a file open in Spyder, and I'm just gonna save it to the same data analysis folder that has all of my CSV files, and call it data_cleaning.py. Excellent. The first thing we need to do is import pandas, so we'll write import pandas as pd. We also want to import numpy, because we'll be using it for our data cleaning. Then we have to load our flights dataset: flights = pd.read_csv('flights.csv', index_col=False). We have to make sure the data is indexed correctly, which is why index_col is set to False, and we can always check by calling print(flights). So this is just some code that loads our data and prints out some flight information, so let's run it. And after a couple of seconds it'll load ...
https://www.youtube.com/watch?v=XkmRMVY0KHo
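The loading step described above might look like the following minimal sketch. The transcript only says the index setting is "set to false"; we assume that refers to the index_col parameter of pd.read_csv, which stops pandas from using the first column as the row index. The load_flights helper is a hypothetical wrapper added for illustration.

```python
import numpy as np  # imported as in the video; not used in this snippet
import pandas as pd


def load_flights(path="flights.csv"):
    """Load the flights CSV into a DataFrame (hypothetical helper).

    Assumption: "set that to false" in the video means index_col=False,
    so pandas keeps a default 0..n-1 row index instead of promoting the
    first column to the index.
    """
    return pd.read_csv(path, index_col=False)
```

Run from the folder that contains the CSV files, flights = load_flights("flights.csv") followed by print(flights) gives the same quick sanity check as in the video.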