How to Read Data in Pandas
Zenva
ACCESS the FULL COURSE here: https://academy.zenva.com/product/bite-sized-coding-academy/
TRANSCRIPT
In this video we are going to learn how to read data from CSV and Excel files into a pandas DataFrame. The first thing to do is import pandas as pd.

Let's start by reading an Excel spreadsheet. To give you an idea of the kind of data we're going to load, I have the spreadsheet open in Excel, and it's just a list of different songs. I'll write tracks = pd.read_excel and give it the file name, which is Tracks.xlsx. That's all it takes to load data into pandas. Let's print it out and see for ourselves: you can see that we have some data here, and pandas is already telling us how many rows and columns there are.

Because we can't see all of the columns here, let's print out the column names to verify that they're there. I'll comment out the previous print and print tracks.columns instead, and when I run this, all of the column names are printed. Now that I have this information, I can do something like print out the entire milliseconds column. Pandas prints all of the values along with some useful information: the name of the column, how many rows there are, and the data type.

Next, let's see how to read a CSV file. I can load it by calling pd.read_csv and giving it the CSV file, flights.csv. Then we do the same thing as before and print it out to get an idea of what's going on. You can see that we have 600,000 rows and 25 columns, so it's a pretty big data set.
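The steps above can be sketched roughly as follows. This is a minimal, self-contained sketch rather than the code from the video: instead of opening Tracks.xlsx it reads a small hypothetical CSV from memory with io.StringIO, and the column names (Name, Composer, Milliseconds) and values are assumptions made for illustration. With the real files you would call pd.read_excel("Tracks.xlsx") (which needs an Excel engine such as openpyxl installed) and pd.read_csv("flights.csv") in the same way.

```python
import io

import pandas as pd

# In the video the data comes from files on disk:
#   tracks = pd.read_excel("Tracks.xlsx")   # needs an Excel engine, e.g. openpyxl
#   flights = pd.read_csv("flights.csv")
# To keep this sketch self-contained, a small hypothetical CSV is read from memory.
csv_text = """Name,Composer,Milliseconds
Song A,Composer X,343719
Song B,Composer Y,342562
Song C,Composer Z,230619
"""
tracks = pd.read_csv(io.StringIO(csv_text))

print(tracks)          # the whole DataFrame
print(tracks.shape)    # (number of rows, number of columns)
print(tracks.columns)  # Index object holding the column labels

# Selecting one column by its label returns a Series. Printing it shows the
# values followed by a footer with the column name and dtype (and the
# length, when the Series is long enough to be truncated).
milliseconds = tracks["Milliseconds"]
print(milliseconds)
```

Selecting a column with square brackets returns a pandas Series rather than a DataFrame, which is why its printout ends with a "Name: Milliseconds, dtype: int64" style footer.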
If I expand this out a little, it says year, month... but wait a minute, this isn't quite right, because this column should be the year, so all of the column labels seem to be offset by one. That isn't good. It happens because pandas is using the first column as the index, which it typically does when each data row has one more field than the header row. We don't want it to do that; we want natural indexes, just 0, 1, 2, 3 and so on. So I can pass the parameter index_col=False and run this again. Now let's see what we have. This looks promising: this is the correct column for the year, this is the correct column for the month (month 1 being January), and the indexes are now correct, so i ...
https://www.youtube.com/watch?v=rHXTWFjA7p8
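The index_col=False fix can be reproduced on a tiny hypothetical file. The sketch below assumes the problem in flights.csv is the common case the pandas documentation describes for index_col=False: each data row ends with a trailing comma, so it has one more field than the header row. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical malformed CSV: every data row ends with a trailing comma,
# so it has one more field than the header row.
csv_text = """YEAR,MONTH,DAY
2015,1,1,
2015,1,2,
"""

# Default behaviour: because the data rows have an extra field, pandas uses
# the first column as the index, and every label shifts by one column.
shifted = pd.read_csv(io.StringIO(csv_text))
print(shifted)

# index_col=False forces pandas not to use the first column as the index,
# so the labels line up and the rows get the default 0, 1, 2, ... index.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(fixed)
```

Without index_col=False, the years end up as the row index and each label sits over the data of the next column (YEAR holds the months, DAY is empty); with it, the extra empty field is dropped and the data keeps its natural integer index.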