Exploratory Data Analysis & Modeling with Python + R - (Part I EDA with Python)
Adrian Dolinay
Part I of a two-part tutorial illustrating how to use Python and R in the same Jupyter notebook within Google Colab. This first video walks through how to conduct exploratory data analysis with Python, while the next video shows how to model with R.
GitHub repo containing the notebook - https://github.com/tudev/Workshops-2020-2021
OBJECTIVE: Infer which explanatory variables significantly affect the size of trees within Duke Forest. Tree health can be used as a proxy for the overall health of the forest.
DATA DICTIONARY:
ID - Unique tree identifier.
yr - Year of the diameter recording.
cm - Measurement of the diameter of a tree's base. Measurements are made at breast height marked by a nail that holds a tag indicating the identifying tree number. This is the response variable.
annualprec - Total precipitation within the year.
summerpdsi - Palmer Drought Severity Index for the summer. Uses readily available temperature and precipitation data to estimate relative dryness.
wintertemp - Average winter (Jan. - Mar.) temperature.
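As a rough illustration of two steps the video walks through (counting the trees measured per year and creating a percentage change column), here is a minimal pandas sketch. It assumes a DataFrame with the ID, yr, and cm columns from the data dictionary. The DataFrame name, the cm_pct_change column name, and all values are made up for illustration; they are not taken from the Duke Forest data or from the notebook.

```python
import pandas as pd

# Hypothetical toy data using column names from the data dictionary.
# The values are invented for illustration only.
trees = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "yr": [2001, 2002, 2003, 2001, 2002, 2002],
    "cm": [10.0, 11.0, 12.1, 20.0, 21.0, 5.0],
})

# Number of distinct trees measured in each year.
trees_per_year = trees.groupby("yr")["ID"].nunique()

# Year-over-year percentage change in diameter, computed within each tree.
# Sorting first ensures each tree's measurements are in chronological order.
trees = trees.sort_values(["ID", "yr"]).reset_index(drop=True)
trees["cm_pct_change"] = trees.groupby("ID")["cm"].pct_change() * 100

print(trees_per_year)
print(trees)
```

The first measurement of each tree has no prior year to compare against, so its percentage change is NaN. The notebook in the GitHub repo may handle these steps differently.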
CONNECT:
LinkedIn: https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/
GitHub: https://github.com/ad17171717
Twitter: https://twitter.com/DolinayG
------Video Chapters------
0:00 - Intro
0:50 - Background on Data Set
2:19 - Loading Excel File onto Google Colab
2:47 - Reading Excel File into a pandas DataFrame
5:10 - Calculating Number of Trees Measured per Year
8:16 - Creating a Percentage Change Column
11:46 - Graphing a Correlation Heat Map with Seaborn
16:37 - Graphing a Pair Plot with Seaborn
19:05 - Graphing a Histogram with Matplotlib
24:59 - Labeling Trees by Size
27:30 - Visualizing Clustering with Plotly
...
https://www.youtube.com/watch?v=ws6eWf2LeRg