Exploratory Data Analysis & Modeling with Python + R - (Part II - Mixed Effects Modeling with R)
Adrian Dolinay
Part II of a two-part tutorial illustrating how to use Python and R in the same Jupyter notebook within Google Colab. The first video walks through how to conduct exploratory data analysis with Python, while this video shows how to model with R.
GitHub repo containing the notebook - https://github.com/tudev/Workshops-2020-2021
OBJECTIVE: Infer which explanatory variables significantly affect the size of trees within Duke Forest. Tree health can be used as a proxy for the overall health of the forest.
DATA DICTIONARY:
ID - Unique tree identifier.
yr - Year of the diameter recording.
cm - Measurement of the diameter of a tree's base. Measurements are made at breast height marked by a nail that holds a tag indicating the identifying tree number. This is the response variable.
annualprec - Total precipitation within the year.
summerpdsi - Palmer Drought Severity Index for the summer. Uses readily available temperature and precipitation data to estimate relative dryness.
wintertemp - Average winter (Jan. - Mar.) temperature.
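To make the ID, yr and cm columns concrete, here is a minimal plain-Python sketch of averaging diameters by year and taking a year-over-year growth rate. The rows are invented for illustration and are not Duke Forest measurements. The notebook itself works with pandas DataFrames, and the growth-rate definition used here is an assumption about one reasonable way to annualize, not necessarily the one used in the video.

```python
from collections import defaultdict
from statistics import mean

# Toy rows of (ID, yr, cm); values are invented for illustration only.
rows = [
    (1, 2000, 10.0),
    (1, 2001, 10.5),
    (2, 2000, 20.0),
    (2, 2001, 21.0),
]

# Collect the diameters recorded in each year, then average across trees.
by_year = defaultdict(list)
for tree_id, yr, cm in rows:
    by_year[yr].append(cm)
mean_cm = {yr: mean(vals) for yr, vals in sorted(by_year.items())}

# Year-over-year growth rate of the mean diameter (assumed definition).
years = list(mean_cm)
growth = {
    cur: mean_cm[cur] / mean_cm[prev] - 1
    for prev, cur in zip(years, years[1:])
}

print(mean_cm)
print(growth)
```

The first year has no growth value because there is no earlier year to compare against.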
CONNECT: LinkedIn: https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/ GitHub: https://github.com/ad17171717 Twitter: https://twitter.com/DolinayG
------Video Chapters------
0:00 - Intro
0:20 - Condensing the number of response variable observations
2:07 - Calculating the mean diameter of a tree group's base by year
5:41 - MultiIndex to columns
6:40 - Annualizing the diameter growth rate
10:52 - Compare the OLS results of individual trees to group trees
14:55 - Write Python DataFrames to csv files
17:55 - Loading rpy2 extension into Google Colab
18:24 - Calling magic functions to run R code
19:10 - Installing and reading R packages
21:25 - Reading csv files to R Lists
23:47 - Isolating explanatory variables into a list
25:07 - Loop through and create all possible models with the given explanatory variables
29:15 - Background on Mixed Effects Model
34:06 - Model selection tool: Bayesian Information Criterion (BIC)
36:36 - Intraclass Correlation
39:19 - Model selection tool: Mean Squared Error
44:05 - Plotting actual vs predicted values
50:32 - Detailed analysis available in notebook + references ...
https://www.youtube.com/watch?v=UaPh0MIhAgY
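The three model selection quantities named in the chapters have standard textbook definitions, restated below as a small Python sketch. The video computes them in R; the function names and example numbers here are hypothetical and only illustrate the formulas. The ICC version assumes a random-intercept model with a single grouping factor.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def icc(group_variance, residual_variance):
    """Intraclass correlation: share of total variance due to the grouping factor."""
    return group_variance / (group_variance + residual_variance)


def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative inputs only; not results from the notebook.
print(bic(log_likelihood=-100.0, n_params=3, n_obs=50))
print(icc(group_variance=3.0, residual_variance=1.0))
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

An ICC close to 1 means most of the variation sits between groups rather than within them, which is the usual argument for including a random effect for that grouping.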