PySpark! Reading CSV files into a DataFrame
Adrian Dolinay
A tutorial on reading a CSV file into a PySpark DataFrame. It covers how to initiate a Spark session and how to load a CSV with and without a schema, then compares the read times of PySpark and pandas.
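As a rough illustration of the with-schema and without-schema reads mentioned above, here is a minimal sketch. It is not the notebook's code: the function names, the app name "csv-demo", and the example columns are all hypothetical, and the schema is passed as a DDL string for brevity (a `StructType` would also work).

```python
def ddl_schema(columns):
    """Build a DDL schema string such as 'id INT, name STRING' from (name, type) pairs."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_csv_both_ways(path, columns):
    """Read the same CSV twice: once with inferred types, once with declared types."""
    # Imported lazily so ddl_schema() can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Hypothetical app name; getOrCreate() reuses a running session if one exists.
    spark = SparkSession.builder.appName("csv-demo").getOrCreate()

    # Without a schema: inferSchema=True makes Spark scan the data to guess column types.
    inferred = spark.read.csv(path, header=True, inferSchema=True)

    # With a schema: Spark uses the declared types instead of inferring them.
    declared = spark.read.csv(path, header=True, schema=ddl_schema(columns))
    return inferred, declared


# Hypothetical column layout, for illustration only.
EXAMPLE_COLUMNS = [("id", "INT"), ("name", "STRING"), ("price", "DOUBLE")]
print(ddl_schema(EXAMPLE_COLUMNS))
```

`spark.read.csv` also accepts a list of paths, which is one way to load several CSV files into a single DataFrame.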
The notebook can be found in the "PySpark" folder within the below repo: https://github.com/ad17171717/YouTube-Tutorials/tree/main/PySpark/PySpark!%20Reading%20CSV%20files%20into%20a%20DataFrame
CONNECT:
LinkedIn: https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/
GitHub: https://github.com/ad17171717
X: https://twitter.com/DolinayG
Odysee: https://odysee.com/@adriandolinay:0
Medium: https://medium.com/@adriandolinay
|-Video Chapters-|
0:00 - Introduction
0:52 - Initiating a Spark Session
1:13 - Reviewing the data sets
2:13 - Reading in a single CSV into a PySpark DataFrame
4:04 - Reading in a single CSV with a Schema into a PySpark DataFrame
6:21 - Reading multiple CSVs into a single PySpark DataFrame
8:29 - Comparing PySpark and pandas Read Times
9:32 - Graphing PySpark and pandas Read Times
10:16 - When Should I Use PySpark?
10:58 - References and Additional Learning
...
https://www.youtube.com/watch?v=N1bG3FwmqU0