Always know what to expect from your data with great_expectations
Prodramp
Great Expectations is a shared, open standard for data quality. It helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
In this beginner-level great_expectations tutorial, my goal is to help you learn more about great_expectations and work it into your own data wrangling or exploratory data analysis as a data testing, data validation, or data documentation tool.
Tutorial Level: Beginner
Content Timeline:
- (00:00) Video Start
- (00:07) Video Content Intro
- (02:05) Code & Jupyter Notebook Introduction
- (02:58) great_expectations - what, why, and how?
- (05:14) Why do you need great_expectations?
- (06:47) What actually is great_expectations?
- (10:26) great_expectations in simple terms
- (13:23) great_expectations as a data documentation tool
- (14:12) great_expectations method at a glance
- (16:05) great_expectations installation
- (17:11) great_expectations initialization
- (19:52) great_expectations context
- (25:27) great_expectations demo with the titanic dataset
- (32:31) Export and apply great_expectations config
- (35:09) Working with time-series dataset
- (40:43) Processing SparkDataFrame
- (46:51) great_expectations extension - dbt_expectations
- (49:00) Recap
- (51:34) Credits
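To make the "great_expectations in simple terms" idea concrete, here is a minimal plain-Python sketch of what an expectation is: a declarative check on a column that returns a structured result instead of raising an error. The helper names, the result fields, and the sample rows are all hypothetical and written for illustration only; this is not the great_expectations API, so see the library's documentation and the sample notebooks for the real calls.

```python
# Plain-Python illustration of the idea behind an "expectation".
# NOTE: these helpers are hypothetical and are NOT the great_expectations API.


def expect_values_not_null(rows, column):
    """Check that no row has a missing value in `column`."""
    unexpected = [row for row in rows if row.get(column) is None]
    return {
        "expectation": "values_not_null",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


def expect_values_between(rows, column, min_value, max_value):
    """Check that every non-missing value in `column` lies in [min_value, max_value]."""
    values = [row[column] for row in rows if row.get(column) is not None]
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "expectation": "values_between",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


# Made-up sample rows, loosely shaped like a passenger table.
passengers = [
    {"name": "A", "age": 22},
    {"name": "B", "age": None},
    {"name": "C", "age": 130},
]

# A small "suite": a list of checks that doubles as documentation of the data.
results = [
    expect_values_not_null(passengers, "name"),
    expect_values_not_null(passengers, "age"),
    expect_values_between(passengers, "age", 0, 100),
]

for result in results:
    print(result)
```

Because each check returns a result rather than stopping the program, the full list of results can be reviewed together, which is the property that lets a set of expectations serve as both a data test and a description of the data.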
great_expectations library on GitHub: https://github.com/great-expectations/great_expectations
Code samples from the video on GitHub: https://github.com/prodramp/publiccode/tree/master/python/greatexpectation-work
Prodramp LLC https://prodramp.com | @prodramp https://www.linkedin.com/company/prodramp
Content Creator: Avkash Chauhan (@avkashchauhan) https://www.linkedin.com/in/avkashchauhan
Tags: #ai #aicloud #h2oai #driverlessai #machinelearning #cloud #mlops #model #collaboration #deeplearning #modelserving #modeldeployment #keras #tensorflow #pytorch #datarobot #datahub #aiplatform #cometml #modelmonitoring #drift #modelregistry #modelmanagement #pandas #pandasprofiling #greatexpectations #great_expectations #datatesting #sparkdataframe #pyspark #assert ...
Video: https://www.youtube.com/watch?v=fFc2V7L_36s