Hugging Face Datasets #2 | Dataset Builder Scripts (for Beginners)
James Briggs
How to work with dataset builder scripts, an introduction to the download manager, and the Apache Arrow datatypes used in Hugging Face Datasets, all in Python. The resulting datasets can be used for similarity search (semantic search, vector similarity search), classification, and question answering. Hugging Face Datasets makes training and fine-tuning models with PyTorch and TensorFlow easy.
70% Discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5
Subscribe for Article and Video Updates! https://jamescalam.medium.com/subscribe https://medium.com/@jamescalam/membership
Discord: https://discord.gg/c5QtDB9RAP
00:00 Intro
00:49 Creating Compressed Files
02:41 Creating Dataset Build Script
04:49 Download Manager
08:59 Finishing Split Generator
10:13 Generate Examples Method
14:47 Add Dataset to Hugging Face
17:49 Apache Arrow Features
22:52 What's Next?
https://www.youtube.com/watch?v=ODdKC30dT8c
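For readers following the chapters on compressed files and the generate examples method, here is a minimal, framework-free sketch of the pattern using only the Python standard library. It writes a small gzip-compressed JSON Lines file, then reads it back with a generator that yields (key, example) pairs, which is the shape a builder script's `_generate_examples` method is expected to yield. The file name `train.jsonl.gz`, the `text` and `label` fields, and the helper names are assumptions made for illustration and are not taken from the video.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write records as gzip-compressed JSON Lines, one JSON object per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def generate_examples(filepath):
    """Yield (key, example) pairs read from a gzip-compressed JSON Lines file."""
    with gzip.open(filepath, "rt", encoding="utf-8") as f:
        for key, line in enumerate(f):
            row = json.loads(line)
            # Keep only the fields the (hypothetical) dataset schema declares.
            yield key, {"text": row["text"], "label": row["label"]}


# Hypothetical toy records; the video's actual data is not reproduced here.
records = [
    {"text": "hello world", "label": 0},
    {"text": "goodbye world", "label": 1},
]

tmp_dir = tempfile.mkdtemp()
train_path = os.path.join(tmp_dir, "train.jsonl.gz")
write_jsonl_gz(train_path, records)

examples = list(generate_examples(train_path))
print(examples)
```

In a real builder script the file path would typically come from the download manager rather than a temporary directory, and the yielded dictionaries would need to match the declared features.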