Making The Most of Data: Augmented SBERT

James Briggs

description

🎁 Free NLP for Semantic Search Course: https://www.pinecone.io/learn/nlp

ML models are data-hungry. They consume massive amounts of data to identify generalized patterns and apply those learned patterns to new data.

As models get bigger, so do datasets. And although we have seen an explosion of data in the past decade, it is often inaccessible or not in an ML-friendly format, especially in niche domains.

For many niche, low-resource domains, finding or annotating a substantial dataset manually is practically impossible.

Fortunately, we don't need to label (or even find) this new data by hand. Instead, we can automatically generate or label data using one or more data augmentation techniques.

In this video, we will introduce data augmentation and its application to the field of NLP. We will focus on the 'in-domain' flavor of a particular data-augmentation strategy named augmented SBERT (AugSBERT).

🌲 Pinecone article: https://www.pinecone.io/learn/data-augmentation/

🤖 70% Discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5

🎉 Subscribe for Article and Video Updates! https://jamescalam.medium.com/subscribe https://medium.com/@jamescalam/membership

👾 Discord: https://discord.gg/c5QtDB9RAP ... https://www.youtube.com/watch?v=3IPCEeh4xTg

created

2025-02-21

license

Copyrighted (contact publisher)

File size

448880998 Bytes