OpenAI's New GPT 3.5 Embedding Model for Semantic Search
James Briggs
In this video, we'll learn how to use OpenAI's new embedding model text-embedding-ada-002.
We will learn how to use the OpenAI Embedding API to generate language embeddings and then index those embeddings in the Pinecone vector database for fast and scalable vector search.
This is a powerful and common combination for building semantic search, question-answering, threat detection, and other applications that rely on NLP and search over a large corpus of text data.
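At its core, semantic search of this kind is nearest-neighbour lookup over embedding vectors: embed the query, then rank documents by similarity. A minimal sketch of the ranking step, using tiny toy vectors in place of real 1536-dimensional ada-002 embeddings (Pinecone performs this ranking at scale in production):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Indices of doc_vecs sorted by similarity to query_vec, best first."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                  reverse=True)

# Toy 3-d vectors standing in for 1536-d ada-002 embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(rank([1.0, 0.0, 0.0], docs))  # [0, 2, 1]
```

A vector database replaces the brute-force loop above with an approximate nearest-neighbour index, which is what makes the search fast over millions of documents.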
Everything will be implemented with OpenAI's new GPT-3.5-class embedding model, text-embedding-ada-002: their latest embedding model, which is 10x cheaper than earlier embedding models, more performant, and capable of encoding roughly ten pages of text into a single vector embedding.
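Because ada-002 accepts a bounded input per request (roughly ten pages of text), longer documents still need to be split into chunks before embedding. A rough sketch of a whitespace-based splitter; a real pipeline would count tokens with a tokenizer rather than words, and the window size here is purely illustrative:

```python
def chunk_text(text, max_words=300):
    """Split text into word-count-bounded chunks, each embedded separately.
    max_words is illustrative -- production code should count tokens, since
    the model's limit is measured in tokens, not words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = chunk_text("word " * 650)
print(len(chunks))  # 3 chunks: 300 + 300 + 50 words
```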
Pinecone docs: https://docs.pinecone.io/docs/openai
Colab notebook: https://github.com/pinecone-io/examples/blob/master/integrations/openai/semantic_search_openai.ipynb
Support me on Patreon: https://patreon.com/JamesBriggs
Discord: https://discord.gg/c5QtDB9RAP
AI Dev Studio: https://aurelio.ai/
Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership
00:30 Semantic search with OpenAI GPT architecture
03:43 Getting started with OpenAI embeddings in Python
04:12 Initializing connection to OpenAI API
05:49 Creating OpenAI embeddings with ada
07:24 Initializing the Pinecone vector index
09:04 Getting dataset from Hugging Face to embed and index
10:03 Populating vector index with embeddings
12:01 Semantic search querying
15:09 Deleting the environment
15:23 Final notes
https://www.youtube.com/watch?v=ocxq84ocYi0
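The "populating vector index" step sends embeddings to Pinecone in batches, since upserting one vector at a time is slow. The batching itself is plain Python; a minimal sketch, with the Pinecone call left as a comment (the 100-vector batch size and the metadata fields are illustrative, not taken from the video):

```python
def batches(items, batch_size=100):
    """Yield successive slices of items, batch_size at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# vectors: (id, embedding, metadata) tuples ready for upsert.
# Zero vectors stand in for real 1536-d ada-002 embeddings.
vectors = [(str(i), [0.0] * 1536, {"text": f"passage {i}"}) for i in range(250)]

for batch in batches(vectors):
    # With a live Pinecone index this would be: index.upsert(vectors=batch)
    print(len(batch))  # 100, 100, 50
```

At query time the same embedding model encodes the user's question, and the resulting vector is sent to the index's query endpoint to retrieve the most similar passages.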