OpenAI's New GPT 3.5 Embedding Model for Semantic Search
James Briggs
In this video, we'll learn how to use OpenAI's new embedding model text-embedding-ada-002.
We will learn how to use the OpenAI Embedding API to generate language embeddings and then index those embeddings in the Pinecone vector database for fast and scalable vector search.
This is a powerful and common combination for building semantic search, question-answering, threat detection, and other applications that rely on NLP and search over a large corpus of text data.
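At its core, semantic search of this kind is nearest-neighbour lookup over embedding vectors: embed the query, then rank documents by similarity. A minimal sketch of the ranking step, using tiny toy vectors in place of real 1536-dimensional ada-002 embeddings (Pinecone performs this ranking at scale in production):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Indices of doc_vecs sorted by similarity to query_vec, best first."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                  reverse=True)

# Toy 3-d vectors standing in for 1536-d ada-002 embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(rank([1.0, 0.0, 0.0], docs))  # [0, 2, 1]
```

A vector database replaces the brute-force loop above with an approximate nearest-neighbour index, which is what makes the search fast over millions of documents.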
Everything will be implemented with OpenAI's new GPT-3.5-class embedding model, text-embedding-ada-002: their latest embedding model, which is 10x cheaper than earlier embedding models, more performant, and capable of encoding roughly ten pages of text into a single vector embedding.
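Because ada-002 accepts a bounded input per request (roughly ten pages of text), longer documents still need to be split into chunks before embedding. A rough sketch of a whitespace-based splitter; a real pipeline would count tokens with a tokenizer rather than words, and the window size here is purely illustrative:

```python
def chunk_text(text, max_words=300):
    """Split text into word-count-bounded chunks, each embedded separately.
    max_words is illustrative -- production code should count tokens, since
    the model's limit is measured in tokens, not words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = chunk_text("word " * 650)
print(len(chunks))  # 3 chunks: 300 + 300 + 50 words
```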
Pinecone docs: https://docs.pinecone.io/docs/openai
Colab notebook: https://github.com/pinecone-io/examples/blob/master/integrations/openai/semantic_search_openai.ipynb
Support me on Patreon: https://patreon.com/JamesBriggs
Discord: https://discord.gg/c5QtDB9RAP
AI Dev Studio: https://aurelio.ai/
Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership
00:30 Semantic search with OpenAI GPT architecture
03:43 Getting started with OpenAI embeddings in Python
04:12 Initializing connection to OpenAI API
05:49 Creating OpenAI embeddings with ada
07:24 Initializing the Pinecone vector index
09:04 Getting dataset from Hugging Face to embed and index
10:03 Populating vector index with embeddings
12:01 Semantic search querying
15:09 Deleting the environment
15:23 Final notes
https://www.youtube.com/watch?v=ocxq84ocYi0
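The "populating vector index" step sends embeddings to Pinecone in batches, since upserting one vector at a time is slow. The batching itself is plain Python; a minimal sketch, with the Pinecone call left as a comment (the 100-vector batch size and the metadata fields are illustrative, not taken from the video):

```python
def batches(items, batch_size=100):
    """Yield successive slices of items, batch_size at a time."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# vectors: (id, embedding, metadata) tuples ready for upsert.
# Zero vectors stand in for real 1536-d ada-002 embeddings.
vectors = [(str(i), [0.0] * 1536, {"text": f"passage {i}"}) for i in range(250)]

for batch in batches(vectors):
    # With a live Pinecone index this would be: index.upsert(vectors=batch)
    print(len(batch))  # 100, 100, 50
```

At query time the same embedding model encodes the user's question, and the resulting vector is sent to the index's query endpoint to retrieve the most similar passages.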