Web Scraping Amazon with Scrapy, Python, and an SQL Database | How To Use Scrapy Pipelines
Dr Pi
Searching Amazon for "Web Scraping Books", Scrapy finds all the "paperback" books using 'process_item' within the pipelines.py file.
▶ Scrapy pipelines allow you to send results to an SQL database such as sqlite3, which is used here. The major benefit of 'process_item' is that you filter the results before storing them, and with 'pipelines.py' all the code to connect to the database lives in just one file.
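▶ As a rough illustration of the idea (the 'book_format' and 'title' field names below are my assumptions for the sketch, not necessarily the ones used in the video), 'process_item' can drop anything that isn't a paperback before it ever reaches the database:

from scrapy.exceptions import DropItem

class PaperbackFilterPipeline:
    """Keep only paperback results; drop everything else."""

    def process_item(self, item, spider):
        # 'book_format' is an assumed field name, used here for illustration
        if item.get("book_format", "").strip().lower() == "paperback":
            return item  # pass the item on to the next pipeline stage
        raise DropItem(f"Not a paperback: {item.get('title')}")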
Chapter timings:
0:00 Intro
0:38 Starting the spider
3:20 Checking the results
4:41 Analysis of the code
5:00 Files to modify
5:20 Common mistakes
5:30 Pipelines
5:43 Sqlite3
6:53 Successful output
7:15 process_item method (inside "pipelines.py")
7:42 Scrapy documentation example of pipeline / process item
8:00 DB Browser and conclusion / tips
▶ If you are used to using FEEDS to export to a CSV, this will be a learning curve, but this video will help you get up to speed and guide you through the new aspects of Scrapy that you won't have used previously.
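▶ The main new step compared with FEEDS is registering the pipeline in settings.py. Here is a minimal sketch (the 'myproject.pipelines.SQLitePipeline' path is an assumed example, not taken from the repo):

# settings.py

# The FEEDS approach you may already know: export everything to CSV
# FEEDS = {"books.csv": {"format": "csv"}}

# The pipeline approach: route items through pipelines.py instead.
# Lower numbers run first (valid range 0-1000).
ITEM_PIPELINES = {
    "myproject.pipelines.SQLitePipeline": 300,
}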
▶ Another feature of this particular code is that it relies on a class and methods to handle the database connection and queries. If you are new to object-oriented programming you may wish to copy my code; you can get it from the redandgreen GitHub page.
▶ https://github.com/RGGH/Scrapy10
▶ By using sqlite3 I have tried to keep the database side of things as straightforward as possible, for now!
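▶ For a rough idea of what the sqlite3 side looks like, here is a minimal sketch assuming a single 'books' table with 'title' and 'price' columns (the actual schema and field names are in the GitHub repo above):

import sqlite3

class SQLitePipeline:
    """Open one sqlite3 connection per crawl and store each item."""

    def open_spider(self, spider):
        self.conn = sqlite3.connect("books.db")
        self.cur = self.conn.cursor()
        self.cur.execute(
            "CREATE TABLE IF NOT EXISTS books (title TEXT, price TEXT)"
        )

    def process_item(self, item, spider):
        # field names are assumptions for illustration
        self.cur.execute(
            "INSERT INTO books VALUES (?, ?)",
            (item.get("title"), item.get("price")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()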
▶ If you are new to Web Scraping, or new to Scrapy, then the benefit of the approach used in this video is that you will be able to scrape much faster and store more results than with Selenium or bs4, for example.
▶ If you like this video, or would like to see a version using MySQL, let me know in the comments. Or maybe I'll do one anyway; that's all part of the thrill of YouTube, eh?
▶ Further reading : https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#scrapy #sql #python #webscraping
Disclaimer : We create code to gather data for our own use, or for customers to do the same, provided it is used responsibly and sensibly.
We do not store or resell any data. We only offer techniques for scraping publicly available data. Any code provided in our tutorials is for educational use only; we are not responsible for what you do with it.
See you around yeah? Dr P. ... https://www.youtube.com/watch?v=3vSkl6r0tec