Web Scraping | Python Advertools
Dr Pi
Python code for advertools, the "SEO crawler that can be customized, built with Scrapy". Using 'advertools' and Pandas, I demonstrate how you can quickly crawl a site and then filter out the irrelevant results using 'str.contains'.
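As a rough sketch of the filtering step: once the crawl output is in a DataFrame, 'str.contains' builds a boolean mask that can be inverted to drop the rows you don't want. The sample rows and the URL patterns below are assumptions for illustration, not the exact ones used in the video.

```python
import pandas as pd

# Hypothetical sample standing in for a real crawl result
# (in practice this would come from the crawl's output file).
crawl_df = pd.DataFrame({
    "url": [
        "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
        "https://books.toscrape.com/catalogue/page-2.html",
        "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
    ],
    "h1": ["A Light in the Attic", "Travel", "All products", "Tipping the Velvet"],
})

# Mark category and pagination pages as irrelevant (assumed patterns).
is_listing = crawl_df["url"].str.contains(r"/category/|/page-\d+\.html", regex=True)

# Keep only the rows that did NOT match, i.e. the individual book pages.
books_df = crawl_df[~is_listing].reset_index(drop=True)
print(books_df[["url", "h1"]])
```

Inverting the mask with '~' is what turns "contains" into "filter out".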
This excellent tool extracts "h1", "h3", "url", "body" and much more out of the box. The only thing I had to add was a custom selector to get "price".
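A minimal sketch of adding that "price" selector, assuming the books.toscrape.com markup keeps the price in a 'p.price_color' element. The file name 'books_crawl.jl' and the helper names 'crawl_books' and 'load_crawl' are made up for this example. 'adv.crawl' takes a 'css_selectors' dict that maps a column name to a selector, and writes its results to a '.jl' (JSON lines) file.

```python
import pandas as pd

# Column name -> CSS selector. The selector is an assumption about the
# page markup on books.toscrape.com; adjust it for your own target site.
PRICE_SELECTOR = {"price": "p.price_color::text"}


def crawl_books(output_file="books_crawl.jl"):
    """Crawl the demo bookshop and add a 'price' column (hypothetical helper)."""
    # Imported here so the rest of the file works without advertools installed.
    import advertools as adv

    adv.crawl(
        url_list=["https://books.toscrape.com/"],
        output_file=output_file,      # advertools expects a .jl extension
        follow_links=True,            # follow links found on the start page
        css_selectors=PRICE_SELECTOR,
    )


def load_crawl(output_file="books_crawl.jl"):
    """Read the JSON-lines crawl output into a DataFrame."""
    return pd.read_json(output_file, lines=True)
```

Typical use in a notebook would be 'crawl_books()' followed by 'df = load_crawl()', then inspecting 'df[["url", "h1", "price"]]'.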
= chapter timings =
0:00 Intro
2:48 Install advertools
3:19 Jupyter Notebook of code
6:40 Running code
10:29 books.toscrape
14:29 Monitoring Scrapy in the background
15:36 Running from CLI as a ONE LINER
If you are new to web scraping then this gives you the power of Scrapy, without having to create the framework and create a "spider.py" yourself. If you have used Scrapy, then you have all of the custom settings of Scrapy still available to you.
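To illustrate the custom settings point, here is a sketch of passing ordinary Scrapy settings through the 'custom_settings' argument of 'adv.crawl'. The setting names are real Scrapy settings. The values, the user agent string, and the helper names 'build_settings' and 'polite_crawl' are assumptions for this example.

```python
def build_settings(max_pages=100, delay_seconds=1.0, log_file="crawl.log"):
    """Return a dict of Scrapy settings for a small, polite crawl (hypothetical helper)."""
    return {
        "ROBOTSTXT_OBEY": True,              # respect robots.txt
        "DOWNLOAD_DELAY": delay_seconds,     # pause between requests
        "CLOSESPIDER_PAGECOUNT": max_pages,  # stop after this many pages
        "USER_AGENT": "my-test-crawler",     # placeholder user agent
        "LOG_FILE": log_file,                # write Scrapy logs to a file
    }


def polite_crawl(start_url, output_file, **settings_kwargs):
    """Run an advertools crawl with the settings above (hypothetical helper)."""
    # Imported here so build_settings can be used without advertools installed.
    import advertools as adv

    adv.crawl(
        url_list=[start_url],
        output_file=output_file,  # must end in .jl
        follow_links=True,
        custom_settings=build_settings(**settings_kwargs),
    )
```

Sending the logs to a file with 'LOG_FILE' also makes it easy to watch the crawl's progress from another terminal while it runs.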
See : https://github.com/eliasdabbas/advertools for the full readme notes.
Installation = pip3 install advertools
This is a video aimed at a data scientist, or someone new to web scraping but with existing skills in Jupyter and Pandas.
Visit redandgreen blog for more Tutorials
http://redandgreen.co.uk/about/blog/
Subscribe to the YouTube Channel
https://www.youtube.com/c/DrPiCode
Follow on Twitter - to get notified of new videos
https://twitter.com/RngWeb
Buy Dr Pi a coffee (or tea): https://www.buymeacoffee.com/DrPi
Proxies
If you need a good, easy-to-use proxy, ScraperAPI was recommended to me, and having used it for a while I can vouch for it. If you were going to sign up anyway, then maybe you would be kind enough to use the link and the coupon code below?
You can also do a full working trial first (unlike some other companies). The trial doesn't ask for any payment details either, so all good!
10% off ScraperAPI: https://www.scraperapi.com?fpr=ken49
Coupon Code: DRPI10 (You can also get started with 1000 free API calls. No credit card required.)
Thumbs up yeah? (cos Algos..)
#webscraping #advertools #python ... https://www.youtube.com/watch?v=lcMpGohSafU