Using Scrapy to Scrape Directory Websites | Generate 26 start urls
Dr Pi
Rather than crawl a site, I generate 26 start URLs, one per letter, using string.ascii_uppercase, a for loop, and an f-string with a {} placeholder.
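As a rough sketch of the idea: the base URL and the "/letter/" path below are placeholders I have made up for illustration, not the site used in the video, so adjust the pattern to match your target directory.

```python
import string

# Hypothetical base URL: replace with the directory site you are scraping.
BASE_URL = "https://example.com/directory"

start_urls = []
for letter in string.ascii_uppercase:  # 'A' through 'Z', 26 letters
    # The f-string drops each letter into the {} placeholder.
    start_urls.append(f"{BASE_URL}/{letter}/")
```

In a Scrapy spider, a list like this would typically be assigned to the spider's start_urls attribute so that each letter page is requested directly.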
This also shows how to slow down your spider in an attempt to avoid an HTTP 429 (Too Many Requests) response from the target server.
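One way to slow a spider down is through Scrapy's throttling settings. This is a minimal sketch, not necessarily the exact settings from the video; the setting names are standard Scrapy settings, but the values are assumptions you should tune for your target.

```python
# Illustrative values only. In a spider, this dict would be assigned to the
# class attribute `custom_settings`, or the keys placed in settings.py.
custom_settings = {
    "DOWNLOAD_DELAY": 3,                  # seconds to wait between requests
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # vary the delay (0.5x to 1.5x)
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 5,        # initial delay in seconds
    "AUTOTHROTTLE_MAX_DELAY": 60,         # upper bound on the delay
}
```

A longer delay makes the crawl slower but reduces the chance of being rate limited; with only 26 start URLs the extra time is usually small.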
I also demonstrate curl ifconfig.me, which is a nice way to check that your host has updated its IP address, should you wish to try another source IP address.
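If you would rather do the same check from Python than from curl, something like the sketch below should work. It assumes the service's plain-text /ip endpoint; the function name public_ip and its opener parameter are my own, included only so the function can be tried without a network connection.

```python
from urllib.request import urlopen


def public_ip(url="https://ifconfig.me/ip", timeout=10, opener=urlopen):
    """Return the public IP address reported by the service, as a string."""
    # `opener` defaults to urllib's urlopen; it is a hypothetical hook so a
    # fake can be passed in for offline testing.
    with opener(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


# Usage (needs network access):
#   print(public_ip())
```

Run it before and after switching connection; if the two values match, your source IP address has not changed.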
There will be more to come on this, as the request was to send the output to a database. Anybody prefer sqlite3 or MySQL?
Dr P.
#scrapy ... https://www.youtube.com/watch?v=CrR_2vhaLfE
2020-10-08
0.0 LBC
Copyrighted (contact publisher)
64624761 Bytes