How To Schedule A Cron Job To Run Python (Scrapy) Scripts For Web Scraping
Dr Pi
A tutorial demonstrating how to schedule scripts (cron jobs) to run automatically. Here we see a Scrapy Python script being scheduled and run.
(This is most relevant to Linux/Mac OSX operating systems)
It covers editing your crontab file.
◾crontab -e
You can create one if it doesn't already exist.
This will allow you to run spiders on a repeating schedule, e.g.:
◾ daily ◾ weekly ◾ monthly ◾ or specific times of day
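As a rough sketch of what those schedules look like as crontab entries, the Python snippet below prints one line per schedule, ready to paste in after running crontab -e. The script and log paths are hypothetical placeholders, and the times are only examples. Absolute paths are used because cron does not start in your project folder.

```python
# Hypothetical paths: adjust to wherever Python and your script actually live.
PYTHON = "/usr/bin/python3"
SCRIPT = "/home/pi/scraper/run_spider.py"
LOG = "/home/pi/scraper/cron.log"

# The five time fields are: minute, hour, day of month, month, day of week.
SCHEDULES = {
    "daily at 06:30": "30 6 * * *",
    "weekly, Mondays at 06:30": "30 6 * * mon",
    "monthly, on the 1st at 06:30": "30 6 1 * *",
    "twice a day, at 09:00 and 17:00": "0 9,17 * * *",
}


def crontab_line(schedule, script=SCRIPT):
    """Build one crontab entry that runs the script and appends its output to a log."""
    return f"{schedule} {PYTHON} {script} >> {LOG} 2>&1"


if __name__ == "__main__":
    for label, schedule in SCHEDULES.items():
        print(f"# {label}")
        print(crontab_line(schedule))
```

Redirecting output to a log file is optional, but it makes it much easier to see why a scheduled run failed.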
To copy what I show in the video you will also need to make sure you invoke the Scrapy spider using CrawlerProcess rather than the typical CLI Scrapy syntax.
The time and date fields are:
field          allowed values
-----          --------------
minute         0-59
hour           0-23
day of month   1-31
month          1-12 (or names, see below)
day of week    0-7 (0 or 7 is Sunday, or use names)
A field may contain an asterisk (*), which always stands for
"first-last".
Names can also be used for the 'month' and 'day of week' fields. Use
the first three letters of the particular day or month (case does not
matter). Ranges or lists of names are not allowed.
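To get a feel for how those fields are read, here is a small Python sketch that checks whether a given date and time would match a five-field expression. It is a simplified illustration, not cron's actual implementation: it handles *, single numbers, numeric ranges and lists, and single names, and leaves out step values such as */5. The function names are made up for this example.

```python
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

# (lowest, highest, names) for each field, in crontab order.
FIELDS = [
    (0, 59, None),    # minute
    (0, 23, None),    # hour
    (1, 31, None),    # day of month
    (1, 12, MONTHS),  # month
    (0, 7, DAYS),     # day of week (0 or 7 is Sunday)
]


def expand(field, low, high, names=None):
    """Return the set of integers that one crontab field stands for."""
    if field == "*":
        return set(range(low, high + 1))  # "*" means first-last
    if names and field.lower() in names:
        # A single name only: ranges or lists of names are not allowed.
        return {names.index(field.lower()) + low}
    values = set()
    for part in field.split(","):
        start, _, end = part.partition("-")
        first = int(start)
        last = int(end) if end else first
        if not low <= first <= last <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(first, last + 1))
    return values


def cron_matches(expr, when):
    """Check whether the datetime `when` matches a five-field expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five time and date fields")
    minute, hour, dom, month, dow = (
        expand(f, *spec) for f, spec in zip(fields, FIELDS)
    )
    if 7 in dow:
        dow.add(0)  # Sunday can be written as 0 or 7
    weekday = when.isoweekday() % 7  # Sunday=0, Monday=1, ... Saturday=6
    dom_ok, dow_ok = when.day in dom, weekday in dow
    if fields[2] != "*" and fields[4] != "*":
        day_ok = dom_ok or dow_ok  # both restricted: either one may match
    else:
        day_ok = dom_ok and dow_ok
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok)


if __name__ == "__main__":
    monday = datetime(2024, 1, 1, 6, 30)  # 1 January 2024 was a Monday
    print(cron_matches("30 6 * * mon", monday))  # True
    print(cron_matches("0 * * * *", monday))     # False: the minute is 30, not 0
```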
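⚠ Note! If you want a file to run on the hour, set the minute field to 0, not 60, as the minute range is 0-59! eg: 0 * * * * /usr/bin/python3 myfile.py runs at the start of every hour, whereas 59 0 * * * /usr/bin/python3 myfile.py runs once a day, at 00:59.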
see: https://www.man7.org/linux/man-pages/man5/crontab.5.html 🤓
RTFM 📙 : man crontab, man cron
☕️☕️☕️ Buy Dr Pi a Coffee...or Tea! : https://www.buymeacoffee.com/DrPi ☕️☕️☕️
⚠ Disclaimer : Any code provided in this tutorial is for educational use only, I am not responsible for what you do with it. 🚓
Next I'll transfer some spiders and bs4 code to the Pi Zero and see if we can schedule on that using the same syntax used here.
See you around yeah? Dr Pi. ... https://www.youtube.com/watch?v=28IeIXLBDac