Web scraping with Scrapy | How to use ItemLoader and Input/Output Processors

Dr Pi

loaders processors scrapy scrapy web scraping

description

#scrapy #xpath #loader #processor Scraping a website with Scrapy, using ItemLoader and input/output processors to strip the \t and \n characters from the values returned by response.xpath selectors.

As an example, I scrape the description, link, and price of 185 drills from the Screwfix website, which also shows how to make Scrapy follow the next page when the results are paginated.

I used the Micro editor for the first time here and show how easy it is to use. I can recommend it and much prefer it to Nano. Thanks to Maksim Korzh, CMK for the tip.

While troubleshooting, we also look at the mistake of leaving out the leading "." in the XPath: without it, you get groups of 20 identical results.

Official Scrapy Documentation : https://docs.scrapy.org/en/latest/topics/loaders.html#input-and-output-processors

Repo for the project = GitHub : https://github.com/RGGH/Scrapy

The actual spider is : https://github.com/RGGH/Scrapy/blob/master/sfix_spider.py

☕️☕️☕️ Buy Dr Pi a Coffee...or Tea! : https://www.buymeacoffee.com/DrPi ☕️☕️☕️ ... https://www.youtube.com/watch?v=ps9VFsgSj4k

created

2020-10-08

staked

0.0 LBC

license

Copyrighted (contact publisher)

File size

181289449 Bytes