How to build a scraper to scrape any ecommerce website - Part 1

Before getting started, let's answer some basic questions.

What is scraping?

Web scraping is the process of extracting useful data from a website.

Why do we need to scrape data?

We might need to scrape data for various reasons. A data scientist or ML engineer might need data from various sources to train an AI model. A web developer might need to scrape images of a homestay from Google Maps to showcase in the gallery section of a website they are building.

How do we scrape the data?

We can use various tools and frameworks to scrape data, such as Selenium, Scrapy, and BeautifulSoup.

Every web page is delivered to the browser as HTML. To extract data from a particular HTML tag on the page, we locate that tag with an XPath expression or a CSS selector.
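To make the selector idea concrete, here is a minimal sketch that uses only Python's standard library. It relies on the limited XPath subset supported by xml.etree.ElementTree, so it only works on well-formed markup. The HTML fragment, the class names (product, title, price) and the function name extract_products are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Blue Mug</h2>
      <span class="price">4.99</span>
    </div>
    <div class="product">
      <h2 class="title">Red Plate</h2>
      <span class="price">7.50</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return one dict per product block found in the markup."""
    root = ET.fromstring(markup.strip())
    products = []
    # XPath: every <div class="product"> anywhere below the root element.
    for node in root.findall(".//div[@class='product']"):
        products.append({
            # XPath relative to the product block: a direct child tag
            # with the given class attribute.
            "title": node.findtext("h2[@class='title']"),
            "price": float(node.findtext("span[@class='price']")),
        })
    return products


if __name__ == "__main__":
    for product in extract_products(PAGE):
        print(product)
```

Real pages are rarely well-formed XML, so in practice a more tolerant parser is the usual choice. BeautifulSoup accepts CSS selectors through its select() method, and Scrapy responses offer both xpath() and css().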

Theory behind generalized scraping

Every website is built differently and serves its own purpose. It might use different libraries, frameworks, tools, and technologies, and have a different HTML structure.

But if we can identify some similar patterns among them, we can use those patterns to build common code that can scrape any website we want.

Put on your thinking caps...

What's common to all ecommerce websites?

Any guesses?

  1. Landing Page
  2. Product Listing Page with list of all products
  3. Product Details Page
  4. Comments Section Page
  5. Pagination

These features are common to nearly every ecommerce website.
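One way to picture this shared pattern in code is a skeleton that fixes the crawl flow (landing page, listing pages, pagination, product details, comments) and leaves every site-specific step abstract. This is only a rough sketch, not the approach the series will settle on. The class names, method names and URLs are hypothetical, and FakeShop is an in-memory stand-in so the flow runs without any network access.

```python
from abc import ABC, abstractmethod


class EcommerceScraper(ABC):
    """Shared crawl flow; each site-specific step is left abstract."""

    @abstractmethod
    def listing_urls(self, landing_url):
        """Return the product listing URLs reachable from the landing page."""

    @abstractmethod
    def product_urls(self, listing_url):
        """Return the product detail URLs found on one listing page."""

    @abstractmethod
    def next_page(self, listing_url):
        """Return the next listing page URL, or None on the last page."""

    @abstractmethod
    def product_details(self, product_url):
        """Return a dict of fields extracted from a product details page."""

    @abstractmethod
    def comments(self, product_url):
        """Return the list of comments for a product."""

    def crawl(self, landing_url):
        """Walk landing -> listing -> pagination -> details -> comments."""
        for listing_url in self.listing_urls(landing_url):
            page = listing_url
            while page is not None:
                for product_url in self.product_urls(page):
                    details = self.product_details(product_url)
                    details["comments"] = self.comments(product_url)
                    yield details
                page = self.next_page(page)


class FakeShop(EcommerceScraper):
    """Hypothetical in-memory site used only to exercise the flow."""

    # listing page -> (product URLs on that page, next page or None)
    LISTINGS = {
        "/mugs?page=1": (["/p/1", "/p/2"], "/mugs?page=2"),
        "/mugs?page=2": (["/p/3"], None),
    }

    def listing_urls(self, landing_url):
        return ["/mugs?page=1"]

    def product_urls(self, listing_url):
        return self.LISTINGS[listing_url][0]

    def next_page(self, listing_url):
        return self.LISTINGS[listing_url][1]

    def product_details(self, product_url):
        return {"url": product_url}

    def comments(self, product_url):
        return []


if __name__ == "__main__":
    for item in FakeShop().crawl("/"):
        print(item)
```

The crawl method never touches HTML itself. Everything that depends on a particular site's markup sits behind the abstract methods, and those are the parts that still have to be filled in for each website.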

But wait... all these websites have their own HTML structure. How can we write code to extract data when the structure is different and the XPath is different for each site?

Stay tuned! We will work on it in Part 2 of the series.