Before getting started, let's cover some fundamentals.
What is scraping?
Web scraping is the automated extraction of useful data from a website.
Why do we need to scrape data?
We might need to scrape data for various reasons. A data scientist or ML engineer might need data from various sources to train an AI model. A web developer might need to scrape images of a homestay from Google Maps to showcase in the gallery section of the website they are building.
How do we scrape the data?
We can use various tools and frameworks to scrape data, such as Selenium, Scrapy, and BeautifulSoup.
Every web page is delivered to the browser as HTML. We use XPath expressions and CSS selectors to extract data from a particular HTML tag on the page.
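To make the idea of selecting a tag concrete, here is a minimal sketch that uses only Python's standard library. The HTML snippet, the class names (`product`, `title`, `price`), and the `extract_products` helper are all made up for illustration. Note that `xml.etree.ElementTree` supports only a limited subset of XPath and needs well-formed markup, so for real pages you would more likely reach for Scrapy, BeautifulSoup, or lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing snippet (not from any real site).
HTML = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Blue Mug</h2>
      <span class="price">4.99</span>
    </div>
    <div class="product">
      <h2 class="title">Red Mug</h2>
      <span class="price">5.49</span>
    </div>
  </body>
</html>
"""


def extract_products(markup):
    """Return a list of {"title", "price"} dicts found in the markup."""
    root = ET.fromstring(markup.strip())
    products = []
    # XPath: every <div> whose class attribute is exactly "product".
    # The rough CSS selector equivalent would be: div.product
    for node in root.findall(".//div[@class='product']"):
        # Relative XPath: look only inside the current product <div>.
        title = node.find("./h2[@class='title']").text
        price = float(node.find("./span[@class='price']").text)
        products.append({"title": title, "price": price})
    return products


products = extract_products(HTML)
print(products)
```

The point to notice is that the extraction logic depends entirely on the two XPath strings. Change the page structure and those strings must change too, even though the Python around them stays the same.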
Theory behind generalized scraping
Every website is built differently and serves its own purpose. It might use different libraries, frameworks, tools, and technologies, and have a different HTML structure.
But if we can identify some similar patterns among them, we can use those patterns to build common code that can scrape any website we want.
Put on your thinking caps...
What's common to all ecommerce websites?
Any guesses?
- Landing Page
- Product Listing Page with list of all products
- Product Details Page
- Comments Section Page
- Pagination
Nearly every ecommerce website has these pages and features in common.
But wait... all these websites have their own HTML structure. How can we write code to extract data when the structure is different and the XPath is different for each one?
Stay tuned! We will work on it in part 2 of the series.