How-to-Scrape-Dynamic-eCommerce-Website-with-Python-for-Pricing-Analysis

Dynamic websites generate real-time content through server-side scripts or database-driven API calls, making scraping more challenging than static sites. Sephora, a leading beauty retailer, offers a diverse product range, including its private label, accessible through its user-friendly online platform. This blog focuses on scraping data from Sephora, specifically lipsticks. Sephora's dynamic website tailors content to individual users' preferences, requiring user interactions for content generation.

Data scraping from ecommerce website proves invaluable for competitors and those seeking insights into Sephora's product catalog. Structured data simplifies product comparisons, trend tracking, and performance analysis. Selenium, a Python tool, facilitates dynamic website scraping by emulating user interactions, executing JavaScript, filling forms, and clicking buttons programmatically. Install the Selenium package and an appropriate web driver, like Chrome, Firefox, or Safari, to scrape dynamic eCommerce websites with Python.

Scraping Sephora Data with Python: A How-To Guide

We'll employ Python, Selenium, and BeautifulSoup to scrape data for lipstick from the Sephora website. The process details are available below:

Data Attributes:

Data-Attributes
  • Product URL: Direct link to a specific product.
  • Brand: The product's brand name.
  • Product Name: The unique name identifying the product.
  • Number of Reviews: The total count of product reviews.
  • Love Count: The number of times the product has been "loved" by users.
  • Star Rating: The average star rating given by customers.
  • Price: The cost of the perfume.
  • Shades: The lipstick shades.
  • Type: The classification of the lipstick type.
  • Formulation: Information about the lipstick composition.
  • Size: Size details.

Import Necessary Libraries

To commence our data extraction from the Sephora website and effectively process the desired information, let's initiate by importing the essential libraries. These libraries serve distinct roles and enable various functionalities in our scraping operation including e-commerce price monitoring. Our toolbox encompasses:

The 'time' library facilitates the introduction of pauses into our script to prevent overwhelming the website with an excessive number of requests.

The 'random' library generates random numbers, injecting diversity into our requests for a more natural interaction with the site.

The 'pandas' library is a versatile tool for storing and manipulating the data we collect.

Our e-commerce data scraping services will harness the 'BeautifulSoup' module from the 'bs4' library to parse HTML and extract valuable data.

The 'Selenium' library is a powerful tool enabling us to take control of a web browser and engage with the Sephora website programmatically.

The 'webdriver' module, a crucial component of the Selenium library, empowers us to specify the browser we intend to use for our scraping tasks.

Additionally, we can tap into various extensions of the 'webdriver' module, including 'Keys' and 'By,' to access additional functionalities within the Selenium framework.

If any of these libraries are absent from your system, you can swiftly install them using the 'pip' command. Here's the code to import these indispensable libraries:

If-any-of-these-libraries-are-absent-from-your-system

Next, we need to establish a Selenium browser instance. The following code accomplishes this task:

Next-we-need-to-establish-a-Selenium-browser-instance

Introducing All Essential Functions:

To enhance the readability and maintainability of our code, we'll encapsulate specific tasks into reusable functions. These custom functions, termed user-defined functions, allow us to isolate particular actions and employ them multiple times within our script without redundant coding. By defining functions, our code becomes more structured and more accessible to comprehend.

Function: Delay

To introduce pauses between specific processes, we'll create a function that suspends the execution of the subsequent code for a random period ranging from 3 to 10 seconds. Invoke this function whenever we require a delay within our script.

To-introduce-pauses-between-specific-processes

Function: Lazy Loading

To tackle lazy loading issues while scraping dynamic websites, we'll create a function that automatically scrolls down the page to trigger content loading. This function leverages the Keys class from the webdriver module to send page-down keystrokes to the browser whenever the body tag is detected. Additionally, we'll incorporate delays to ensure that the newly loaded content is correct.

Function-Lazy-Loading

Function: Pagination

To access the complete list of products on Sephora's lipstick homepage, which initially shows only 60 out of around 192 items, we need to click the "Show More Products" button repeatedly. Each click loads an additional 50 products. To determine the number of clicks required, we locate the element containing the total product count using Selenium's webdriver and its XPath. We then calculate the number of times to click the button by dividing this count by 50. The lazy_loading function ensures proper loading of all products during this process.

Function-Pagination

Function: Fetch Product Links

To extract product links from Sephora's lipstick page, we target items within the 'css-foh208' class, where links are within anchor tags with 'href' attributes.

Function-Fetch-Product-Links

Function: Get Non-Lazy Loading Product Links

Our e-commerce data scraper can create a function using Selenium to find elements with the 'css-foh208' class and extract the href attribute from the anchor tags within those elements. This function will gather the links, which is helpful for subsequent actions.

Function-Get-Lazy-Loading-Product-Links

Function: Get Lazy Loading Product Links

When dealing with products featuring lazy loading, the class name is distinct, marked as 'css-1qe8tjm'. However, the process remains identical to the previous one: we locate these elements, extract the href attributes from the anchor tags, and compile the links for further use.

Function-Get-Lazy-Loading-Product-Links

Function: Extract Page Content

This function retrieves the source code of the current web page using Selenium and parses it with BeautifulSoup, employing the 'html.parser' module for HTML code analysis.

Function-Extract-Page-Content

Function: Extract Brand Data

The function utilizes BeautifulSoup to locate the element with a specific attribute, like "data-at='brand_name'," and extracts the corresponding text, populating the 'brand' column. If the element isn't available, it fills the column with a default value, such as "Brand name not available."

Function-Extract-Brand-Data

Function: Extract Product Name

This function targets the span tag with the 'data-at' attribute set to 'product_name' and extracts the text, which is in the 'product_name' column. If no text is available, it assigns "Product name not available" as the default value.

Function-Extract-Product-Name

The "reviews_data" function counts the number of reviews for each product by looking for the "number_of_review" attribute in a span tag. If the attribute is available, it returns the number of reviews, and if not, it returns "Number of reviews not found."

The-reviews-data-function-counts-the-number

The "price_data" function retrieves the price of a product from a bold tag with the class name "ss-0." If the price is available, it returns the price; otherwise, it returns "Price data not available.

The-price-data-function-retrieves-the-price

The "lipstick_ingredients" function works like the previous one. It checks for the words "Lipstick Ingredients" and a colon. If it finds them, it gets the ingredients list. If not, it says "lipstick ingredients not available."

The-lipstick-ingredients-function-works-like

Retrieving Product URLs

To initiate the scraping process, we'll start with the homepage link using Selenium's webdriver. We'll load all available products using the pagination function, which includes handling lazy loading via the lazy_loading function. Then, we'll extract the source code of the current webpage, which now contains all available products, using BeautifulSoup with the html.parser module. We'll call the functions fetch_product_links and fetch_lazy_loading_product_links to gather links to all the products, storing them in a list named "product_links.

Retrieving-Product-URLs

Creating a DataFrame: Initialization

To efficiently manage the data we gather, we can build a dictionary with the necessary columns as keys and empty lists as values. Subsequently, we can transform this dictionary into a Pandas DataFrame named 'data.' Lastly, we can associate the 'product_links' list with the 'product_url' column within the DataFrame. This approach simplifies the storage and organization of the collected data.

Creating-a-DataFrame-Initialization

Identifying Essential Features for Extraction

Content is extracted from each 'product_url' link using the 'extract_content' function, followed by sequential calls to functions for required detail extraction."

Identifying-Essential-Features-for-Extraction

Storing Data in a CSV File

We must save the current data frame into a CSV file for future reference.

Storing-Data-in-a-CSV-File

Conclusion: Web scraping is a valuable tool for data collection, but it must be done ethically and within legal boundaries. This blog showcased how we automated the data collection of lipsticks from Sephora using Python, leveraging Selenium and BeautifulSoup. BeautifulSoup is faster, while Selenium is helpful for HTML rendering. The collected data can serve various purposes, like product and brand comparisons.

At Product Data Scrape, our commitment to unwavering ethical standards permeates every aspect of our business operations, whether our Competitor Price Monitoring Services or Mobile App Data Scraping. With a global presence spanning multiple locations, we unwaveringly deliver exceptional and transparent services to meet the diverse needs of our valued clients.

RECENT BLOG

What Are the Benefits of Using Web Scraping for Brand Price Comparison on Nykaa, Flipkart, and Myntra?

Web scraping for brand price comparison on Nykaa, Flipkart, and Myntra enhances insights, competitive analysis, and strategic pricing decisions.

How Can Web Scraping Third-Party Sellers on E-commerce Marketplaces Enhance Brand Protection?

Web scraping third-party sellers on e-commerce marketplaces enhances brand protection and helps detect counterfeit products efficiently.

What Strategies Can Be Developed Through Scraping Product Details Data from the Shein?

Scraping product details data from Shein provides insights into trends, customer preferences, pricing strategies, and competitive analysis for businesses.

Why Product Data Scrape?

Why Choose Product Data Scrape for Retail Data Web Scraping?

Choose Product Data Scrape for Retail Data scraping to access accurate data, enhance decision-making, and boost your online sales strategy.

Reliable-Insights

Reliable Insights

With our Retail data scraping services, you gain reliable insights that empower you to make informed decisions based on accurate product data.

Data-Efficiency

Data Efficiency

We help you extract Retail Data product data efficiently, streamlining your processes to ensure timely access to crucial market information.

Market-Adaptation

Market Adaptation

By leveraging our Retail data scraping, you can quickly adapt to market changes, giving you a competitive edge with real-time analysis.

Price-Optimization

Price Optimization

Our Retail Data price monitoring tools enable you to stay competitive by adjusting prices dynamically, attracting customers while maximizing your profits effectively.

Competitive-Edge

Competitive Edge

With our competitor price tracking, you can analyze market positioning and adjust your strategies, responding effectively to competitor actions and pricing.

Feedback-Analysis

Feedback Analysis

Utilizing our Retail Data review scraping, you gain valuable customer insights that help you improve product offerings and enhance overall customer satisfaction.

Awards

Recipient of Top Industry Awards

clutch

92% of employees believe this is an excellent workplace.

crunchbase
Awards

Top Web Scraping Company USA

datarade
Awards

Top Data Scraping Company USA

goodfirms
Awards

Best Enterprise-Grade Web Company

sourcefroge
Awards

Leading Data Extraction Company

truefirms
Awards

Top Big Data Consulting Company

trustpilot
Awards

Best Company with Great Price!

webguru
Awards

Best Web Scraping Company

Process

How We Scrape E-Commerce Data?

Insights

Explore our insights related blogs to uncover industry trends, best practices, and strategies

FAQs

E-Commerce Data Scraping FAQs

Our E-commerce data scraping FAQs provide clear answers to common questions, helping you understand the process and its benefits effectively.

E-commerce scraping services are automated solutions that gather product data from online retailers, providing businesses with valuable insights for decision-making and competitive analysis.

We use advanced web scraping tools to extract e-commerce product data, capturing essential information like prices, descriptions, and availability from multiple sources.

E-commerce data scraping involves collecting data from online platforms to analyze trends and gain insights, helping businesses improve strategies and optimize operations effectively.

E-commerce price monitoring tracks product prices across various platforms in real time, enabling businesses to adjust pricing strategies based on market conditions and competitor actions.

Let’s talk about your requirements

Let’s discuss your requirements in detail to ensure we meet your needs effectively and efficiently.

bg

Trusted by 1500+ Companies Across the Globe

decathlon
Mask-group
myntra
subway
Unilever
zomato

Send us a message