How to Scrape Decathlon Using Playwright in Python for Product Data Retrieval

Decathlon, a renowned sporting goods retailer, offers various sports apparel, footwear, and equipment. This article explores how to scrape apparel data by category using Playwright and Python to collect valuable insights into product trends and pricing from Decathlon's website.

Playwright is a library for controlling web browsers such as Chromium, Firefox, and WebKit from programming languages like Python and JavaScript. It's an excellent tool for scraping data from ecommerce websites and for automating tasks like form submissions and button clicks. Using Playwright, we will navigate through each category and gather essential product information, including name, price, and description.

This tutorial provides a foundational understanding of using Playwright and Python for scraping Decathlon's website, focusing on extracting key data attributes from individual product pages.

List of Data Fields

  • Product URL
  • Product Name
  • Brand
  • MRP
  • Sale Price
  • Number of Reviews
  • Ratings
  • Color
  • Features
  • Product Information

Below, you'll find a step-by-step guide to scrape Decathlon with Playwright in Python.

Import Necessary Libraries

To initiate the procedure, we should begin by importing the necessary libraries to enable us to interact with the website and retrieve the essential information.


The following libraries each serve a specific purpose in automating the browser with Playwright:

'random': This library generates random numbers; in this script it is used to introduce randomized delays between retried requests.

'asyncio': It is used for managing asynchronous programming in Python, particularly when utilizing Playwright's asynchronous API.

'pandas': This library is helpful for data analysis and manipulation. In this tutorial, it is applicable to store and manipulate data acquired from the web pages under examination.

'async_playwright': This is the asynchronous API for Playwright, and it plays a crucial role in automating the browser. Its asynchronous nature enables multiple operations to run concurrently, making scraping faster and more efficient.

These libraries collectively support browser automation with Playwright, covering randomized delays, asynchronous operation management, data handling, and automated interaction with web browsers.

Scrape Product URLs

The next step involves extracting the URLs of the apparel products based on their respective categories.


In this context, we define the function 'get_product_urls' to retrieve product URLs from a web page. The function harnesses the Playwright library for browser automation and gathers the product URLs found on the page. It accepts two parameters: 'browser' and 'page,' instances of the Playwright Browser and Page classes, respectively.

The process begins by using 'page.query_selector_all()' to locate all elements on the page containing product URLs. A for loop then iterates through these elements, extracting the 'href' attribute, which contains each product page's URL.

Additionally, the function checks for a "next" button on the page. If such a button exists, the function clicks on it and invokes itself recursively to retrieve URLs from the subsequent page. This recursive process continues until all relevant product URLs have been collected.
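
The collection-plus-pagination logic can be sketched as follows. Note that the CSS selectors ('a.product-card-link', 'a[aria-label="Next"]') and the BASE_URL constant are illustrative assumptions, not Decathlon's actual markup:

```python
BASE_URL = "https://www.decathlon.com"  # assumed site root for resolving relative links

async def get_product_urls(browser, page):
    product_urls = []

    # Collect every product link currently visible on the page.
    for link in await page.query_selector_all('a.product-card-link'):
        href = await link.get_attribute('href')
        if href:
            # Resolve relative hrefs against the site root.
            product_urls.append(href if href.startswith('http') else BASE_URL + href)

    # If a "next" button exists, follow it and recurse into the next page.
    next_button = await page.query_selector('a[aria-label="Next"]')
    if next_button:
        await next_button.click()
        await page.wait_for_load_state('load')
        product_urls += await get_product_urls(browser, page)

    return product_urls
```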


In this scenario, we aim to scrape product URLs categorized by product type. To achieve this, we follow a two-step process. Initially, we click the product category button to reveal the list of available categories. Subsequently, we click on each category to filter and gather the relevant product URLs.


In this context, we utilize the Python function 'filter_products' to filter products on the Decathlon website by their respective categories and furnish a list of product URLs, along with their associated categories.

The process begins by expanding the product category section on the website and clicking the "Show All" button to reveal all available subcategories. The function then iterates through a predefined list of subcategories. For each one, it clicks the corresponding checkbox to apply the filter, waits for the page to load, and calls 'get_product_urls' to extract the filtered list of product URLs.

After processing all the subcategories, the function performs a cleanup operation by clicking the "Clear" button for each subcategory, effectively resetting the filters.

Scrape Product Name

The subsequent step involves extracting the names of the products from the web pages.


In this context, we've employed an asynchronous function called 'get_product_name,' which accepts a 'page' argument representing a Playwright page object. The function locates the product name element on the page using the page object's 'query_selector()' method with the appropriate CSS selector. On locating the element, it retrieves the element's text content and returns it as a string.

However, if an exception occurs during the process, such as when the element is not present on the page, the function assigns the value "Not Available" to the 'product_name' variable.
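
A minimal sketch of this extractor, assuming a hypothetical 'h1.product-title' selector for the name element:

```python
async def get_product_name(page):
    try:
        element = await page.query_selector('h1.product-title')  # selector is an assumption
        product_name = (await element.text_content()).strip()
    except Exception:
        # Element missing or page layout changed: fall back to a sentinel.
        product_name = "Not Available"
    return product_name
```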

Scrape Product Brand

The subsequent step involves extracting the brand information of the products from the web pages.


Much like extracting the product name, the function 'get_brand_name' retrieves the brand name of a product from a web page. It attempts to locate the brand name element using a CSS selector targeting the specific element that contains the brand name. When the element is found, the function extracts its text content using the 'text_content()' method and assigns it to the 'brand_name' variable. Note that the brand name may include both the primary brand and a sub-brand name, for instance "Decathlon Wedze," where "Wedze" is one of Decathlon's sub-brands. If an exception occurs while locating or extracting the element, the function defaults to assigning "Not Available" to the brand name.

A similar approach can extract other attributes, such as MRP, sale price, number of reviews, ratings, color, features, and product information. For each attribute, a separate function is available, utilizing the 'query_selector' method and 'text_content' or equivalent methods to pinpoint the relevant element on the page and gather the desired information. Additionally, it's essential to adjust the CSS selectors used within these functions to align with the specific structure of the scraped web page.
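Since every per-attribute function follows the same try/extract/fallback pattern, a single generic helper can stand in for all of them. The selectors in the thin wrappers below are illustrative assumptions about the page markup, not Decathlon's real class names:

```python
async def get_text(page, selector, default="Not Available"):
    """Return stripped text for the first element matching selector, or default."""
    try:
        element = await page.query_selector(selector)
        return (await element.text_content()).strip()
    except Exception:
        return default

# Thin wrappers mirroring the article's per-attribute functions:
async def get_mrp(page):        return await get_text(page, 'span.mrp')
async def get_sale_price(page): return await get_text(page, 'span.sale-price')
async def get_ratings(page):    return await get_text(page, 'span.rating-value')
```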

Scrape Product Information


The code defines an asynchronous function named 'get_ProductInformation', which takes a page object as its argument and retrieves product information from Decathlon's website. It iterates through each entry in the product information section, extracting the text content of the "name" and "value" elements with the 'text_content' method. It then removes any newline characters from the collected strings using the 'replace' method and stores each name-value pair in a dictionary named 'product_information'. If an exception is raised, for instance when an element is missing or its text cannot be extracted, the code assigns the value "Not Available" to the 'product_information' dictionary.
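
The extractor can be sketched as below. The row, name, and value selectors are illustrative assumptions about how the specification table is structured:

```python
async def get_ProductInformation(page):
    product_information = {}
    try:
        # Each row of the (assumed) specification list holds a name/value pair.
        rows = await page.query_selector_all('div.product-information li')
        for row in rows:
            name_el = await row.query_selector('span.name')
            value_el = await row.query_selector('span.value')
            # Strip stray newlines and surrounding whitespace from both sides.
            name = (await name_el.text_content()).replace('\n', ' ').strip()
            value = (await value_el.text_content()).replace('\n', ' ').strip()
            product_information[name] = value
    except Exception:
        # Section missing or malformed: record a sentinel entry.
        product_information["Product Information"] = "Not Available"
    return product_information
```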

Implementing Maximum Retry Limit for Request Retries

In web scraping, request retries are critical in handling temporary network errors or unexpected responses from websites. The primary goal is to reattempt a failed request, increasing the likelihood of a successful outcome.

Before accessing the target URL, the script incorporates a retry mechanism to address potential timeouts. It employs a while loop to repeatedly attempt the URL navigation until the request succeeds or the maximum number of retries has been exhausted. On achieving the maximum retry limit without success, the script raises an exception.

This code represents a function that executes a request to a specified link and handles retries in case of failure. This function proves valuable when scraping web pages, as network issues or timeouts can occasionally lead to request failures.


This function requests a specific URL using the 'goto' method of the Playwright page object. If the request fails, the function retries it, up to a maximum of five attempts as set by the constant MAX_RETRIES. Between retries, it pauses for a random interval of 1 to 5 seconds using 'asyncio.sleep' to avoid immediate reattempts. This deliberate pause helps prevent overloading the server and causing further failures.

The 'perform_request_with_retry' function expects two arguments: 'page' and 'link.' The 'page' argument is the Playwright page object making the request, while 'link' is the URL to request. Continuing with the process, we invoke the functions and store the extracted data in an initially empty list.
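
A sketch of the retry wrapper, following the description above (MAX_RETRIES of five and a randomized 1-5 second back-off); the exact structure of the original is not shown, so treat this as one plausible implementation:

```python
import random
import asyncio

MAX_RETRIES = 5

async def perform_request_with_retry(page, link):
    for attempt in range(MAX_RETRIES):
        try:
            await page.goto(link)
            return  # success: stop retrying
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise  # retries exhausted: surface the failure
            # Pause 1-5 seconds so reattempts don't hammer the server.
            await asyncio.sleep(random.uniform(1, 5))
```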


This Python script employs an asynchronous function called "main" to scrape product information from Decathlon web pages. It uses the Playwright library to launch a Firefox browser, navigate to the Decathlon page, and extract the URL of each product using the "get_product_urls" function, storing them in a list named "product_url." The script then iterates through each product URL, loads the product page using "perform_request_with_retry," and retrieves details such as the product name, brand, MRP, sale price, number of reviews, ratings, color, features, and product information.

The script stores this information as tuples in a list called "data," displays a progress message after every 10 product URLs, and prints a completion message once all URLs have been processed. The "data" list is then converted into a Pandas DataFrame and saved as a CSV file using the "to_csv" method. Finally, the browser is closed with "browser.close()". The script is executed by calling "main" through "asyncio.run(main())", running it as an asynchronous coroutine.
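
The final assembly step can be sketched as follows. The column names and output filename are illustrative; adjust them to the attributes your scraper actually collects:

```python
import pandas as pd

COLUMNS = ["Product URL", "Product Name", "Brand", "MRP", "Sale Price",
           "Number of Reviews", "Ratings", "Color", "Features",
           "Product Information"]

def save_results(data, path="decathlon_products.csv"):
    # Convert the list of tuples into a DataFrame and persist it as CSV.
    df = pd.DataFrame(data, columns=COLUMNS)
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return df
```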

Conclusion

In today's rapidly evolving business landscape, data is paramount, and web scraping is the gateway to unlocking its full potential. With the correct data and tools, brands can gain profound insights into the market, facilitating informed decisions that drive growth and profitability.

To remain competitive in the modern business world, brands must leverage every advantage available to stay ahead of competitors. It is where web scraping becomes crucial, enabling companies to access vital insights on market trends, pricing strategies, and competitor data.

By harnessing the capabilities of Playwright and Python tools, companies can extract valuable data from websites like Decathlon, obtaining a wealth of information about product offerings, pricing, and other critical metrics. When combined with the ecommerce website data collection services of a leading web scraping company, the results can be truly transformative and game-changing.

At Product Data Scrape, we uphold unwavering ethical standards in every facet of our operations, be it our Competitor Price Monitoring Services or Mobile App Data Scraping. With a worldwide footprint encompassing numerous offices, we steadfastly provide outstanding and transparent services to cater to the varied requirements of our esteemed clientele.
