For information on the latest hardware releases, Newegg.com is a common first stop. Known for its broad product selection, frequent deals, and customer service, the online retailer is popular among computer enthusiasts. Its well-structured website makes it easy to look up and compare hardware specifications. For that reason, we scraped retail listings to gather the data needed for an in-depth analysis.
The CPU and graphics card are the two most important components in a desktop because they directly determine its computing power. On platforms like YouTube, numerous channels benchmark the gaming performance of CPUs and graphics cards daily. Beyond gamers, data scientists also rely on strong computing capabilities. A fast CPU can significantly speed up data processing, particularly with large datasets. GPU use for parallel computing is also growing. Modern GPUs have a large number of cores, which makes them well suited to concurrent processing tasks, so data scientists use high-end graphics cards for resource-intensive computations such as neural network training. Scraping PC and component data from e-commerce websites gives real-time insight into prices, specifications, and reviews, which supports informed decisions for your next PC build.
Amazon: Amazon is a global e-commerce and technology giant founded in 1994 by Jeff Bezos. It began as an online bookstore but has since evolved into a diverse platform selling various products, including electronics, apparel, and digital media. Amazon also offers cloud computing services through Amazon Web Services (AWS). Known for its customer-centric approach, Amazon has revolutionized online shopping with services like Amazon Prime, offering fast shipping and streaming content. It's one of the world's largest companies, influencing how people shop and consume digital content globally. Scrape Amazon PC & components data to understand competitive pricing.
Newegg: Newegg is a prominent online electronics and computer hardware retailer founded in 2001 by Fred Chang. Based in California, it has become a go-to destination for tech enthusiasts, offering electronics, computer components, and accessories. Newegg is known for its competitive pricing, extensive product selection, and customer-focused approach. It provides detailed product information, user reviews, and ratings to help shoppers make informed decisions. With a solid online presence, Newegg has established itself as a key player in the e-commerce industry, serving consumers and businesses in the tech sector. Scrape Newegg PC & components data for inventory management.
eBay: eBay, established in 1995 by Pierre Omidyar, is a well-known online marketplace and e-commerce platform. Operating globally, eBay enables individuals and businesses to buy and sell a wide range of new and used products through auctions and fixed-price listings. Its categories include electronics, fashion, collectibles, and more. eBay connects millions of buyers and sellers worldwide. Its user-friendly interface, payment security through PayPal, and robust feedback system have contributed to its enduring popularity in online shopping. Scrape eBay PC & components data to manage listings.
In this blog, we focus on scraping data from one e-commerce website, Newegg.
Scrape the Website
As illustrated above, when you search for a specific desktop component, the results page displays a grid of 36 products. The URL encodes the page number, the number of products shown per page, and the chosen sort order. Clicking the "View Details" button on a product takes you to its dedicated product page, where the detailed parameters and specifications are listed, primarily under the "Specification" tab.
With the website's structure in mind, the strategy for scraping PC and component data is clear. We chose Python's Scrapy package for the task and proceeded in three steps:
Generate a list of URLs encompassing pages from the first to the last search results page.
On each page, extract customer rating data, assign a ranking number to each product, filter out undesired items (such as Refurbished and Open Box), and capture the URL for each product.
Navigate to each product page, retrieve hardware specifications under the "Specification" tab, and yield the item information.
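The first two steps above can be sketched in plain Python, without Scrapy. The base URL, the query-parameter names (Page, PageSize, Order), the sort value, and the helper names are assumptions made for illustration; they are not Newegg's actual URL scheme or the original spider. In a Scrapy spider, the generated list would typically be used as the spider's start URLs.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter names; inspect the real search URL first.
BASE_URL = "https://www.example.com/search"

# Listing types to drop, matched case-insensitively against the product title.
UNWANTED = ("refurbished", "open box")


def build_page_urls(last_page, page_size=36, order="BESTSELLING"):
    """Return one search-results URL per page, from page 1 to last_page."""
    urls = []
    for page in range(1, last_page + 1):
        query = urlencode({"Page": page, "PageSize": page_size, "Order": order})
        urls.append(f"{BASE_URL}?{query}")
    return urls


def rank_and_filter(titles, page, page_size=36):
    """Assign a selling rank from page position, then drop unwanted listings."""
    kept = []
    for position, title in enumerate(titles):
        # Rank counts every listing on the page, including the ones filtered out.
        rank = (page - 1) * page_size + position + 1
        if any(word in title.lower() for word in UNWANTED):
            continue
        kept.append({"rank": rank, "title": title})
    return kept
```

Computing the rank before filtering keeps each product's position in the search results intact, which matters if rank is later used as a proxy for sales.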
The organization of product information tends to become less structured in the later pages of the search results. The script utilizes multiple "try/except" statements to handle irregularities in such cases.
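A minimal sketch of this defensive style follows. It assumes the specification table has already been scraped into (label, value) pairs; the label strings ("# of Cores", "Operating Frequency", "Thermal Design Power") and the function names are hypothetical and may not match the labels on the live site or the original script.

```python
import re


def to_float(text):
    """Return the first number in strings like '3.6 GHz' or '$1,299.99', else None."""
    try:
        match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
        return float(match.group())
    except AttributeError:
        # Raised when text is None or when no number is found (match is None).
        return None


def extract_specs(spec_rows):
    """Build an item dict from (label, value) pairs, tolerating missing fields."""
    specs = dict(spec_rows)
    item = {}
    try:
        # Assumes values shaped like '8-Core'; anything else falls through to None.
        item["cores"] = int(specs["# of Cores"].split("-")[0])
    except (KeyError, ValueError, AttributeError):
        item["cores"] = None
    item["frequency_ghz"] = to_float(specs.get("Operating Frequency"))
    item["tdp_watts"] = to_float(specs.get("Thermal Design Power"))
    return item
```

Recording a missing field as None, rather than dropping the whole product, leaves the decision about incomplete rows to the later cleaning step.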
Getting the Data
After scraping Newegg, we cleaned the collected data thoroughly with an R script. The primary variables considered for inclusion in the analysis of CPU and component data are:
Other variables, such as L3/L2 cache for CPUs and memory clock/interface for graphics cards, have been excluded from the data analysis.
Data Visualization
In this section, we create exploratory graphs to visualize the sales trends of CPUs and graphics cards. In the broader market landscape, AMD and Intel vie for dominance in the CPU sector, while AMD and Nvidia stand out as the two major GPU chip manufacturers.
Let's examine the customer rating distribution of CPUs and graphics cards available on Newegg. For CPUs, customer ratings consistently fall between 4 and 5 points, suggesting a high level of reliability across the board. In contrast, graphics cards show a wider range of ratings, including 1 and 3 points, indicating more variable product quality. In terms of the overall market distribution, Intel products outnumber AMD CPUs. In the graphics card market, Nvidia leads while AMD lags in market share.
The bar plot above shows that the graphics card market comprises ten different companies. Among these, companies like Powercolor and Sapphire predominantly align with AMD, while others, such as EVGA and PNY, exclusively use Nvidia GPUs. Interestingly, companies that use both GPU types tend to offer a larger selection of products with Nvidia chips than with AMD chips.
As depicted in the figure above, among the top 100 best-selling CPUs, Intel products exhibit a higher average price than their AMD counterparts. Nevertheless, Intel continues to maintain a leading position in popularity and sales.
The plot above shows the relationship between product price and selling rank for the top 50 CPU products. The average price rises gradually from about $200 to about $500 as the rank approaches 25, then drops below $200 toward rank 50. Three Intel products ranked between 20 and 30, each priced above $1000, pull up the average price in that rank range.
As indicated by the boxplot presented above, a noticeable positive correlation exists between the price of CPUs and the number of cores they possess. Additionally, a distinct price differential between the two CPU manufacturers is evident, and this disparity becomes even more pronounced with CPUs featuring a higher number of cores.
Statistical Analysis
This section delves into the connection between CPU product prices and their associated parameters. First, let's examine a correlation plot encompassing all numeric variables within the CPU dataset.
It's worth noting the intriguing observation that CPU prices exhibit minimal correlation with operating frequency, implying that clock speed no longer serves as a direct indicator of computing power. Contemporary CPU manufacturers prioritize core count, thread count, and inner architecture to enhance performance rather than relying solely on clock speed increments.
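Each cell of such a correlation plot is a Pearson correlation coefficient. As a rough illustration of what one cell contains, here is a pure-Python sketch; the function name is ours, and the original analysis was done in R, so this is not the author's code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value near 0 for price against operating frequency is what "minimal correlation" means here, while a value near 1 or -1 indicates a strong linear relationship.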
Let's explore the feasibility of using linear regression to model CPU prices against other variables. As a preliminary step, we'll assess the normality of the price distribution.
Based on the Q-Q plot shown above, it's evident that the distribution of prices does not align closely with a normal distribution. Consequently, we opt to perform a Box-Cox transformation to address this non-normality.
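For reference, the Box-Cox family maps a positive value y to (y^lambda - 1) / lambda when lambda is nonzero and to ln(y) when lambda is 0. The sketch below implements only that formula in Python; the function name is hypothetical, and it does not estimate lambda from the data, which the original R analysis would have done separately.

```python
from math import log


def box_cox(values, lam):
    """Apply the Box-Cox transform with parameter lam to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lam -> 0 limit of (v**lam - 1) / lam is the natural logarithm.
        return [log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

Prices are strictly positive, so the transform is always defined for this dataset.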
We apply a natural logarithm transformation, which is the Box-Cox transformation with λ = 0, to the response to bring it closer to normality. For variable selection, the candidate model includes CPU brand, series, number of cores, power consumption, and operating frequency. We then use the Akaike Information Criterion (AIC), through the step function in R, to select the most suitable linear model. The variables retained in the final model are series, power consumption, and number of cores.
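To illustrate how AIC trades goodness of fit against model size, the sketch below compares an intercept-only model with a one-predictor line using the least-squares form of AIC, n * ln(RSS / n) + 2k, which is correct up to an additive constant. The function names are hypothetical and the numbers in the usage lines are made up for illustration; they are not scraped data, and this is not the stepwise procedure itself.

```python
from math import log


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def aic_from_rss(rss, n, n_params):
    """Least-squares AIC up to an additive constant; lower is better."""
    return n * log(rss / n) + 2 * n_params


# Made-up illustration: core counts and log prices for four imaginary CPUs.
cores = [2, 4, 6, 8]
log_price = [4.1, 4.9, 5.2, 5.9]

mean_lp = sum(log_price) / len(log_price)
rss_null = sum((y - mean_lp) ** 2 for y in log_price)  # intercept-only model
_, _, rss_line = fit_line(cores, log_price)

aic_null = aic_from_rss(rss_null, len(cores), 1)
aic_line = aic_from_rss(rss_line, len(cores), 2)
```

A stepwise search repeats this kind of comparison, adding or dropping one variable at a time and keeping the change that lowers AIC the most.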
Conclusion
We scraped comprehensive data on CPU and graphics card products from Newegg.com using the Python Scrapy module. Our findings show that CPU prices have minimal correlation with operating frequency. In the CPU and graphics card data, we observed that AMD, while offering a diverse product range, trails Intel in the CPU market and Nvidia in the GPU market.
Furthermore, we constructed a linear regression model on the natural logarithm of price, with CPU series, core count, and power consumption as predictors. This model accounts for approximately 87% of the variability in (log-transformed) CPU prices.
In terms of future work, there is room for further analysis, including exploring additional predictors that may impact CPU and GPU prices. Additionally, evaluating the impact of market dynamics and external factors on product pricing could provide valuable insights for manufacturers and consumers.