GitHub - AArashev/AmazonAnalytics: This project analyzes Amazon product data to understand pricing strategies, customer satisfaction, and market segmentation. Using data visualization, regression analysis, and clustering techniques, it uncovers relationships between product features, ratings, and prices to provide insights into e-commerce trends.

Amazon Product Analytics How to Run the Analysis

Clone the repository.
Open the Amazon .Rmd file in RStudio.
Knit the .Rmd file to produce the full analysis report, including visualizations and model outputs, or run the code chunks sequentially to reproduce the analysis step-by-step.

Project Overview

This project is a comprehensive analysis of Amazon product listings aimed at understanding pricing strategies, customer satisfaction, and market segmentation. The analysis leverages statistical and machine-learning models to explore the relationships between product features, ratings, and prices. Dataset

Source: Amazon Sales Dataset by Karkavelraja J.
Description: The dataset contains product details such as product_name, category, discounted_price, actual_price, rating, and more. It provides a rich source of information for analyzing how pricing and product features relate to customer satisfaction on Amazon.

Methodology

Data Collection & Preparation:
    Data was sourced from Kaggle and preprocessed to clean and format pricing information.
    Descriptive statistics and exploratory data analysis (EDA) were conducted to understand the data structure and distributions.

Exploratory Data Analysis (EDA):
    A histogram was created to visualize the distribution of log-transformed discounted prices.
    A scatter plot was used to investigate the relationship between product ratings and prices.
    A bar chart showed the count of products across different categories.

Statistical Modeling & Machine Learning:
    Linear Regression: Explored the relationships between product ratings, categories, and prices.
    K-means Clustering: Employed to segment the data into clusters based on ratings and prices.
    Gaussian Mixture Model (GMM): Used for advanced clustering to reveal hidden patterns in the data.

Visualization & Insights:
    Word clouds provided an overview of frequently used terms in product descriptions.
    Violin plots highlighted the distribution of product ratings, showing a high density of ratings in the 4-5 range.
    Sentiment analysis assessed the sentiment scores of product descriptions, indicating that most are positively worded.

Key Findings

Pricing Patterns: The log-transformed price distribution revealed multiple peaks, indicating clusters around certain price points.
Ratings vs. Prices: A scatter plot showed that higher product ratings do not necessarily correlate with higher prices.
Category Insights: Products in categories like Electronics, Computers & Accessories, and Home & Kitchen were most common.
Clustering Results: K-means and GMM clustering identified distinct groups of products based on their pricing and ratings.

Project Structure

amazon.csv: The main dataset containing product listings and attributes.
Amazon .Rmd: The RMarkdown file containing the code for data analysis, visualization, and modeling.
DATASCIENCE PROJECT.pptx: A presentation summarizing the project's objectives, methodology, and key insights.
README.md: This file provides an overview of the project, its structure, and how to navigate the repository.

How to Run the Analysis

Clone the repository.
Open the Amazon .Rmd file in RStudio.
Run the code chunks sequentially to reproduce the analysis, visualizations, and models.

Libraries & Tools Used

R Packages:
    dplyr: Data manipulation.
    ggplot2: Data visualization.
    mclust: Gaussian mixture models.
    tm, sentimentr: Text mining and sentiment analysis.
    wordcloud: Creating word clouds.
Tools:
    RStudio for code execution and visualization.
    PowerPoint for presentation.

Summary & Learnings

This project provided insights into consumer behavior on Amazon through data analysis and statistical modeling. It highlighted the complex relationship between pricing and ratings, the importance of product categories, and the value of clustering for market segmentation. It also served as a learning experience in applying machine learning techniques such as K-means and GMM to real-world e-commerce data.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Amazon Product Analytics		Amazon Product Analytics
Amazon Code .Rmd		Amazon Code .Rmd
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

License

AArashev/AmazonAnalytics

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages