Peer review #2

Open
adway opened this issue Apr 15, 2022 · 0 comments
adway commented Apr 15, 2022

Peer review by: the_three_musketeers

Names of team members that participated in this review: Adway Wadekar, Sam Breault, Frankie Willard
Describe the goal of the project.
The goal of the project is to determine what characteristics affect the rating of different chocolates (from the Manhattan Chocolate Society review) and in what way.
Describe the data used or collected, if any. If the proposal does not include the use of a specific dataset, comment on whether the project would be strengthened by the inclusion of a dataset.
The data used comes from the Manhattan Chocolate Society, which rates chocolate bars and logs several characteristics about each chocolate bar including company, place of origin, and ingredients.
Describe the approaches, tools, and methods that will be used.
First, the team carried out exploratory data analysis with histograms, boxplots, bar plots, scatter plots, density plots, and maps. This helped them identify the distributions of the variables they were looking into, as well as some of those variables' relationships with the target variable, the chocolate rating. They then fit two linear regression models with different sets of features (one set is a superset of the other) and performed v-fold cross-validation, comparing the RMSE and R-squared of each model.
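As a rough illustration of the comparison described above, here is a minimal Python sketch of v-fold cross-validation for two nested linear models. It is not the team's code: the data are synthetic, and the feature names (`cocoa`, `n_ingredients`), the coefficients, and the helper functions are all assumptions made up for this example.

```python
import math
import random


def solve_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def cross_validate(X, y, v=5, seed=0):
    """Return the mean held-out RMSE and R-squared over v folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    rmses, r2s = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Prepend a constant 1.0 so the model has an intercept
        X_train = [[1.0] + list(X[i]) for i in train]
        beta = solve_least_squares(X_train, [y[i] for i in train])
        preds = [sum(b * x for b, x in zip(beta, [1.0] + list(X[i]))) for i in fold]
        actual = [y[i] for i in fold]
        sse = sum((a - p) ** 2 for a, p in zip(actual, preds))
        mean_actual = sum(actual) / len(actual)
        sst = sum((a - mean_actual) ** 2 for a in actual)
        rmses.append(math.sqrt(sse / len(actual)))
        r2s.append(1 - sse / sst)
    return sum(rmses) / v, sum(r2s) / v


# Synthetic stand-in data (NOT the Manhattan Chocolate Society dataset)
rng = random.Random(42)
cocoa = [rng.uniform(55, 90) for _ in range(200)]
n_ingredients = [rng.randint(2, 6) for _ in range(200)]
rating = [
    3.0 + 0.01 * (c - 70) - 0.2 * (k - 3) + rng.gauss(0, 0.3)
    for c, k in zip(cocoa, n_ingredients)
]

# The smaller model's features are a subset of the larger model's features
small_features = [[c] for c in cocoa]
full_features = [[c, k] for c, k in zip(cocoa, n_ingredients)]

rmse_small, r2_small = cross_validate(small_features, rating)
rmse_full, r2_full = cross_validate(full_features, rating)
print(f"small model: RMSE={rmse_small:.3f}, R^2={r2_small:.3f}")
print(f"full model:  RMSE={rmse_full:.3f}, R^2={r2_full:.3f}")
```

Both models are scored on the same folds (same seed), so the difference in held-out RMSE and R-squared reflects the added feature rather than a different split.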
Is there anything that is unclear from the proposal?
The methodology says that three models will be fit and evaluated, but only two of them appear to be evaluated.
Provide constructive feedback on how the team might be able to improve their project. Make sure your feedback includes at least one comment on the statistical modeling aspect of the project, but do feel free to comment on aspects beyond the modeling.
I think it would be wise to discuss whether linear regression is the best choice for this type of target variable. This is an odd case: the target takes decimal values but is still discrete (ratings run from 1 to 5 in increments of 0.25). If there is enough data, a multinomial regression could work for predicting the rating. Because the ratings are ordered, however, some kind of ordinal logistic regression seems like the best fit.
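To make the ordinal suggestion more concrete, here is a small Python sketch of how a proportional-odds (cumulative logit) model assigns a probability to each rating level. It only shows the model's structure and does not fit anything. The cutpoints and the linear predictor values are hypothetical placeholders; in practice they would be estimated from the data.

```python
import math


def rating_levels(low=1.0, high=5.0, step=0.25):
    """Ordered rating categories: 1.0, 1.25, ..., 5.0 (17 levels)."""
    n = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probabilities(eta, cutpoints):
    """Proportional-odds model: P(Y <= level k) = sigmoid(cutpoint_k - eta).

    eta is the linear predictor (x . beta) for one chocolate bar, and
    cutpoints must be strictly increasing, with one fewer entry than levels.
    """
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Each level's probability is the difference of adjacent cumulative values
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


def expected_rating(eta, cutpoints, levels):
    probs = ordinal_probabilities(eta, cutpoints)
    return sum(level * p for level, p in zip(levels, probs))


levels = rating_levels()
# Hypothetical, evenly spaced cutpoints (16 thresholds for 17 levels)
cutpoints = [-4.0 + 8.0 * i / 15 for i in range(len(levels) - 1)]

probs = ordinal_probabilities(0.5, cutpoints)
most_likely = levels[probs.index(max(probs))]
print("most likely rating at eta=0.5:", most_likely)
print("expected rating at eta=-1:", round(expected_rating(-1.0, cutpoints, levels), 3))
print("expected rating at eta=+1:", round(expected_rating(1.0, cutpoints, levels), 3))
```

Unlike a multinomial model, this setup uses a single set of coefficients for all levels plus ordered thresholds, so it respects the ordering of the ratings with far fewer parameters.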
What aspect of this project are you most interested in and would like to see highlighted in the presentation?
I am most interested in the variable they find to be the most predictive of chocolate rating. Additionally, I would be intrigued to see some kind of choropleth map displaying the average quality of chocolate by origin, as they looked into these variables with several maps.
Provide constructive feedback on any issues with file and/or code organization.
There is one part of the exploratory data analysis where many visualizations appear in a row. I think it would help to pair each visualization more clearly with its interpretation, so that the reader can see the visualization and read its interpretation at the same time. Another solution would be to number the figures so it is clear which one is being interpreted at any given point.
(Optional) Any further comments or feedback?
