Names of team members that participated in this review: Adway Wadekar, Sam Breault, Frankie Willard
Describe the goal of the project.
The goal of the project is to determine what characteristics affect the rating of different chocolates (from the Manhattan Chocolate Society review) and in what way.
Describe the data used or collected, if any. If the proposal does not include the use of a specific dataset, comment on whether the project would be strengthened by the inclusion of a dataset.
The data used comes from the Manhattan Chocolate Society, which rates chocolate bars and logs several characteristics about each chocolate bar including company, place of origin, and ingredients.
Describe the approaches, tools, and methods that will be used.
First, the team carried out exploratory data analysis using histograms, boxplots, barplots, scatter plots, density plots, and maps. This helped them identify the distributions of the variables they were examining, as well as some of those variables' relationships with the target variable, the chocolate rating. They then fit two linear regression models with different sets of features (one set is a superset of the other) and ran v-fold cross-validation, comparing the RMSE and R squared of each model.
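To make the comparison described above concrete, here is a minimal sketch of v-fold cross-validation for two nested linear models. It is not the team's code: the data are synthetic, the feature names (`cocoa`, `extra`) are hypothetical, and R squared is computed here as 1 - SSE/SST on each held-out fold, which may differ from the definition used in their pipeline.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + row for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]

def cv_metrics(X, y, cols, v=5, seed=0):
    """Mean held-out RMSE and R squared for a linear model on the given columns."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::v] for k in range(v)]  # v disjoint folds covering all rows
    rmses, r2s = [], []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        Xtr = [[X[i][c] for c in cols] for i in train]
        Xte = [[X[i][c] for c in cols] for i in fold]
        ytr = [y[i] for i in train]
        yte = [y[i] for i in fold]
        pred = predict(fit_ols(Xtr, ytr), Xte)
        sse = sum((a - b) ** 2 for a, b in zip(yte, pred))
        mean_te = sum(yte) / len(yte)
        sst = sum((a - mean_te) ** 2 for a in yte)
        rmses.append((sse / len(yte)) ** 0.5)
        r2s.append(1 - sse / sst)
    return sum(rmses) / v, sum(r2s) / v

# Synthetic stand-in data (NOT the chocolate dataset): two hypothetical features.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]  # columns: cocoa, extra
y = [3.0 + 0.5 * c + 0.8 * e + rng.gauss(0, 0.1) for c, e in X]

rmse_small, r2_small = cv_metrics(X, y, cols=[0])     # smaller model
rmse_large, r2_large = cv_metrics(X, y, cols=[0, 1])  # superset model
print(f"small: RMSE={rmse_small:.3f} R2={r2_small:.3f}")
print(f"large: RMSE={rmse_large:.3f} R2={r2_large:.3f}")
```

Both models are scored on the same folds (same seed), so the difference in RMSE reflects the added feature rather than a different split.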
Is there anything that is unclear from the proposal?
In their methodology, they state that they will fit and evaluate three models, but only two of these models appear to be evaluated.
Provide constructive feedback on how the team might be able to improve their project. Make sure your feedback includes at least one comment on the statistical modeling aspect of the project, but do feel free to comment on aspects beyond the modeling.
I think it would be wise to discuss whether linear regression is the best choice for this type of target variable. This is an unusual case: the target takes decimal values, but it is still discrete (ratings run from 1 to 5 in increments of 0.25). If there is enough data, a multinomial regression could work for predicting the rating, although it would ignore the ordering of the categories. Given that a rating is inherently ordered, some kind of ordinal logistic regression seems like the best fit.
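As a rough illustration of the ordinal suggestion, the sketch below treats the rating grid (1 to 5 in steps of 0.25, so 17 levels) as ordered categories and computes category probabilities under a proportional-odds model. The cutpoints and the linear predictor value are made up for illustration and are not fitted to the chocolate data; in practice they would be estimated by a statistics package.

```python
import math

# Rating grid described in the review: 1 to 5 in steps of 0.25 (17 levels).
LEVELS = [1 + 0.25 * k for k in range(17)]

def rating_to_level(rating):
    """Map a rating such as 3.25 to its ordered category index (0..16)."""
    return round((rating - 1) / 0.25)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def ordinal_probs(eta, cutpoints):
    """Category probabilities under a proportional-odds model.

    P(Y <= k) = sigmoid(cutpoints[k] - eta), with strictly increasing cutpoints.
    A larger linear predictor eta shifts probability toward higher ratings.
    """
    cum = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def expected_rating(probs):
    return sum(p * r for p, r in zip(probs, LEVELS))

# Hypothetical values: 16 evenly spaced cutpoints separate the 17 levels.
cutpoints = [-4 + 0.5 * k for k in range(16)]
probs = ordinal_probs(0.4, cutpoints)
print(f"expected rating at eta=0.4: {expected_rating(probs):.2f}")
```

Unlike a multinomial model, this uses a single set of coefficients (through `eta`) for all 17 levels, which is what makes it practical when some rating levels have few observations.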
What aspect of this project are you most interested in and would like to see highlighted in the presentation?
I am most interested in the variable they find to be the most predictive of chocolate rating. Additionally, I would be intrigued to see some kind of choropleth map displaying the average quality of chocolate by origin, as they looked into these variables with several maps.
Provide constructive feedback on any issues with file and/or code organization.
There is one part of the exploratory data analysis in which many visualizations appear in a row. I think it would help to pair each visualization with its interpretation more clearly, so that the reader can see the visualization and read its interpretation at the same time. Another solution would be to number the figures so that it is clear which one is being interpreted at any given point.
(Optional) Any further comments or feedback?
Peer review by: the_three_musketeers