Cleansing_Exploration/project2.qmd
+263-3
Original file line number
Diff line number
Diff line change
@@ -1,7 +1,7 @@
1
1
---
2
-
title: "Client Report - [Insert Project Title]"
2
+
title: "Client Report - Finding Relationships in Baseball"
3
3
subtitle: "Course DS 250"
4
-
author: "[STUDENT NAME]"
4
+
author: "Brian Munoz"
5
5
format:
6
6
html:
7
7
self-contained: true
@@ -25,4 +25,264 @@ execute:
25
25
26
26
---
27
27
28
-
### Paste in a template
28
+
29
+
```{python}
30
+
import pandas as pd
31
+
import numpy as np
32
+
import sqlite3
33
+
import matplotlib.pyplot as plt
34
+
import plotly.graph_objects as go
35
+
from plotly.subplots import make_subplots
36
+
```
37
+
38
+
39
+
### Baseball, a game of perspective
40
+
41
+
_This report shows the importance of not limiting ourselves to the most recent results. We will observe how players' effectiveness changes as they participate in more games, look at the success of players who attended BYU-Idaho, and finally compare how effectively two great teams use their resources and how this affects their number of victories._
42
+
43
+
## QUESTION|TASK 1
44
+
45
+
__Write an SQL query to create a new dataframe about baseball players who attended BYU-Idaho. The new table should contain five columns: playerID, schoolID, salary, and the yearID/teamID associated with each salary. Order the table by salary (highest to lowest) and print out the table in your report.__
__This three-part question requires you to calculate batting average (number of hits divided by the number of at-bats)__
117
+
118
+
##### A. Write an SQL query that provides playerID, yearID, and batting average for players with at least 1 at bat that year. Sort the table from highest batting average to lowest, and then by playerid alphabetically. Show the top 5 results in your report.
119
+
120
+
-Some players had only one at-bat, which made their batting average far higher than that of other players by comparison.-
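As a minimal sketch of this small-sample effect, the example below builds a tiny in-memory table and runs a query of the shape the task describes. The rows and player IDs are hypothetical and exist only for illustration; the column names `playerID`, `yearID`, `H`, and `AB` follow the query in this report. Filtering on at-bats (rather than hits) is what the task statement asks for, and it also keeps players with zero at-bats out of the division.

```python
import sqlite3

# Hypothetical rows for illustration only; not taken from the real database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE batting (playerID TEXT, yearID INTEGER, H INTEGER, AB INTEGER)"
)
con.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [
        ("onehit01", 2001, 1, 1),       # one at-bat that happened to be a hit
        ("regular01", 2001, 180, 600),  # a full season of at-bats
        ("nobats01", 2001, 0, 0),       # no at-bats, so no batting average
    ],
)

query = """
SELECT playerID, yearID,
       CAST(SUM(H) AS FLOAT) / SUM(AB) AS batting_average
FROM batting
GROUP BY playerID, yearID
HAVING SUM(AB) >= 1
ORDER BY batting_average DESC, playerID
LIMIT 5
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

In this toy data the single at-bat player ranks first with a perfect average, ahead of the player with 600 at-bats, which is the distortion described above.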
121
+
122
+
```{python}
123
+
#| label: Q2
124
+
#| code-summary: 1 game table
125
+
126
+
query = """
127
+
SELECT playerID, yearID,
128
+
CAST(SUM(H) AS FLOAT) AS total_hits,
129
+
CAST(SUM(AB) AS FLOAT) AS total_at_bats,
130
+
(CAST(SUM(H) AS FLOAT) / CAST(SUM(AB) AS FLOAT))*100 AS batting_average_percentage
131
+
FROM batting
132
+
WHERE AB >= 1
133
+
GROUP BY playerID, yearID
134
+
ORDER BY batting_average_percentage DESC, playerID
##### C. Now calculate the batting average for players over their entire careers (all years combined). Only include players with at least 100 at bats, and print the top 5 results.
181
+
182
+
-Now we can observe players who not only performed well, but also had greater participation in their teams. The following table shows that players with more at-bats tend to have lower batting averages. In conclusion, it is normal to expect batting average to drop as players participate in more games.-
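A career-level figure needs one row per player, with the at-bat threshold applied to the career total rather than to any single row. The sketch below shows one way to express that with `GROUP BY playerID` and `HAVING SUM(AB) >= 100`. The rows and player IDs are hypothetical, and this is an illustration of the idea rather than the exact query used in this report.

```python
import sqlite3

# Hypothetical rows for illustration only; not taken from the real database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE batting (playerID TEXT, yearID INTEGER, H INTEGER, AB INTEGER)"
)
con.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [
        ("vet01", 2000, 50, 150),
        ("vet01", 2001, 40, 150),    # career: 90 hits in 300 at-bats
        ("short01", 2000, 30, 60),   # only 60 career at-bats, so excluded
        ("split01", 2000, 20, 60),
        ("split01", 2001, 10, 60),   # no 100 at-bat season, but 120 in total
    ],
)

career_query = """
SELECT playerID,
       SUM(H) AS total_hits,
       SUM(AB) AS total_at_bats,
       CAST(SUM(H) AS FLOAT) / SUM(AB) AS career_batting_average
FROM batting
GROUP BY playerID
HAVING SUM(AB) >= 100
ORDER BY career_batting_average DESC, playerID
LIMIT 5
"""
career_rows = con.execute(career_query).fetchall()
for row in career_rows:
    print(row)
```

Using `HAVING` instead of `WHERE` matters here: `WHERE` filters individual rows before they are summed, so a player such as `split01` would be dropped even though the career total passes the threshold.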
183
+
184
+
185
+
```{python}
186
+
#| label: Q2-table
187
+
#| code-summary: 100 games table
188
+
189
+
query = """
190
+
SELECT playerID,
191
+
CAST(SUM(H) AS FLOAT) AS total_hits,
192
+
CAST(SUM(AB) AS FLOAT) AS total_at_bats,
193
+
(CAST(SUM(H) AS FLOAT) / CAST(SUM(AB) AS FLOAT))*100 AS batting_average_percentage
194
+
FROM batting
195
+
GROUP BY playerID
196
+
HAVING SUM(AB) >= 100
197
+
ORDER BY batting_average_percentage DESC, playerID
__Pick any two baseball teams and compare them using a metric of your choice (average salary, home runs, number of wins, etc). Write an SQL query to get the data you need, then make a graph using Plotly Express to visualize the comparison. What do you learn?__
216
+
217
+
_I compared the total salary and wins of the Yankees and the White Sox over the past 25 years. This allows us to see that even though the Yankees have a higher win count, the White Sox have shown great efficiency by posting a strong win record while spending almost 50% less than the Yankees._
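One caveat when combining wins and salaries in a single query: if the `salaries` table holds one row per player per season (an assumption about the schema), joining it directly to `teams` repeats each team-season once per salary row, so summing `W` counts the same wins several times. The sketch below uses hypothetical teams and figures to show the effect, and one possible way around it: aggregate salaries to one payroll row per team-season before joining.

```python
import sqlite3

# Hypothetical teams and figures for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE teams (teamID TEXT, yearID INTEGER, name TEXT, W INTEGER);
CREATE TABLE salaries (teamID TEXT, yearID INTEGER, playerID TEXT, salary REAL);
INSERT INTO teams VALUES ('AAA', 2000, 'Team A', 90), ('BBB', 2000, 'Team B', 80);
INSERT INTO salaries VALUES
    ('AAA', 2000, 'p1', 3000000), ('AAA', 2000, 'p2', 2000000),
    ('BBB', 2000, 'p3', 1000000), ('BBB', 2000, 'p4', 1000000),
    ('BBB', 2000, 'p5', 500000);
""")

# Direct join: each team row is repeated once per salary row, inflating wins.
naive_query = """
SELECT t.name, SUM(t.W) AS total_wins
FROM teams t
JOIN salaries s ON t.teamID = s.teamID AND t.yearID = s.yearID
GROUP BY t.name
ORDER BY t.name
"""
naive_rows = con.execute(naive_query).fetchall()

# Pre-aggregated join: one payroll row per team-season, so wins count once.
query = """
SELECT t.name,
       ROUND(SUM(p.payroll) / 1000000.0, 2) AS team_total_salary,
       SUM(t.W) AS total_wins
FROM teams t
JOIN (
    SELECT teamID, yearID, SUM(salary) AS payroll
    FROM salaries
    GROUP BY teamID, yearID
) p ON t.teamID = p.teamID AND t.yearID = p.yearID
GROUP BY t.name
ORDER BY t.name
"""
rows = con.execute(query).fetchall()

print("direct join:    ", naive_rows)
print("pre-aggregated: ", rows)
```

The salary totals are the same either way; only the win totals differ, which is why the distortion is easy to miss when the two teams have different roster sizes.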
218
+
219
+
```{python}
220
+
#| label: Q3
221
+
#| code-summary: Yankees vs Sox (25 years)
222
+
# Include and execute your code here
223
+
224
+
query = """
225
+
SELECT t.name,
226
+
ROUND(SUM(s.salary) / 1000000, 2) as team_total_salary,
227
+
ROUND(SUM(t.W), 2) as total_wins
228
+
FROM teams t
229
+
JOIN salaries s ON t.teamID = s.teamID AND t.yearID = s.yearID
230
+
WHERE t.teamID IN ('NYA','CHA')
231
+
AND t.name != 'New York Highlanders'
232
+
AND t.yearID BETWEEN 1992 AND 2016 -- Filter for the past 25 years
233
+
GROUP BY t.name
234
+
"""
235
+
236
+
# Execute the query and load results into a DataFrame