Welcome to the Formula 1 Data Analysis Project, where we delve into the fascinating world of Formula 1 racing! This project uses real-world F1 data to explore driver performances, country-based dominance, race victories, and more.
With statistical tests, compelling visualizations, and thoughtful insights, this repository aims to showcase the art and science of motorsport analysis.
- Overview
- Features
- Data Sources
- Methodology
- Key Insights
- Technologies Used
- How to Use
- Visualizations
- Future Work
- Contributing
- License
Formula 1 is the pinnacle of motorsport, where drivers, teams, and countries compete for supremacy. This project analyzes historical F1 data to answer questions like:
- Which countries dominate Formula 1, and why?
- How do driver performances vary across years and circuits?
- What makes drivers like Lewis Hamilton and Michael Schumacher stand out?
Through a combination of Python programming, statistical tests, and visualization techniques, we uncover the hidden stories behind the data.
- One-way ANOVA to test country-level performance differences.
- Insights into significant F-statistics and p-values.
- Bar plots of total average points by country.
- Race victories across Grand Prix events.
- Wins-per-year trends for top drivers.
- Highlight drivers with the most pole positions in Formula 1 history.
- Compare race wins and championships across different eras and regions.
- Examine drivers' favorite circuits and why certain tracks favor specific drivers.
The project utilizes publicly available F1 datasets, including information on:
- Drivers: Biographical details and career statistics.
- Races: Locations, years, and results.
- Results: Individual driver performances across seasons.
Datasets are preprocessed using pandas to join and clean the data for analysis.
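As a minimal sketch of that join-and-clean step, the snippet below merges three toy tables into one analysis-ready table. The column names (`driverId`, `raceId`, `points`, `nationality`, and so on) and the helper `build_analysis_table` are assumptions for illustration, not the project's actual schema, and the rows are synthetic placeholders rather than real F1 results.

```python
import pandas as pd

# Synthetic stand-ins for the drivers, races, and results tables.
# Column names are assumed for illustration only.
drivers = pd.DataFrame({
    "driverId": [1, 2, 3],
    "surname": ["Driver A", "Driver B", "Driver C"],
    "nationality": ["British", "German", None],
})
races = pd.DataFrame({
    "raceId": [10, 11],
    "year": [2020, 2021],
    "name": ["Grand Prix X", "Grand Prix Y"],
})
results = pd.DataFrame({
    "raceId": [10, 10, 11, 11, 11],
    "driverId": [1, 2, 1, 2, 3],
    "points": [25.0, 18.0, 18.0, 25.0, None],
})


def build_analysis_table(results, races, drivers):
    """Join results with race and driver details, then fill missing values."""
    merged = (
        results
        # validate= raises if a key is unexpectedly duplicated on the right side
        .merge(races, on="raceId", how="left", validate="many_to_one")
        .merge(drivers, on="driverId", how="left", validate="many_to_one")
    )
    # One possible cleaning policy (an assumption): missing points count as 0,
    # and a missing nationality is labelled "Unknown".
    merged["points"] = merged["points"].fillna(0.0)
    merged["nationality"] = merged["nationality"].fillna("Unknown")
    return merged


table = build_analysis_table(results, races, drivers)
print(table)
```

Left joins keep every result row even when a race or driver record is missing, so gaps show up as missing values to be handled explicitly instead of silently dropping rows.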
- Merge datasets to create a unified analysis-ready table.
- Handle missing data and fix inconsistencies.
- Perform ANOVA to evaluate differences between countries' performance.
- Create interactive plots (using Plotly) and word clouds to present insights.
- Analyze performance trends of top drivers across their careers.
- Study Grand Prix-specific wins to understand circuit dominance.
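The ANOVA step above can be sketched with `scipy.stats.f_oneway`, which takes one array of observations per group. The table layout, the column names, and the helper `anova_by_group` are assumptions for illustration, and the numbers are synthetic, not actual F1 points.

```python
import pandas as pd
from scipy import stats

# Synthetic points per result, grouped by driver nationality (illustrative only).
table = pd.DataFrame({
    "nationality": ["British"] * 4 + ["German"] * 4 + ["Italian"] * 4,
    "points": [25, 18, 15, 25, 18, 12, 10, 15, 2, 0, 4, 1],
})


def anova_by_group(df, group_col, value_col):
    """Run a one-way ANOVA of value_col across the levels of group_col."""
    # Build one sample per group; skip groups with fewer than two observations.
    samples = [
        group[value_col].to_numpy()
        for _, group in df.groupby(group_col)
        if len(group) >= 2
    ]
    return stats.f_oneway(*samples)


f_stat, p_value = anova_by_group(table, "nationality", "points")
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A large F-statistic with a small p-value suggests that at least one group mean differs from the others. It does not say which groups differ, and the test assumes roughly normal, similarly spread data within each group, so the result is best read as a starting point.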
- Countries like the UK, Germany, and Italy dominate the leaderboard.
- Infrastructure and resources likely play a role in these differences, although this analysis does not measure them directly.
- Word cloud analysis reveals which drivers dominate in securing pole positions.
- Certain circuits see repeated success for specific drivers, indicating track familiarity or team strengths.
- Visualizing win trends shows the era of dominance for drivers like Hamilton and Vettel.
- Generational legends like Schumacher, Fangio, and Hamilton emerge as outliers in championship counts.
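To illustrate how win trends like these might be tabulated before plotting, here is a minimal pandas sketch. The columns `year`, `surname`, and `positionOrder` and the helper `wins_per_year` are assumptions for illustration, and the rows are synthetic.

```python
import pandas as pd

# Synthetic race results: one row per driver per race (illustrative only).
table = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "surname": ["Driver A", "Driver B", "Driver A",
                "Driver A", "Driver B", "Driver B"],
    "positionOrder": [1, 1, 2, 1, 1, 1],
})


def wins_per_year(df, top_n=2):
    """Count race wins per year for the top_n drivers by total wins."""
    wins = df[df["positionOrder"] == 1]
    top_drivers = wins["surname"].value_counts().head(top_n).index
    return (
        wins[wins["surname"].isin(top_drivers)]
        .groupby(["year", "surname"])
        .size()
        .unstack(fill_value=0)  # years as rows, drivers as columns
    )


trend = wins_per_year(table)
print(trend)
```

The resulting table has one row per year and one column per driver, which is a convenient shape for a line chart in matplotlib or Plotly.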
- Programming Languages: Python
- Libraries:
  - Data Analysis: pandas, numpy
  - Visualization: matplotlib, plotly, seaborn
  - Word Cloud: WordCloud
  - Statistical Testing: scipy
- Clone the repository:
git clone https://github.com/yourusername/f1-data-analysis.git
cd f1-data-analysis
- Install dependencies:
pip install -r requirements.txt
- Run the analysis:
Open and execute the Jupyter Notebook or Python scripts for specific analyses.
- View Visualizations:
Interactive plots and charts will be displayed in your browser.
📈 Visualizations
- Circuits all around the world
- Average position by constructor reference
🔮 Future Work
- Incorporate Machine Learning: Predict race outcomes based on historical data.
- Infrastructure Analysis: Study the correlation between country-level motorsport investments and driver success.
- Team Analysis: Investigate team-level dominance and trends over time.
- Real-Time Data: Integrate live F1 race data for dynamic updates.
🤝 Contributing
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a clear description.
📜 License
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to reach out for questions or collaboration ideas. Happy racing! 🏎️