Project Purpose
I created this project as a fun way to build my skills in Power BI and to practice web scraping.
Project Idea
This project blends two of my favorite things: soccer and data. I want to report the best players at each position using data from Major League Soccer. The positions are split into forwards, midfielders, defenders, and goalkeepers, and each position has its own metric for identifying the best player. For example, forwards are ranked by their average (mean) goals scored per season, while midfielders are ranked by their average assists per season. Users should be able to navigate between positions and select specific players to see additional statistics.
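As an illustration of the per-position ranking described above, here is a minimal pandas sketch. The sample rows, the column names (`player`, `position`, `season`, `goals`, `assists`), the position codes, and the helper `rank_by_mean` are all assumptions made for the example, not the schema of the actual dataset.

```python
import pandas as pd

# Hypothetical sample data; column names and position codes are assumptions.
players = pd.DataFrame({
    "player":   ["A", "A", "B", "B", "C", "C"],
    "position": ["FW", "FW", "FW", "FW", "MF", "MF"],
    "season":   [2019, 2020, 2019, 2020, 2019, 2020],
    "goals":    [10, 14, 8, 9, 3, 2],
    "assists":  [2, 4, 1, 3, 9, 11],
})

def rank_by_mean(df: pd.DataFrame, position: str, stat: str) -> pd.Series:
    """Rank players at one position by their mean of `stat` per season."""
    subset = df[df["position"] == position]
    # One row per player per season is assumed, so the mean is a per-season average.
    return subset.groupby("player")[stat].mean().sort_values(ascending=False)

forward_ranking = rank_by_mean(players, "FW", "goals")
midfielder_ranking = rank_by_mean(players, "MF", "assists")
```

Passing the metric in as a parameter keeps the same function usable for every position, so only the position code and the statistic change between rankings.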
Python (EDA and Cleaning)
Power BI (Dashboarding)
EDA consisted of understanding the meanings behind column headers, assessing null entries, recognizing keys, and noting important columns.
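Two of these checks, counting null entries and testing whether a set of columns works as a key, could look roughly like the sketch below. The sample data and the column names are hypothetical, and the choice of `player` plus `season` as a candidate key is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample with some missing values; column names are assumptions.
players = pd.DataFrame({
    "player": ["A", "A", "B", "C"],
    "season": [2019, 2020, 2020, 2020],
    "goals":  [10, None, 8, 3],
    "club":   ["X", "X", None, "Y"],
})

# Number of null entries in each column.
null_counts = players.isna().sum()

# A candidate key is usable only if no combination of its values repeats.
candidate_key = ["player", "season"]
is_unique_key = not players.duplicated(subset=candidate_key).any()
```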
I merged tables where needed. For example, I joined the players table to the keepers table to find out how many goals a defender's goalkeeper conceded, which I treat as the number of goals the defender conceded.
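A merge of this kind might be sketched as follows. This is not the project's actual code: the join keys (`club`, `season`), the `goals_against` column, and the step that sums all keepers per club and season are assumptions made so the example is self-contained.

```python
import pandas as pd

# Hypothetical tables; column names and join keys are assumptions.
players = pd.DataFrame({
    "player":   ["D1", "D2", "F1"],
    "position": ["DF", "DF", "FW"],
    "club":     ["X", "Y", "X"],
    "season":   [2020, 2020, 2020],
})
keepers = pd.DataFrame({
    "keeper":        ["K1", "K2", "K3"],
    "club":          ["X", "X", "Y"],
    "season":        [2020, 2020, 2020],
    "goals_against": [20, 5, 40],
})

# A club can field several keepers in a season, so total their goals conceded first.
club_conceded = keepers.groupby(["club", "season"], as_index=False)["goals_against"].sum()

# Attach the club-level total to each defender; validate guards against duplicate keys.
defenders = players[players["position"] == "DF"]
defenders_conceded = defenders.merge(
    club_conceded, on=["club", "season"], how="left", validate="many_to_one"
)
```

Aggregating the keepers table before the join avoids duplicating defender rows when a club used more than one goalkeeper in a season.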
The data used for the current dashboard was previously scraped by the Kaggle user whose profile is connected to the Kaggle link. I would like to scrape this information myself or access it through an API. This would let me keep the data up to date and be more confident that it is correct.
I would like to show a photo of the selected player, but I do not want to overcrowd the dashboard. Getting the player pictures should be a simple web scrape of the MLS site.
Currently, the dashboard is only a summary of MLS data. I would like to develop further insights about this data such as:
Which clubs have the best history at each position?
Do clubs that score few goals tend to concede fewer goals?
Do clubs that score many goals tend to concede more goals?
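One possible starting point for the last two questions is the correlation between goals scored and goals conceded at club level. The sketch below uses made-up numbers and hypothetical column names (`goals_for`, `goals_against`); it only shows the mechanics and says nothing about what the real MLS data would show.

```python
import pandas as pd

# Made-up club totals for illustration only; column names are assumptions.
clubs = pd.DataFrame({
    "club":          ["W", "X", "Y", "Z"],
    "goals_for":     [60, 50, 40, 30],
    "goals_against": [30, 40, 45, 60],
})

# Pearson correlation: values near +1 mean high-scoring clubs also concede more,
# values near -1 mean high-scoring clubs concede less.
r = clubs["goals_for"].corr(clubs["goals_against"])
```

A single coefficient answers both questions at once, since they describe the two ends of the same relationship.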