Project Purpose
I completed this project for my Data Analytics certification at Nashville Software School. My overall goal was to showcase the skills I acquired during the course. The project took about two weeks in total, including brainstorming the idea, collecting the data, cleaning and exploring it (EDA), and building the dashboard.
Project Idea
As someone who plans to have kids, I am interested in the effects of childhood internet use. This analysis searches for correlations between internet use and childhood success. The data questions involve comparing internet usage to test scores, fitness, and mental health across the United States. The data has been gathered from the Census Bureau, Nation’s Report Card, and Centers for Disease Control and Prevention. APIs, CSVs, and Migration Toolkits were used to collect the data.
I am a young adult planning to have a family, and I would like to know what will help my children succeed. Since the internet became a dominant part of daily life, there has been much debate about the advantages and disadvantages of giving children internet access. I would like to see if there is any clear correlation between internet use and childhood success.
As the number of children (ages 3 – 17) with internet access has increased, have test scores for 4th, 8th, or 12th grade increased?
(Scale: Country, United States of America)
Is there a strong correlation between internet users and childhood obesity?
(Scale: Country, United States of America)
Does the number of internet users correlate with children’s mental health, e.g., feelings of anxiety/depression or general bullying?
(Scale: Country, United States of America)
For each MVP question, is there a significant difference in the correlation coefficient at a global versus country scale?
Is there any connection between young adult (20-29) internet use and employment or degree attainment rates?
Current data sources only support the MVP and some stretch questions. Additional sources will be needed to answer the rest of the stretch questions.
United States Census Bureau
Internet access by year, state, and age group
(1997-2021)
NAEP Data Service API
Composite scores by year, state, grade, subject
(1990-2021)
Centers for Disease Control and Prevention
Childhood weight and mental health issues by country
(1975 - 2019)
The World Bank
Global internet usage by Country
(1960 - 2021)
Python
SQL
Excel
ESF Migration Toolkit
Tableau
Finding the data was by far the most challenging part of this project. Once I had my data sources, the real fun began: building programs to collect the data from each source. For example, I wrote a for loop that pulled test score information for each year, each state (in the USA), each grade, and each subject. I also discovered the ESF Database Migration Toolkit, which allowed me to load Microsoft Access files into PostgreSQL, where I could then write SQL queries in pgAdmin to pull exactly the information I needed.
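The collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name collect_scores, the fetch callable, and the convention that None means "no data" are all assumptions, and fake_fetch is a stand-in for a real API request so the example runs without network access.

```python
from itertools import product


def collect_scores(fetch, years, states, grades, subjects):
    """Call `fetch` once per (year, state, grade, subject) and gather the rows."""
    rows = []
    for year, state, grade, subject in product(years, states, grades, subjects):
        result = fetch(year=year, state=state, grade=grade, subject=subject)
        if result is None:
            # Assumed convention: None means no data for this combination
            continue
        rows.append({
            "year": year,
            "state": state,
            "grade": grade,
            "subject": subject,
            "score": result,
        })
    return rows


def fake_fetch(year, state, grade, subject):
    """Hypothetical stand-in for a real API request (values are made up)."""
    # Pretend grade 12 has no state-level data, to exercise the skip branch
    return None if grade == 12 else 200.0 + grade


demo_rows = collect_scores(
    fake_fetch,
    years=[2019, 2021],
    states=["TN", "KY"],
    grades=[4, 8, 12],
    subjects=["reading", "mathematics"],
)
```

Passing the request function in as an argument keeps the looping logic separate from the details of any particular API, so the same loop can be tested with a fake and then pointed at a real source.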
Data cleaning and EDA included converting state abbreviations to their full names, converting multiple-year columns into a single column, calculating percentages from counts of individuals, joining data frames, calculating correlation coefficients, sorting survey questions, replacing categorical numbers with their respective text, and making sure I had sufficient yearly records to accurately assess internet access trends.
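Several of the cleaning steps listed above can be illustrated with a short pandas sketch. Everything here is an assumption made for illustration: the column names, the two-state mapping, and the numbers are toy values, not the project's data, and pandas itself is assumed because the write-up mentions data frames.

```python
import pandas as pd

# Toy wide table (made-up values): one column per year, counts of children with access
access_wide = pd.DataFrame({
    "state": ["TN", "KY"],
    "children_total": [1000, 800],
    "2019": [800, 600],
    "2021": [900, 720],
})

# Partial mapping for illustration; a real one would cover every state
state_names = {"TN": "Tennessee", "KY": "Kentucky"}

# Convert the multiple year columns into a single "year" column
access = access_wide.melt(
    id_vars=["state", "children_total"],
    var_name="year",
    value_name="children_with_access",
)
access["year"] = access["year"].astype(int)

# Replace state abbreviations with full names
access["state"] = access["state"].map(state_names)

# Calculate a percentage from counts of individuals
access["pct_access"] = 100 * access["children_with_access"] / access["children_total"]

# Toy score table (made-up values) to join against
scores = pd.DataFrame({
    "state": ["Tennessee", "Tennessee", "Kentucky", "Kentucky"],
    "year": [2019, 2021, 2019, 2021],
    "score": [219.0, 221.0, 215.0, 220.0],
})

# Join the two data frames on state and year
merged = access.merge(scores, on=["state", "year"], how="inner")

# Pearson correlation coefficient (the default method for Series.corr)
r = merged["pct_access"].corr(merged["score"])
```

With only four toy rows the coefficient means nothing statistically; the point is the shape of the pipeline: reshape, rename, derive a percentage, join, then correlate.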
4th and 8th graders' composite scores for reading and mathematics have medium to strong positive correlations with home internet access.
12th graders' composite scores for reading and mathematics have medium negative correlations with internet access.
Within the USA, weight problems among 4th, 8th, and 12th graders have very strong positive correlations with internet access.
On a global scale, weight problems across all grades have medium positive correlations with internet access.
Child suicide rates have been fairly consistent since 1991, with only a 2% increase by 2019. Because of this, there is only a weak positive correlation between child suicide and internet access.
Physical fighting among children has a strong negative correlation with internet access.
The percentage of children feeling sad or hopeless has a medium negative correlation with internet access.
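The findings above describe each coefficient with a direction and a strength label. The write-up does not state the cutoffs behind those labels, so the sketch below uses assumed thresholds (0.3, 0.6, 0.8) purely to show how such a labeling could be made consistent; the function name describe_correlation is also hypothetical.

```python
def describe_correlation(r):
    """Return a label such as 'positive strong' for a correlation coefficient.

    The cutoffs are assumptions for illustration, not the project's own.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"

    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size < 0.6:
        strength = "medium"
    elif size < 0.8:
        strength = "strong"
    else:
        strength = "very strong"

    direction = "positive" if r > 0 else "negative"
    return f"{direction} {strength}"


label_high = describe_correlation(0.85)
label_mid_negative = describe_correlation(-0.45)
```

Fixing the thresholds in one place keeps the labels comparable across the test score, weight, and mental health questions.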
Since 4th and 8th graders' scores correlate positively with internet access while 12th graders' scores correlate negatively, the next step should be to look into screen time and what each grade is primarily doing while on the internet.
The current study only looks at obesity and overweight as defined by an individual's BMI. However, BMI does not account for body composition, such as muscle mass or body fat percentage. Because of this, the next factors to assess for childhood fitness and internet access should be the amount of physical activity children participate in and the food groups they consume.
Currently, the CDC only has data on children feeling sad or hopeless starting in 2003. I believe having data on this measure prior to anyone having access to the internet will improve the accuracy of the correlation coefficient. Additionally, this project looks into physical bullying, but online bullying should also be assessed.