Does Your Background Play A Role In Your Education?

Hi, my name is Maninder Singh, and this is my Data Science Project.

This website not only presents my full analysis of NYC Open Data, but also demonstrates the use of Python libraries such as pandas, seaborn, and matplotlib.

Objective:

I wanted to research the demographics of NYC schools and find out whether there is a correlation between race and poverty levels in schools. Specifically, I wanted to know whether schools with a high number of POC students tend to have higher poverty levels, and vice versa. I also wanted to see whether there are any trends between graduation rates and racial background. Essentially, I want to see if there are patterns between one's background and one's school.

This idea is important to me personally because, being of Asian descent and from a low-income background, I went to schools that didn't offer enough programs or materials for my needs. I wanted to tackle this issue and see if there is more to it than meets the eye.

Beyond my personal interest, it's important to look into this because there are definitely faults in the American school system. A number of students have negative things to say about their education and how it affected them later in life, and this research is one step through the door toward fixing the school system. There can't be any solutions if we don't even know what the problem is. It could be a funding problem, a racial problem, and so on. The point of this research is to provide data that either shows abnormal patterns across schools or, in the best-case scenario, none at all (for example, schools of every background and economic status having students graduate at equal rates).

Hypothesis:

I predict that there will be significant trends between poverty and racial background, and between graduation rates and racial background. The American school system has been flawed for quite some time and, speaking from firsthand experience, a lot of opportunities are not available to minorities.

Security and Privacy Considerations:

I will not be working with PII [Personally Identifiable Information]. Every data source is from NYC Open Data.

Key Terms to know

Correlation

Images are from Towards Data Science and geeksforgeeks.

A key term for anyone reading this research is correlation. Correlation is a connection between two things, NOT causation. Just because we see a pattern does not necessarily mean those two things depend on one another. In simple terms, correlation is a relationship: Action A relates to Action B, but one event doesn't necessarily cause the other. In this research, the observed trends are only correlations, since there is no definite proof that one event triggers another.
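To make the idea concrete, here is a minimal sketch of how a correlation coefficient can be computed with scipy.stats. The numbers are made-up toy values (an assumption for illustration only, not taken from the NYC data sets), and the variable names are hypothetical.

```python
from scipy import stats

# Hypothetical toy values, NOT taken from the NYC data sets.
# Each position represents one imaginary school.
pct_group = [10.0, 20.0, 30.0, 40.0, 50.0]    # % of students in some group
pct_poverty = [90.0, 80.0, 65.0, 50.0, 45.0]  # % poverty at the same school

# pearsonr returns the correlation coefficient r and a p-value.
# r is always between -1 and 1; its sign gives the direction of the trend.
r, p_value = stats.pearsonr(pct_group, pct_poverty)

print(f"correlation coefficient: {r:.2f}")
```

A value of r close to -1 or 1 means the points lie close to a straight line, while a value close to 0 means little linear relationship. In neither case does r say anything about which variable, if any, causes the other.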

Another major term is linear regression. In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. In basic terms, it's a line used to show a trend, as you will see in the "Data Analysis" section.
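As a small illustration of what fitting such a line means, the sketch below uses scipy.stats.linregress on toy values chosen to lie exactly on the line y = 2x + 1. The data is hypothetical and only meant to show what the fitted slope and intercept represent.

```python
from scipy import stats

# Hypothetical toy data lying exactly on the line y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

# linregress fits a least-squares line y = slope * x + intercept.
result = stats.linregress(x, y)
slope, intercept = result.slope, result.intercept

# rvalue is the correlation coefficient of the same data.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={result.rvalue:.2f}")
```

With real school data the points will not sit exactly on a line, so the fitted line only summarizes the overall direction of the trend.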

Data Analysis + Solution

Data collected using pandas, datetime, seaborn, and scipy.stats

Correlation Coefficient for White students vs Poverty % of School: -0.77

Correlation Coefficient for Black students vs Poverty % of School: 0.39

Correlation Coefficient for Asian students vs Poverty % of School: -0.26

Correlation Coefficient for Hispanic students vs Poverty % of School: 0.49

Data Analysis:

Before going into the data analysis, note that all the code used to produce the data above can be found in the "Resources + Code" section. The majority of the data was produced by using pandas to filter each dataframe down to the needed columns, and using seaborn to plot them.

The first four lm plots show the distribution of all schools for four different ethnicities, each plotted against the school's poverty percentage. As you can see, the best-case scenario [no trend being seen] did not appear. There is a clear correlation between ethnicity and poverty percentage. There is already plenty of evidence of the wealth gap between racial minorities and white individuals; the purpose of this project was to show the disparity within schools, and as you can see, the more students of color a school has, the higher its poverty rate tends to be. [Though the correlation coefficient is low for Black students, the regression line still shows a linear trend.] Starting with White students, there is a very clear trend: the more white students a school has, the more likely it is to have lower levels of poverty. The correlation coefficient calculated using scipy.stats supports this with a striking -0.77. For Black students, though the correlation coefficient is small (0.39), it is still positive, with a regression line showing a trend opposite to that of white students. A major issue arises with Asian students, which is discussed below. For Hispanic students, a higher correlation coefficient (0.49) and, sadly, a clearer pattern show another positive trend between the number of Hispanic students in a school and its poverty level.

One interesting detail that wasn't part of my hypothesis is that the graph of Asian students vs. % poverty shows almost no correlation. Though the linear regression line shows a negative correlation (-0.26), the points themselves look fairly random. This led me to look further into the data I collected, which pointed me to a significant issue that shouldn't be ignored: "Asian" is an overly broad category. Almost all minorities from different parts of Asia are simply classified as "Asian". This is a huge issue because it means individuals of Chinese, Indian, Korean, Japanese, Vietnamese, Indonesian, Malaysian, etc. descent are all placed in one category. This can be very misleading because there can be significant outliers among Asians, with one group having a higher poverty % than others or vice versa, which presents a huge problem for the Asian community.

The second half of the data shows a time series of graduation rates from 2005 to 2014 for different racial groups. Thankfully, there seems to be a general increase in graduation rates in NYC for all students, regardless of background. However, there still seems to be a huge disparity between groups. As for Asian students, I noted above that the data tends to be flawed because it lumps the entire Asian community together, so we cannot really explain the trend. For Black and Hispanic students, though the graph shows graduation rates increasing over the years, the average graduation rate still sits below 70%, compared with white students, who average 75-80%. Whether the cause is funding, after-school programs, or teaching strategies, there is a clear racial disparity in NYC schools.
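The filtering step described above might look roughly like the sketch below. The small dataframe is a hypothetical stand-in for the demographic snapshot, and the column names and values are assumptions made for illustration; the real data set's columns may be named differently.

```python
import pandas as pd

# Hypothetical stand-in for the demographic snapshot.
# Column names and values are assumptions, not the real data.
schools = pd.DataFrame({
    "School Name": ["A", "B", "C", "D"],
    "Total Enrollment": [400, 520, 610, 350],
    "% White": [5.0, 20.0, 40.0, 60.0],
    "% Hispanic": [60.0, 45.0, 30.0, 10.0],
    "% Poverty": [90.0, 75.0, 50.0, 30.0],
})

# Keep only the columns needed for the analysis and drop incomplete rows.
needed = ["% White", "% Hispanic", "% Poverty"]
subset = schools[needed].dropna()

# Pearson correlation of each group's share against the poverty percentage.
correlations = {
    col: subset[col].corr(subset["% Poverty"])
    for col in needed
    if col != "% Poverty"
}
print(correlations)
```

A filtered dataframe like `subset` could then be passed to seaborn, for example `sns.lmplot(data=subset, x="% White", y="% Poverty")`, to draw the scatter points together with a regression line.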

Solutions I advise:

From the data collected, there is clear evidence of a trend. As stated before, it is not necessarily cause and effect, but there is a relationship between racial background and education. My first solution for the NYC Department of Education is to break the term "Asian" up into multiple subcategories, because classifying everyone of Asian descent together can be misleading and can result in numerous errors and setbacks for students in need. My second solution is to look into the root of the problem. Since there is a clear issue at hand, the best approach is to tackle what's causing the divide between minority/lower-income students and white/higher-income students. Do schools with more white students have better programs, or are schools with a higher number of minority students not getting enough funding?

Resources + Code

Resources:

https://data.cityofnewyork.us/Education/2018-2019-School-Demographic-Snapshot/45j8-f6um/data

https://data.cityofnewyork.us/Education/2018-DOE-High-School-Directory/vw9i-7mzq

This HTML page was built from a template found on CodePen.

Code:

For this assignment, I used pandas to read in and filter the data sets, and the datetime library to create the time series for the graduation-rate graph.

More specifically, using what I learned in data science, I filtered the data sets from NYC Open Data and collected the columns needed for this project: those linked to ethnic background and poverty % of schools, as well as graduation rates. For the years 2005-2014, I used a datetime conversion to change all the year strings into years, then used a groupby function, aggregated by mean, to build a time series of graduation rates for each ethnicity in NYC. The collected data is analyzed in the "Data Analysis + Solution" section of this website.
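The time-series step can be sketched roughly as follows. This is not the exact project code: the dataframe, column names, group labels, and values are all hypothetical, and `pd.to_datetime` is used here as one possible way to turn year strings into years.

```python
import pandas as pd

# Hypothetical graduation records; names and values are assumptions.
grad = pd.DataFrame({
    "Cohort Year": ["2005", "2005", "2006", "2006", "2005", "2006"],
    "Ethnicity": ["Group A", "Group A", "Group A", "Group A",
                  "Group B", "Group B"],
    "Graduation Rate": [60.0, 70.0, 66.0, 74.0, 50.0, 55.0],
})

# Parse the year strings as dates, then keep just the year number.
grad["Cohort Year"] = pd.to_datetime(grad["Cohort Year"], format="%Y").dt.year

# Average graduation rate per ethnicity and year, reshaped so that
# each ethnicity becomes one column (one line on a time-series plot).
series = (
    grad.groupby(["Ethnicity", "Cohort Year"])["Graduation Rate"]
    .mean()
    .unstack("Ethnicity")
)
print(series)
```

The resulting table has one row per year and one column per group, which is a convenient shape for plotting one line per group.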