Team: 1
School: Cleveland High
Area of Science: Environmental Sciences
Interim:
Problem Definition
Ground-level ozone forms when nitrogen oxides (NOx) and volatile organic compounds (VOCs), which are often emitted by cars, power plants, and other human activities, react in the presence of sunlight and heat. According to the CDC, the people most affected by high ground-level ozone include people with asthma, bronchitis, or emphysema, older adults, children, and people who exercise outdoors. Health effects can include coughing, lung and throat irritation, wheezing, and trouble breathing, especially during outdoor physical activity. This is why it is important to predict when ozone levels will be highest: high-risk people can then plan to stay indoors, and everyone can limit activities that emit ozone-forming pollutants.
Ground-level ozone is not the only pollutant that harms people, so it is also useful to predict when other pollutants will reach their highest levels. By using machine learning to predict when ozone levels will be highest, we can give people the warning they need to avoid adding to the problem or being harmed by it.
Problem Solution
One way to reduce the negative impact of air pollution, specifically ground-level ozone, is to predict concentration levels accurately in advance. The prediction would draw on a variety of factors, such as previous ozone levels, meteorological data, and concentrations of other air pollutants. We plan to achieve this by gathering accurate data on many factors for one location over several years and using machine learning to find statistically significant relationships among the variables, implementing multiple linear regression with the Python library scikit-learn (sklearn).
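To make the planned approach concrete, here is a minimal sketch of multiple linear regression with sklearn. It is not the team's code: the data are synthetic, and the predictor names (temperature, nox, prev_ozone) and the coefficients used to generate the data are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real measurements). The predictors and the
# coefficients used to generate "ozone" are invented for illustration.
rng = np.random.default_rng(0)
n_samples = 500
temperature = rng.uniform(5, 45, n_samples)    # hypothetical temperature
nox = rng.uniform(10, 200, n_samples)          # hypothetical NOx level
prev_ozone = rng.uniform(10, 120, n_samples)   # hypothetical earlier ozone
noise = rng.normal(0, 5, n_samples)
ozone = 0.8 * temperature + 0.1 * nox + 0.5 * prev_ozone + noise

# One row per observation, one column per predictor.
X = np.column_stack([temperature, nox, prev_ozone])
y = ozone

# Hold out 20% of the rows to check the fit on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# score() returns the coefficient of determination (R^2) on the held-out rows.
r2 = model.score(X_test, y_test)
print("coefficients:", model.coef_)
print("held-out R^2:", round(r2, 3))
```

With real hourly data, a chronological split (train on earlier dates, test on later ones) would probably be a fairer check than the random split used in this sketch, since the goal is to predict future ozone levels.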
Progress to Date
First, we downloaded a dataset of air pollution in India from 2015-2020 and uploaded it to Repl, the site we use to work on code together. The dataset included daily measurements of air pollutants such as particulate matter, NOx, carbon monoxide, ozone, benzene, xylene, and more for over 20 cities in India. For each city, we made histograms of the daily concentrations of each pollutant, with a marker showing the concentration deemed safe in the US for comparison. After making these plots, we examined the cities with the most days above the safe levels. We chose Delhi because it had the most data together with the highest pollutant concentrations.
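The comparison of cities described above can be sketched with pandas as a count of days above a threshold. This is an illustration under assumptions, not the team's script: the column names ("City", "Date", "O3"), the tiny table of values, and the threshold SAFE_LEVEL are all made up, and the real safe level for each pollutant would need to be substituted.

```python
import pandas as pd

# Tiny made-up table standing in for the daily dataset. The column names
# and every value here are assumptions for illustration.
daily = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai", "Mumbai"],
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 2),
    "O3": [150.0, 90.0, 180.0, 60.0, 130.0, None],
})

# Placeholder threshold, NOT an official limit; replace with the real one.
SAFE_LEVEL = 100.0

# A comparison against NaN is False, so days with missing data are not
# counted as exceedances.
exceeds = daily["O3"] > SAFE_LEVEL

# Number of days above the threshold per city, largest first.
days_above = exceeds.groupby(daily["City"]).sum().sort_values(ascending=False)

# Number of days that actually have a measurement, per city.
days_with_data = daily.groupby("City")["O3"].count()

print(days_above)
print(days_with_data)
```

Looking at both counts together helps separate a city that is heavily polluted from one that simply has more days of data.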
Next, we compiled CSV files of hourly air pollution and meteorological data for Delhi, India. First, we downloaded a dataset from Kaggle, an online resource for datasets, which included hourly concentrations of 14 different pollutants from 2015-2020. We then searched for a dataset with hourly weather conditions for the same dates and location. Although no free dataset was readily available, we were able to use an application programming interface (API) to compile 5 years of weather data for Delhi by entering inputs such as start and end time and location. The Visual Crossing Weather API by Visual Crossing Corporation provided 500 rows of data at a time, including factors such as hourly temperature, humidity, wind speed and direction, visibility, and cloud cover. By repeating this process, we built the hourly weather CSV file. A short script checked the dates for consistency, and any errors were fixed.
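A date consistency check of the kind mentioned above might look like the following sketch. The team's actual script is not shown in this report, so the function name find_timestamp_problems and the sample timestamps are hypothetical. The idea is to compare the timestamps in the compiled file against the complete hourly range that should be there.

```python
import pandas as pd

def find_timestamp_problems(timestamps, start, end):
    """Return (missing, duplicated) hourly timestamps between start and end."""
    observed = pd.DatetimeIndex(pd.to_datetime(timestamps))
    # Every hour that should appear in a complete file.
    expected = pd.date_range(start=start, end=end, freq=pd.Timedelta(hours=1))
    # Hours that should be present but are not.
    missing = expected.difference(observed)
    # Hours that appear more than once (each reported a single time).
    duplicated = observed[observed.duplicated()].unique()
    return missing, duplicated

# Made-up example: 02:00 is absent and 01:00 appears twice.
sample = [
    "2020-01-01 00:00",
    "2020-01-01 01:00",
    "2020-01-01 01:00",
    "2020-01-01 03:00",
]
missing, duplicated = find_timestamp_problems(
    sample, "2020-01-01 00:00", "2020-01-01 03:00"
)
print("missing:", list(missing))
print("duplicated:", list(duplicated))
```

Because the weather file was assembled from many 500-row downloads, gaps and overlaps at the boundaries between downloads are the kinds of errors a check like this would be expected to catch.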
To reduce the number of NaN (null) values, we replaced each isolated missing value with the average of the values immediately before and after it. Runs of consecutive NaN values were left alone to avoid introducing inaccurate results. Next, we created scatterplots of ozone against the other pollutants and against weather conditions. So far, the strongest correlations with ground-level ozone have been found for sulfur dioxide, temperature, NOx, and PM10. The next step of the project is to use the sklearn Python library to determine the strongest relationships among multiple meteorological factors, pollutant levels, and ozone, and then to use machine learning to predict ground-level ozone in advance.
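The gap-filling rule described above can be sketched in pandas as follows. This is one possible implementation, not necessarily the team's: the function name fill_isolated_gaps and the sample values are hypothetical, and the sketch assumes the rows are in time order and evenly spaced.

```python
import numpy as np
import pandas as pd

def fill_isolated_gaps(series):
    """Replace a NaN with the mean of its two neighbours, but only when
    both neighbours are present. Longer runs of NaN are left untouched."""
    prev_val = series.shift(1)    # value one row earlier
    next_val = series.shift(-1)   # value one row later
    # A gap is "isolated" when the value is missing but both neighbours exist.
    isolated = series.isna() & prev_val.notna() & next_val.notna()
    filled = series.copy()
    filled[isolated] = (prev_val[isolated] + next_val[isolated]) / 2
    return filled

# Made-up values: one isolated gap, one run of two, and a gap at the end.
ozone = pd.Series([10.0, np.nan, 20.0, np.nan, np.nan, 30.0, np.nan])
filled = fill_isolated_gaps(ozone)
print(filled.tolist())
```

In this example only the second value is filled (with 15.0, the average of 10.0 and 20.0). The run of two missing values and the missing value at the end of the series stay as NaN, because each lacks a valid neighbour on at least one side.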
Expected Results
We expect to find a correlation between meteorological conditions and the amount of pollutants present. Variables such as sunlight and increased humidity may help prompt the chemical reactions that produce the pollutants we are examining. Because weather can be forecast, finding these correlations between weather patterns and pollutant levels should allow us to use machine learning to predict when the levels of certain pollutants will be especially high, considering both meteorological factors and other pollutant levels.
Bibliography
Air Quality - Ozone and Your Health. (2019, September 04). Retrieved December 11, 2020, from https://www.cdc.gov/air/ozone.html
How Weather Affects Air Quality. (n.d.). Retrieved December 11, 2020, from https://scied.ucar.edu/learning-zone/air-quality/how-weather-affects-air-quality
K, D. (2020, November 01). Implementing Multiple Linear Regression Using sklearn. Retrieved December 11, 2020, from https://heartbeat.fritz.ai/implementing-multiple-linear-regression-using-sklearn-43b3d3f2fe8b
Visual Crossing API. (n.d.). Retrieved December 11, 2020, from https://rapidapi.com/visual-crossing-corporation-visual-crossing-corporation-default/api/visual-crossing-weather/details
Vopani. (2020, July 28). Air Quality Data in India (2015 - 2020). Retrieved December 11, 2020, from https://www.kaggle.com/rohanrao/air-quality-data-in-india
Mentors
Ashli Johnston, Mark Petersen
Team Members:
Eliana Juarez
Sofia Juarez
Graciela Rodriguez
Sponsoring Teacher: Ashli Johnston