Isfahan University of Medical Sciences

Science Communicator Platform

A Framework for Exploration and Cleaning of Environmental Data – Tehran Air Quality Data Experience



Authors
  1. Shamsipour M1, 2, 3
  2. Farzadfar F3, 4
  3. Gohari K3, 5
  4. Parsaeian M1, 3
  5. Amini H6, 7, 8
  6. Rabiei K9
  7. Hassanvand MS2, 10
  8. Navidi I1, 3
  9. Fotouhi A1
  10. Naddafi K2, 10
  11. Sarrafzadegan N9
  12. Mansouri A3, 5
  13. Mesdaghinia A2, 11
  14. Larijani B4
  15. Yunesian M2, 10
Author Affiliations
  1. Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
  2. Center for Air Pollution Research (CAPR), Institute for Environmental Research (IER), Tehran University of Medical Sciences, Tehran, Iran
  3. Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
  4. Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
  5. Department of Biostatistics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  6. Department of Epidemiology and Public Health, Swiss Tropical and Public Health Institute (Swiss TPH), Basel, Switzerland
  7. University of Basel, Basel, Switzerland
  8. Kurdistan Environmental Health Research Center, Kurdistan University of Medical Sciences, Sanandaj, Iran
  9. Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
  10. Department of Environmental Health Engineering, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
  11. Center for Water Quality Research (CWQR), Institute for Environmental Research (IER), Tehran University of Medical Sciences, Tehran, Iran

Source: Archives of Iranian Medicine. Published: 2014


Abstract

Background: Managing and cleaning large environmental monitoring datasets is a particular challenge. In this article, we present a novel framework for exploring and cleaning large datasets. As a case study, we applied the method to air quality data from Tehran, Iran, for 1996 to 2013.

Methods: The framework consists of data acquisition [here, data on particulate matter with an aerodynamic diameter ≤10 µm (PM10)], development of databases, initial descriptive analyses, removal of inconsistent data using a plausibility range (PR), and detection of missing-data patterns. Additionally, we developed a novel tool, the spatiotemporal screening tool (SST), which takes both the spatial and the temporal nature of the data into account during outlier detection. We also evaluated the effect of dust storms in the outlier detection phase.

Results: Before the algorithms were implemented, the raw mean concentration of PM10 in Tehran for 1996–2013 was 88.96 µg/m³. After the algorithms were implemented, 5.7% of all data points were identified as unacceptable outliers; of these, 69% were detected by the SST and 1% by the dust storm algorithm. In addition, 29% of the unacceptable outliers fell outside the PR.

After implementation of the algorithms, the mean concentration of PM10 was 88.41 µg/m³, nearly unchanged. However, the standard deviation decreased significantly, from 90.86 µg/m³ to 61.64 µg/m³. No distinguishable significant pattern by hour, day, month, or year was found in the missing data.

Conclusion: We developed a novel framework for cleaning large environmental monitoring datasets that can identify hidden patterns. We also presented a complete picture of PM10 in Tehran from 1996 to 2013. Finally, we propose implementing our framework on large spatiotemporal databases, especially in developing countries.

© 2014, Academy of Medical Sciences of I.R. Iran. All rights reserved.
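The cleaning steps summarized in the Methods (a plausibility-range check followed by outlier screening that looks at both space and time) can be illustrated with a minimal Python sketch. This is not the authors' SST, whose rules the abstract does not give. The bounds `PR_LOW`/`PR_HIGH`, the window size, the ratio threshold `k`, the function names, and the toy station data are all hypothetical. The sketch assumes hourly series that are aligned in time across stations. It flags a value only when that value is far above both the same station's neighboring hours and the other stations at the same hour.

```python
from statistics import mean, median, stdev

# Hypothetical plausibility range (PR) for hourly PM10 in µg/m³; not the paper's bounds.
PR_LOW, PR_HIGH = 1.0, 1000.0

# Hypothetical toy data: one hourly series per station, aligned so index t is the same hour.
RAW = {
    "A": [80, 85, 90, 900, 88, 400, 86],   # hour 3: isolated spike; hour 5: city-wide event
    "B": [75, 78, 82, 80, 79, 380, 81],
    "C": [70, 72, 2000, 74, 73, 420, 75],  # hour 2: outside the hypothetical PR
}


def apply_plausibility_range(series, low=PR_LOW, high=PR_HIGH):
    """Replace missing or implausible values with None."""
    return [v if v is not None and low <= v <= high else None for v in series]


def screen_spatiotemporal(data, window=3, k=3.0):
    """Flag (station, hour) pairs whose value exceeds k times the median of BOTH
    the same station's neighboring hours and the other stations at that hour."""
    flags = set()
    for station, series in data.items():
        for t, v in enumerate(series):
            if v is None:
                continue
            # Temporal context: same station, up to `window` hours either side, excluding t.
            lo, hi = max(0, t - window), min(len(series), t + window + 1)
            temporal = [x for i, x in enumerate(series[lo:hi], start=lo)
                        if i != t and x is not None]
            # Spatial context: all other stations at the same hour.
            spatial = [other[t] for name, other in data.items()
                       if name != station and other[t] is not None]
            if not temporal or not spatial:
                continue  # not enough context to judge this point
            if v > k * median(temporal) and v > k * median(spatial):
                flags.add((station, t))
    return flags


def clean(data, window=3, k=3.0):
    """Apply the PR check, then the screen; return cleaned data and the flags."""
    in_range = {s: apply_plausibility_range(v) for s, v in data.items()}
    flags = screen_spatiotemporal(in_range, window=window, k=k)
    cleaned = {s: [None if (s, t) in flags else x for t, x in enumerate(v)]
               for s, v in in_range.items()}
    return cleaned, flags


def summary(data):
    """Mean and sample standard deviation over all non-missing values."""
    values = [x for series in data.values() for x in series if x is not None]
    return mean(values), stdev(values)


if __name__ == "__main__":
    cleaned, flags = clean(RAW)
    print(sorted(flags))  # [('A', 3)]
    print("raw mean/SD:", summary(RAW))
    print("cleaned mean/SD:", summary(cleaned))
```

In this toy run, the PR check removes station C's value at hour 2, and the screen flags only station A at hour 3. Hour 5 is kept at all three stations because every station is elevated at once, so the spatial median is high as well. This is one simple way to avoid discarding a city-wide episode such as a dust storm. The abstract does not say whether it resembles the paper's separate dust storm algorithm.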