Our dataset is spatiotemporally rich and is one of the few publicly available datasets of PM values collected in a developing country. The PM value distributions of a developing country and a developed country differ starkly. To support this point and the need for a dataset like ours, we compare our dataset, on a number of parameters, against a mobile air monitoring dataset [2] from the city of Hamilton, Ontario, Canada, collected during 114 days of air pollution monitoring between November 2005 and November 2016. For the comparison, we divide Delhi and Hamilton into regions by rounding the latitude and longitude of each record to 3 decimal places and then grouping all records that belong to the same region.
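The region assignment described above can be sketched as follows. This is a minimal illustration, not the code used to produce the analysis: the record layout (dicts with `lat`, `lon`, and `pm2_5` keys), the function names, and the sample coordinates are all assumptions made for the example.

```python
from collections import defaultdict


def region_key(lat, lon, decimals=3):
    """Return the region identifier: the coordinates rounded to `decimals` places."""
    return (round(lat, decimals), round(lon, decimals))


def group_by_region(records, decimals=3):
    """Group records (dicts with 'lat' and 'lon' keys) by their rounded coordinates."""
    regions = defaultdict(list)
    for rec in records:
        regions[region_key(rec["lat"], rec["lon"], decimals)].append(rec)
    return dict(regions)


# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"lat": 28.61391, "lon": 77.20902, "pm2_5": 210.0},
    {"lat": 28.61412, "lon": 77.20898, "pm2_5": 190.0},
    {"lat": 28.70411, "lon": 77.10249, "pm2_5": 305.0},
]
regions = group_by_region(records)
for key, recs in regions.items():
    print(key, len(recs))
```

Rounding to 3 decimal places yields cells about 0.001 degrees on a side, which is roughly 100 m at Delhi's latitude, so the first two sample records fall into the same region.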
The following tables compare the Delhi dataset with the Canada dataset [2] across different parameters, including a statistical comparison of the PM values recorded in both datasets. The first table also lists the Zurich dataset [1] for reference.
Metric | Delhi Dataset | Canada Dataset | Zurich Dataset |
---|---|---|---|
Total number of samples | 12542183 | 46080 | Varying for different pollutants (19.9 - 49.7 Million) |
Samples with at least one PM (1.0, 2.5, 10.0) value | 12542183 | 13048 | - |
Pollutants covered | PM1, PM2.5, PM10 | CO, NO, NO2, SO2, O3, PM1, PM2.5, PM10 | O3, CO, NO2 and UFP |
Vehicles used | Public bus | Commercial van | Trams |
Monitoring Period | 91 days | 114 days | Varying for different pollutants (2.5 - 4.5 years) |
Pollutant sensor | Vendor 1 (Honeywell) | Vendor 2 (Alphasense) | Vendor 3 (Winsen) |
---|---|---|---|
SO2 | 17,337 INR | 8,500 INR | 10,600 INR |
NO | 14,600 INR | 8,500 INR | - |
NO2 | 10,618 INR | 8,500 INR | 10,530 INR |
CO | - | 8,500 INR | 4,450 INR |
CO2 | - | - | 2,910 INR |
O3 | - | - | 4,288.28 INR |
Metric | Delhi PM1 | Delhi PM2.5 | Delhi PM10 | Canada PM1 | Canada PM2.5 | Canada PM10 |
---|---|---|---|---|---|---|
Min | 1 | 1 | 1 | 0 | 0 | 0 |
Max | 1730.5 | 1792 | 1903 | 2640 | 731 | 291 |
Mean | 120.348 | 207.9248 | 226.1106 | 46.45 | 15.08 | 12.15 |
Std | 57.2723 | 114.3632 | 123.8647 | 97.36 | 12.87 | 9.02 |
5th Percentile | 45 | 72 | 79 | 4 | 6 | 8 |
95th Percentile | 233 | 435 | 471 | 28 | 32 | 138 |
Missing % | 0 | 0 | 0 | 72.24 | 73.62 | 71.71 |
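The per-pollutant statistics in the table above could be computed along the following lines. This is a sketch under stated assumptions: the source does not say which percentile definition or which standard deviation (population or sample) was used, so the nearest-rank percentile and the population standard deviation here are choices made for the example, and `summarize` and `percentile` are hypothetical helper names.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of a sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(values):
    """Summarize one pollutant column; None marks a missing reading."""
    present = sorted(v for v in values if v is not None)
    return {
        "min": present[0],
        "max": present[-1],
        "mean": statistics.fmean(present),
        "std": statistics.pstdev(present),  # population std; an assumption
        "p5": percentile(present, 5),
        "p95": percentile(present, 95),
        "missing_pct": 100 * (len(values) - len(present)) / len(values),
    }


# Hypothetical PM2.5 column with one missing reading out of five.
column = [10.0, 20.0, None, 30.0, 40.0]
stats = summarize(column)
print(stats)
```

The missing percentage is taken over all samples, including those with no reading for the pollutant, which matches how the "Missing %" row is most naturally read.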
The first parameter we compare is the total number of records collected in each region of a city across the whole duration of the data. The figures below show this for Hamilton and Delhi. The color of each circle on the maps represents the total number of points collected in that region. As the maps show, the circles over most regions in Hamilton are green, which corresponds to 0-20 points per region. In Delhi, by contrast, the circle colors vary widely and correspond to more than 500 data points in most regions. The number of data points collected per region is therefore much higher in Delhi than in Hamilton.
The maps below show the average PM2.5 value recorded in each region over the whole dataset for Hamilton and Delhi. The color of each circle represents the average PM2.5 in that region. In Hamilton, the average PM value in most regions lies in the 0-50 range, only a few regions average more than 50, and almost none average more than 100 across the whole duration of the data. In Delhi, on the other hand, the average PM value for most regions is greater than 250, which is much higher than in any region of Hamilton.
Here, we compare the variance in PM2.5 levels in all regions across the whole duration of the data. The figure below shows this for Hamilton and Delhi. The color of each circle on the maps represents the PM2.5 variance in that region. In Hamilton, the PM2.5 variance lies in a narrow range of 0-50 across most regions, and there are very few points where the variance is greater than 150. Note that this variance is observed over a period spanning 11 years. In Delhi, on the other hand, we see very high variance in PM2.5 levels across almost all regions, and this variance was observed over just 3 months.
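The three per-region quantities compared above (record count, mean PM2.5, and PM2.5 variance) can be derived in one pass once records are keyed by rounded coordinates. The sketch below is illustrative only: the tuple layout `(lat, lon, pm2_5)`, the function name `region_stats`, and the use of population variance are assumptions, since the source does not specify them.

```python
from collections import defaultdict
import statistics


def region_stats(records, decimals=3):
    """Return count, mean, and variance of PM2.5 for each rounded-coordinate region."""
    values = defaultdict(list)
    for lat, lon, pm25 in records:
        values[(round(lat, decimals), round(lon, decimals))].append(pm25)
    return {
        key: {
            "count": len(v),
            "mean": statistics.fmean(v),
            "variance": statistics.pvariance(v),  # population variance; an assumption
        }
        for key, v in values.items()
    }


# Hypothetical (lat, lon, pm2_5) readings.
records = [
    (28.61391, 77.20902, 200.0),
    (28.61412, 77.20898, 300.0),
    (28.70411, 77.10249, 250.0),
]
stats = region_stats(records)
for key, s in stats.items():
    print(key, s)
```

Each resulting value would correspond to the color of one circle on the respective map.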
In the Canada dataset [2], there are some hours with no samples at all, as the bar plots below show. For each hour of the day, we count the total number of minutes that have at least one sample across the whole dataset. In our dataset, we have samples for each minute of each hour, whereas the Canada dataset [2] has at least 9 empty hours, most of them at night.
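One way to read the coverage count described above is: for each hour of the day (0-23), count how many distinct minutes of that hour (0-59) contain at least one sample anywhere in the dataset. The sketch below follows that reading, which is an interpretation rather than a confirmed description of the original analysis; the function name and sample timestamps are hypothetical.

```python
from datetime import datetime


def minutes_covered_per_hour(timestamps):
    """Map each hour of day (0-23) to the number of distinct minutes (0-59) with a sample."""
    minutes = {hour: set() for hour in range(24)}
    for ts in timestamps:
        minutes[ts.hour].add(ts.minute)
    return {hour: len(m) for hour, m in minutes.items()}


# Hypothetical sample timestamps.
timestamps = [
    datetime(2016, 11, 1, 9, 0, 5),
    datetime(2016, 11, 1, 9, 0, 40),   # same minute as the reading above
    datetime(2016, 11, 2, 9, 1, 0),    # different day, same hour of day
    datetime(2016, 11, 1, 14, 30, 0),
]
coverage = minutes_covered_per_hour(timestamps)
empty_hours = [hour for hour, n in coverage.items() if n == 0]
print(coverage[9], coverage[14], len(empty_hours))
```

Under this reading, a dataset with full coverage has a value of 60 for every hour, and an "empty hour" is one whose count is 0.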
The plots below show the frequency of PM values in the Delhi and Canada datasets, respectively. Most PM2.5 values lie in the range 0 to 60 for the Canada dataset, whereas they lie in the range 0 to 750 for Delhi. Not only is the range of PM values wider in our dataset, but the frequency of each PM value is also higher. The most frequent PM2.5 value is around 10 for Hamilton and around 150 for Delhi. The same observations hold for PM1 and PM10 values.
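A frequency distribution like the one described above can be approximated by binning the readings and counting each bin. The following is a minimal sketch; the bin width of 10 and the function name `pm_histogram` are assumptions chosen for illustration, and the source does not state the binning used in the plots.

```python
from collections import Counter


def pm_histogram(values, bin_width=10):
    """Count readings per bin; each key is the lower edge of its bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical PM2.5 readings.
values = [8.0, 12.0, 14.5, 19.9, 151.0, 155.0]
hist = pm_histogram(values)
most_common_bin, count = hist.most_common(1)[0]
print(sorted(hist.items()), most_common_bin, count)
```

The most frequent bin plays the role of the "most frequent PM2.5 value" in the comparison, at the resolution of the chosen bin width.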
[1] Jason Jingshi Li, Boi Faltings, Olga Saukh, David Hasenfratz, and Jan Beutel. Sensing the air we breathe: The OpenSense Zurich dataset. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI’12, pages 323–325. AAAI Press, 2012.
[2] Matthew D. Adams and Denis Corr. A mobile air pollution monitoring data set. Data, 4(1), 2019.