GitHub - wegiangb/air-quality-observation-classification

air-quality-observation-classification

This project validates surface air quality observations from individual sensors and removes errors and outliers, so that the observations can be compared with ECMWF's CAMS air quality forecasts. Cluster analysis of the observations helps identify the more reliable ones. Attaching data about factors that affect air quality to the observations adds further evidence about their accuracy.

CAMS lacks credible surface air quality observations in many parts of the world, often in the most polluted areas, such as India or Africa. Some observations are available for these areas from data harvesting efforts such as OpenAQ, but no quality control is applied to the data, and it is often not well known whether the observations were made in a rural, urban or heavily polluted local environment. This information about the environment is important because measurements with strong local influence are mostly not representative of the horizontal scale (40 km) of the CAMS forecasts and should therefore not be used to evaluate the CAMS model.

Implementation

Cluster Analysis

GMM, K-means
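As a rough illustration of the k-means half of this step, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the code in this repository's scripts: the function name `kmeans`, the first-k initialisation, and the PM2.5-style readings are all assumptions made for the example. A GMM would replace the hard assignment below with soft, probability-weighted assignments and is not shown.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the result deterministic but is not a robust choice.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once neither labels nor centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical PM2.5 readings (one value per sensor), not real data.
readings = [(8.0,), (95.0,), (10.0,), (12.0,), (102.0,), (98.0,)]
labels, centroids = kmeans(readings, k=2)
print(labels)
```

With these made-up readings the low and high values end up in separate clusters. Real sensor data would normally need scaling and a more careful initialisation.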

Cluster Measure

Rand score
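The Rand score compares two clusterings of the same items by counting the pairs of items on which they agree (both put the pair in the same cluster, or both put it in different clusters). The sketch below is a plain pair-counting version written for illustration; the function name `rand_index` and the label lists are assumptions, and it is not taken from the repository's scripts. Libraries such as scikit-learn provide `rand_score` and `adjusted_rand_score` for the same purpose.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        # Fewer than two items: there is nothing to disagree on.
        return 1.0
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Same grouping under different label names scores 1.0.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Moving one item to the other cluster lowers the score.
partial = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
print(same, partial)
```

The score depends only on how items are grouped, not on the label values themselves, which is why the first comparison is a perfect match.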

Presentation of results

Using Gatherminer

Dependencies

Gatherminer: an interactive visual tool for time series analysis.

Publication: Sarkar, Advait, Martin Spott, Alan F. Blackwell, and Mateja Jamnik. "Visual discovery and model-driven explanation of time series patterns." In Visual Languages and Human-Centric Computing (VL/HCC), 2016 IEEE Symposium on, pp. 78-86. IEEE, 2016.

http://dx.doi.org/10.1109/VLHCC.2016.7739668

Last updated April 2017, for v0.7

SPEED TIPS:

Under the gathering strategy, there is a new option 'Greedy imperfect seed' that is even faster and gives near-identical results.

For large datasets (e.g., around 3k rows or more), unchecking "precompute distances" actually makes gathering faster, because storing and retrieving from an n^2 distance matrix in memory becomes slower than just recomputing the distances on demand.

Gathering now works for quite large datasets (I think I have tried with up to 30k rows). You must uncheck "precompute distances". It can be very slow though (e.g., several minutes). If you keep the Chrome console open (View > Developer > JavaScript Console), you can see whether gathering has crashed or is still running, because I output status update messages from time to time.

Zooming on very large datasets will break, because Chrome has a maximum canvas size. It is very irritating that this bottleneck exists, but I think it will require too much engineering to fix at this point.

FILE FORMATS:

The file formats are very simple. The series data is a comma-separated file with each line containing one time series. There is no header line.

The series attributes file is a comma-separated file with each line containing the attributes for one time series. There must be a header line containing the names of the attributes. Therefore, the Nth line of the attributes file (for N of 2 or more) provides the attributes for the (N-1)th line of the series data file.
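The pairing between the two files can be sketched in Python as follows. This is only an illustration of the layout described above, not code from Gatherminer: the function name `load_series_with_attributes`, the attribute names `station` and `environment`, and the sample values are all hypothetical.

```python
import io


def load_series_with_attributes(series_file, attributes_file):
    """Pair each headerless series row with its attribute row."""
    # Series file: one time series per line, no header line.
    series = [
        [float(value) for value in line.strip().split(",")]
        for line in series_file
        if line.strip()
    ]
    # Attributes file: first line is the header with attribute names.
    attr_lines = [line.strip() for line in attributes_file if line.strip()]
    header = attr_lines[0].split(",")
    attributes = [dict(zip(header, line.split(","))) for line in attr_lines[1:]]
    if len(series) != len(attributes):
        raise ValueError("series and attribute row counts differ")
    # Attribute line N (counting the header as line 1) matches series line N-1.
    return list(zip(series, attributes))


# Hypothetical file contents, held in memory for the example.
series_text = "12.0,15.5,14.2\n80.1,95.3,88.7\n"
attributes_text = "station,environment\nS1,rural\nS2,urban\n"

records = load_series_with_attributes(
    io.StringIO(series_text), io.StringIO(attributes_text)
)
print(records[0])
```

The plain `split(",")` is deliberate here: it treats every comma as a separator, so it only suits files whose values contain no commas.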

The CSV parsing, if I remember correctly, ignores any quote qualification. So if the actual values in your dataset contain commas, then you'll have to substitute them with some other character, otherwise the tool will break.
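One way to prepare such values before writing the files is to replace embedded commas with another character. The helper names `sanitize_field` and `to_csv_line`, and the choice of `;` as the substitute, are assumptions for this sketch; any character that does not occur in the data would do.

```python
def sanitize_field(value, replacement=";"):
    """Replace commas inside a single value so it stays one field."""
    return str(value).replace(",", replacement)


def to_csv_line(values, replacement=";"):
    """Join sanitized values into one comma-separated line."""
    return ",".join(sanitize_field(v, replacement) for v in values)


# Hypothetical attribute row whose first value contains a comma.
line = to_csv_line(["Delhi, India", "urban"])
print(line)
```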

AirNode_AQCA_Milestone2_10_clustering.py
AirNode_AQCA_Milestone2_10_clustering_2.py
AirNode_AQCA_Milestone2_PySal.py
AirNode_AQCA_Milestone2_Spatial_Voronoi.py
AirNode_Esowc_GanttChart_Aug2020_Nov2020_Schedule_3.xls
AirNode_Milestone2_PCA.py
AirNode_Milestone_AQCA_3.py
AirNode_RandScore.py
AirNode_RandScore_2.py
Chart.js
Detector.js
GMM.py
GMM_2.py
GMM_density.py
HierarchicalClustering.py
Milestone_1_Presentation_Importing_Dataset.odp
Milestone_1_Presentation_Importing_Dataset.pdf
MonteCarloKMeans.py
MonteCarlo_Clustering.py
PyClustering_Examples.py
README.md
Readme_1.md
Readme_Topic.md
Test1_OpenAQ_Apply_ECMWF_ESoWC_Milestone4_Pecos1.ipynb
Test_OpenAQ_Apply_ECMWF_ESoWC_Milestone4_Pecos.ipynb
d3.min.js
dndTree.js
flare.json
gatherer.js
id3.js
index.html
jquery-1.11.1.min.js
main.js
plotly-latest.min.js
pointcloud.js
stats.min.js
style.css
surface3d.js
surfaceplotter.js
testdata.js
three.min.js
treemap.js
underscore-min.js
viridis.json
