Implementation of Hybrid Machine Learning Models of Classifying Residential Requests in Smart Cities
Training and testing ran on a machine with:
- Ubuntu 16.04 LTS
- Nvidia GeForce GTX 1070
- CUDA version: 9.0
- cuDNN version: 7.3.0
- Python version: 3.5.2
- Tensorflow-gpu: 1.11.0
- Keras: 2.2.4
This implementation covers all the tasks described in the paper, including feature engineering, hybrid machine learning, different classifiers, and convolutional neural network models. We split the implementation into the following parts:
- Bayesian model
- Neural network model
- Feature engineering
Feature engineering processes the Chinese-text data set and transforms it into word vectors that serve as inputs to the machine learning models.
- Data preprocessing
  - Segment text into tokens
  - Remove punctuation, stopwords, etc.
- Lexical analysis (request, category, responsible department description)
  - Data distribution
  - Information values of features
- Word embedding and vectorization
  - Word embedding using Word2Vec
  - Word vectors using TF-IDF
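The preprocessing and TF-IDF steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: it assumes the requests have already been segmented into tokens (in practice a Chinese word segmenter would do this), and the stopword list, punctuation set, toy requests, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
import string
from collections import Counter

# Hypothetical stopword list and punctuation set (the project's own lists are not shown).
STOPWORDS = {"的", "了", "在"}
PUNCTUATION = set(string.punctuation) | set("，。！？、；：")


def clean(tokens):
    """Drop empty tokens, punctuation, and stopwords from a segmented request."""
    return [t for t in tokens if t.strip() and t not in STOPWORDS and t not in PUNCTUATION]


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Assumed weighting: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (a smoothed variant).
    """
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Toy requests, already segmented into tokens (illustrative, not from the data set).
raw = [
    ["小区", "的", "路灯", "坏", "了", "。"],
    ["路灯", "不", "亮", "，", "请", "维修"],
]
docs = [clean(r) for r in raw]
vocab, vectors = tfidf(docs)
```

In this toy example, a token that appears in both requests receives a lower weight than a token unique to one request, which is the behaviour TF-IDF is meant to provide.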
We develop a hierarchical classification method for the request classification task. The implementation includes:
- K-Means and GMM Clustering
- OPTICS, LDA and Entropy Calculation
- Bayesian classifier
- Hierarchical Bayesian classifier
- Fully-connected NN classifier
- Hierarchical fully-connected NN classifier
- Residual convolutional NN classifier
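The list above mentions clustering and an entropy calculation but does not spell out how they are combined. As one plausible illustration, the sketch below computes the Shannon entropy of the category labels inside each cluster, which is a common way to judge how pure a cluster is before building a hierarchy on top of it. The function names, the toy cluster assignments, and the category names are hypothetical and not taken from the repository.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of the label distribution in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1 / p) summed over the observed labels.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def cluster_entropies(cluster_ids, categories):
    """Map each cluster id to the entropy of the categories assigned to it."""
    groups = defaultdict(list)
    for cid, cat in zip(cluster_ids, categories):
        groups[cid].append(cat)
    return {cid: entropy(cats) for cid, cats in groups.items()}


# Toy assignments (illustrative only): cluster 0 is pure, cluster 1 is evenly mixed.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
categories = ["road", "road", "road", "water", "noise", "water", "noise"]
scores = cluster_entropies(cluster_ids, categories)
```

A cluster with entropy near zero contains mostly one category, while a higher value signals a mixed cluster that may need a dedicated second-stage classifier.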
Models | Accuracy | Precision (Micro) | Precision (Macro) | Recall (Micro) | Recall (Macro)
--- | --- | --- | --- | --- | ---
Hierarchical Fully Connected NN | 0.6495 | 0.650 | 0.244 | 0.650 | 0.192
Fully Connected NN | 0.6889 | 0.689 | 0.259 | 0.689 | 0.214
Hierarchical Naive Bayesian | 0.6776 | 0.678 | 0.251 | 0.678 | 0.201
Naive Bayesian | 0.7258 | 0.726 | 0.295 | 0.726 | 0.256
Residual Network | 0.7642 | 0.764 | 0.417 | 0.764 | 0.352
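
In the results above, micro-averaged precision and recall match accuracy for every model, while the macro-averaged values are much lower. That pattern is expected for single-label multi-class classification with imbalanced categories: micro averaging pools all predictions, whereas macro averaging gives every category equal weight. The sketch below shows the difference on toy labels. The function name, the labels, and the convention of scoring a category with no predictions as 0.0 are assumptions for illustration, not the evaluation code used for the table.

```python
from collections import Counter


def micro_macro(y_true, y_pred):
    """Return accuracy plus micro/macro precision and recall for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[t] += 1  # true category gains a false negative

    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) counts as 0.0.
        return num / den if den else 0.0

    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    return {
        "accuracy": ratio(tp_sum, len(y_true)),
        "micro_precision": ratio(tp_sum, tp_sum + fp_sum),
        "micro_recall": ratio(tp_sum, tp_sum + fn_sum),
        "macro_precision": sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels),
        "macro_recall": sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels),
    }


# Toy labels (illustrative only): a classifier that always predicts the majority category.
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "a", "a"]
metrics = micro_macro(y_true, y_pred)
```

Here the micro scores equal the accuracy of 4/6, but the macro scores drop because categories "b" and "c" are never predicted correctly.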