Tann-chen/classification-resident-request

 
 

classificationOfResidentialRequests

Implementation of our paper, "Hybrid Machine Learning Models of Classifying Residential Requests in Smart Cities".

Local Environment Specification

Training and testing ran on a machine with:

  • Ubuntu 16.04 LTS
  • NVIDIA GeForce GTX 1070
  • CUDA version: 9.0
  • cuDNN version: 7.3.0
  • Python version: 3.5.2
  • tensorflow-gpu: 1.11.0
  • Keras: 2.2.4

Introduction

This implementation covers all the tasks described in the paper, including feature engineering, hybrid machine learning, the different classifiers, and the convolutional neural network models. We split the implementation into the following parts:

  • Bayesian model
  • Neural network model
  • Feature engineering

Feature Engineering

Feature engineering preprocesses the Chinese-language data set and transforms it into word vectors that serve as inputs to the machine learning models.

  • Data Preprocessing
    • Segment text into tokens
    • Remove punctuation, stopwords, etc.
  • Lexical Analysis (request, category, responsible department description)
    • Data Distribution
    • Information Values of Features
  • Word Embedding and Vectorization
    • Word embedding using Word2Vec
    • Word vector using TF-IDF

Hierarchical Classification

We develop a hierarchical classification method for the residential requests. The implementation uses:

  • K-Means and GMM Clustering
  • OPTICS, LDA and Entropy Calculation
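
How the repository combines clustering with the entropy calculation is not spelled out here. One plausible reading, shown purely as an assumption, is that entropy measures how mixed the request categories are inside each cluster. The sketch below takes cluster assignments as given (as K-Means or GMM would produce them) and computes the Shannon entropy of the labels in each cluster; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the label distribution in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_per_cluster(cluster_ids, labels):
    """Group labels by cluster id and return {cluster_id: entropy}."""
    grouped = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, labels):
        grouped[cluster_id].append(label)
    return {cid: shannon_entropy(members) for cid, members in grouped.items()}


# Made-up cluster assignments and category labels for eight requests.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["road", "road", "road", "road", "water", "noise", "water", "noise"]
result = entropy_per_cluster(cluster_ids, labels)
```

Here cluster 0 contains a single category, so its entropy is zero, while cluster 1 is split evenly between two categories and has an entropy of one bit. A low-entropy cluster could be treated as one coarse class in a hierarchy; whether the authors do so is an assumption.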

Hybrid Machine Learning Models

  • Bayesian classifier
  • Hierarchical Bayesian classifier
  • Fully-connected NN classifier
  • Hierarchical fully-connected NN classifier
  • Residual convolutional NN classifier
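
As a rough illustration of the simplest model in this list, the following is a minimal multinomial naive Bayes classifier over token lists with Laplace smoothing. It is a sketch under stated assumptions, not the repository's implementation: the class name `NaiveBayes`, the smoothing parameter `alpha`, and the training data and category names are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.token_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.token_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training requests and category names.
train_docs = [["路灯", "坏"], ["路灯", "维修"], ["垃圾", "清理"], ["垃圾", "堆积"]]
train_labels = ["lighting", "lighting", "sanitation", "sanitation"]
model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["路灯", "不亮"])
```

A hierarchical variant could, for instance, first predict a coarse group and then apply a second classifier within that group; that two-stage design is one possible interpretation and is not confirmed by this README.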

Performance on blind test set

| Model | Accuracy | Precision (micro) | Precision (macro) | Recall (micro) | Recall (macro) |
|---|---|---|---|---|---|
| Hierarchical Fully Connected NN | 0.6495 | 0.650 | 0.244 | 0.650 | 0.192 |
| Fully Connected NN | 0.6889 | 0.689 | 0.259 | 0.689 | 0.214 |
| Hierarchical Naive Bayesian | 0.6776 | 0.678 | 0.251 | 0.678 | 0.201 |
| Naive Bayesian | 0.7258 | 0.726 | 0.295 | 0.726 | 0.256 |
| Residual Network | 0.7642 | 0.764 | 0.417 | 0.764 | 0.352 |
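
The micro-averaged precision and recall in the table equal the accuracy for every model, which is expected for single-label multiclass classification, while the macro averages are much lower. The sketch below shows how the two averages differ. It is illustrative only: the function `micro_macro`, the toy labels, and the convention of scoring an undefined ratio as 0.0 are assumptions, not the evaluation code used for the table.

```python
from collections import Counter


def micro_macro(y_true, y_pred):
    """Return accuracy and micro/macro precision and recall for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(num, den):
        # Assumed convention: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    total_tp = sum(tp.values())
    return {
        "accuracy": total_tp / len(y_true),
        # Micro: pool the counts over all classes, then divide once.
        "micro_precision": ratio(total_tp, total_tp + sum(fp.values())),
        "micro_recall": ratio(total_tp, total_tp + sum(fn.values())),
        # Macro: compute the score per class, then take the unweighted mean.
        "macro_precision": sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels),
        "macro_recall": sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels),
    }


# Toy example: a classifier that always predicts the majority class "a".
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "a", "a"]
scores = micro_macro(y_true, y_pred)
```

In this toy case accuracy, micro precision, and micro recall are all 4/6, whereas macro precision and macro recall drop to 2/9 and 1/3 because the two rare classes are never predicted. Macro averages weight every class equally, so a gap like the one in the table is consistent with weaker performance on infrequent categories.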
