
CantOkan/BIL460_Data_Mining


BIL460_Data_Mining Projects

Web Page Classification

Contributors:

Our working space

• 7 Classes

  • Basic materials sector
  • Energy sector
  • Financial sector
  • Healthcare sector
  • Technology sector
  • Transportation sector
  • Utilities sector


• 4581 individual HTML files

Feature representation

  • TF-IDF
  • Count Vectorizer + TF-IDF
  • N-gram
    • Bigram
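
As a rough illustration of the representations listed above, the following sketch counts unigrams and bigrams per document and then applies a TF-IDF weighting. It is not the project's code: the function names, the whitespace tokenizer, the three sample sentences, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, ngram_range=(1, 2)):
    """Count term occurrences per document for every n in ngram_range."""
    lo, hi = ngram_range
    counts = []
    for text in docs:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        terms = []
        for n in range(lo, hi + 1):
            terms.extend(ngrams(tokens, n))
        counts.append(Counter(terms))
    return counts


def tf_idf(counts):
    """Scale raw counts by a smoothed inverse document frequency."""
    n_docs = len(counts)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Hypothetical mini-corpus standing in for the text extracted from the HTML files.
docs = ["oil prices rise", "bank raises rates", "oil company profits rise"]
counts = count_vectorize(docs)   # unigram + bigram counts
weights = tf_idf(counts)         # TF-IDF weights on top of the counts
```

In this toy corpus the bigram "oil prices" occurs in one document while "oil" occurs in two, so the bigram receives the higher weight in the first document.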

Feature Selection and Feature Extraction

  • Feature Selection

    • Filter Methods
      • Correlation
      • Information Gain
      • Relief
    • Wrapper Methods
      • Sequential Feature Selection
  • Feature Reduction

    • Principal Component Analysis (PCA)
    • Linear Discriminant Analysis (LDA)
    • Latent Semantic Analysis (LSA)
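
To make one of the filter methods above concrete, here is a minimal sketch of information gain for a single discrete feature, computed as the class entropy minus the class entropy conditioned on the feature. The helper names and the four-document toy data (the term-presence flags `has_oil` and `has_the`) are hypothetical and not taken from the project.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the entropy within each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical data: whether a term occurs in each of four documents.
labels = ["energy", "energy", "financial", "financial"]
has_oil = [1, 1, 0, 0]   # perfectly separates the two classes
has_the = [1, 0, 1, 0]   # carries no class information

ig_oil = information_gain(has_oil, labels)
ig_the = information_gain(has_the, labels)
```

A filter method would rank all features by such a score and keep the top-scoring ones; here `has_oil` scores one bit and `has_the` scores zero.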

Correlation Matrix

[Figure: correlation matrix]
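
A correlation matrix of this kind can be computed from per-feature columns with pairwise Pearson correlation. The sketch below assumes that is what the figure shows; the feature names and values are invented for illustration, and constant columns (zero variance) are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined if a column is constant


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of features a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical term counts over four documents.
features = {
    "oil": [3, 0, 2, 0],
    "bank": [0, 2, 0, 3],
    "price": [1, 1, 1, 2],
}
matrix = correlation_matrix(features)
```

Highly correlated feature pairs are candidates for removal in a correlation-based filter, since one of the two adds little information beyond the other.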

Other images from the presentation:

elbow

K-MEANS
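
The elbow and K-means labels presumably refer to picking the number of clusters by plotting the within-cluster sum of squares (inertia) against k. The following is a minimal sketch under that assumption, not the project's implementation: it uses a deliberately naive initialization (the first k points), and the nine 2D points are hypothetical stand-ins for document vectors.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = list(points[:k])  # naive init: first k points (assumption)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids)
                  for p in points)
    return centroids, inertia


# Hypothetical data: three well-separated groups of three points each.
points = [(0, 0), (10, 10), (20, 0),
          (0, 1), (10, 11), (20, 1),
          (1, 0), (11, 10), (21, 0)]

# Inertia for k = 1..4; the "elbow" is where the decrease flattens out.
inertias = {k: k_means(points, k)[1] for k in range(1, 5)}
```

On this data the inertia drops sharply up to k = 3 and barely improves at k = 4, which is the bend an elbow plot is meant to reveal.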