Skip to content

MichelsonChan/wiki_learning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 

Repository files navigation

wiki_learning

Background:

Text learning is to classify a sea of texts into certain categories. This project, namely wiki_learning, self-states the purpose that Wikipedia texts are aimed to be clustered into some group such that the texts within a single group has their content mutually related.

How To Do:

For each file, all punctuations are removed and the remaining words are sorted. The frequency of occurance of the existing words in each file are calculated and is presented as a column vector. When these column vectors, each representing the word profile of a text file, are gathered together form a big matrix, Non-Negative Matrix Factorization (NMF) can be employed to help classify the texts.

Mathematical model:

Y = A * S , with model order takens as K

Y (MxN):= big matrix with cN olumn vectors representing the word profile of N files

A (MxK):= 'pure text' matrix, where each column vector represents the word profile of a 'pure text' corresponding to the column subject. There are M words exist in the entire considered text assembly.

S (KxN):= abundance (proportion) of the 'pure text' that contributes to those N files

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published