wiki_learning

Background:

Text learning is to classify a sea of texts into certain categories. This project, namely wiki_learning, self-states the purpose that Wikipedia texts are aimed to be clustered into some group such that the texts within a single group has their content mutually related.

How To Do:

For each file, all punctuations are removed and the remaining words are sorted. The frequency of occurance of the existing words in each file are calculated and is presented as a column vector. When these column vectors, each representing the word profile of a text file, are gathered together form a big matrix, Non-Negative Matrix Factorization (NMF) can be employed to help classify the texts.

Mathematical model:

Y = A * S , with model order takens as K

Y (MxN):= big matrix with cN olumn vectors representing the word profile of N files

A (MxK):= 'pure text' matrix, where each column vector represents the word profile of a 'pure text' corresponding to the column subject. There are M words exist in the entire considered text assembly.

S (KxN):= abundance (proportion) of the 'pure text' that contributes to those N files

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
dump.source		dump.source
text.source		text.source
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dump.source

dump.source

text.source

text.source

README.md

README.md

Repository files navigation

wiki_learning

About

Releases

Packages

Languages

MichelsonChan/wiki_learning

Folders and files

Latest commit

History

Repository files navigation

wiki_learning

About

Resources

Stars

Watchers

Forks

Languages