Genetic Algorithm on K-Means Clustering of PhD Programs with BM25 Score

This is a fork of Genetic-Algorithm-on-K-Means-Clustering, developed as an assignment for the 2019 Evolutive Systems class of the Computer Science Master's program at UFPel.

The resulting paper is available here (in Portuguese).

Approach

PhD program clustering using Genetic K-Means and BM25 scores

The corpus of each PhD program is built from the title and abstract of all its indexed production on Sucupira, the official Brazilian platform: articles, books, presentations, theses, etc. Abstracts are mostly available for dissertations and theses only.
Using Elasticsearch and the generated dataset, the top 50 scoring keywords of each program are extracted with Okapi BM25.
For each unique term (around 11k), the score of that term is computed for each program, generating a program-by-term matrix.
Min-max normalization is applied to standardize the scores.
The Davies–Bouldin index is used to evaluate each clustering solution.
In the genetic algorithm:
- Rank-based selection
- One-point crossover
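
The normalization, evaluation, and genetic operators listed above can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names are hypothetical, chromosomes are assumed to be flat NumPy arrays of equal length, and fitness is assumed to be the Davies–Bouldin index (lower is better). The index needs at least two clusters with distinct centroids.

```python
import numpy as np

def minmax_normalize(matrix):
    """Scale each column (term) to [0, 1]; constant columns become 0."""
    matrix = np.asarray(matrix, dtype=float)
    lo = matrix.min(axis=0)
    span = matrix.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (matrix - lo) / span

def davies_bouldin(data, labels):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([data[labels == c].mean(axis=0) for c in ids])
    # Mean distance of each cluster's points to its own centroid
    scatter = np.array([
        np.linalg.norm(data[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))  # most similar other cluster
    return float(np.mean(worst))

def rank_select(fitness, rng):
    """Pick one index with probability proportional to its rank (lower fitness = better)."""
    fitness = np.asarray(fitness, dtype=float)
    order = np.argsort(fitness)[::-1]  # worst first gets rank 1, best gets rank n
    ranks = np.empty(len(fitness))
    ranks[order] = np.arange(1, len(fitness) + 1)
    return int(rng.choice(len(fitness), p=ranks / ranks.sum()))

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    cut = int(rng.integers(1, len(parent_a)))  # cut position in [1, len - 1]
    child_a = np.concatenate([parent_a[:cut], parent_b[cut:]])
    child_b = np.concatenate([parent_b[:cut], parent_a[cut:]])
    return child_a, child_b

# Toy data: 4 "programs" x 2 "terms" (made-up scores, for illustration only)
rng = np.random.default_rng(0)
scores = np.array([[0.0, 2.0], [1.0, 4.0], [10.0, 20.0], [11.0, 22.0]])
norm = minmax_normalize(scores)
print(davies_bouldin(norm, np.array([0, 0, 1, 1])))
print(one_point_crossover(np.zeros(6), np.ones(6), rng))
```

Rank-based selection uses only the ordering of the fitness values, so a single individual with an extreme Davies–Bouldin score cannot dominate the draw the way it could under fitness-proportional selection.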

Requirements

pandas
numpy

Getting Started

python __main__.py

Input

config.txt contains the control parameters:
- kmax : maximum number of clusters
- budget : how many times the GA is run
- numOInd : number of individuals
- Ps : probability of rank-based selection
- Pc : probability of crossover
- Pm : probability of mutation
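
As an illustration of how these parameters might be read and validated, here is a small sketch. The file layout is an assumption: the README does not show the format of config.txt, so the INI-style `[vars]` section, the example values, and the `load_params` helper are all hypothetical.

```python
import configparser

# Hypothetical file contents; the section name and values are assumptions.
EXAMPLE_CONFIG = """
[vars]
kmax = 10
budget = 50
numOInd = 20
Ps = 1.0
Pc = 0.8
Pm = 0.05
"""

def load_params(text):
    """Parse the control parameters and check that probabilities lie in [0, 1]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    section = parser["vars"]
    params = {
        "kmax": section.getint("kmax"),
        "budget": section.getint("budget"),
        "numOInd": section.getint("numOInd"),
        "Ps": section.getfloat("Ps"),
        "Pc": section.getfloat("Pc"),
        "Pm": section.getfloat("Pm"),
    }
    for name in ("Ps", "Pc", "Pm"):
        if not 0.0 <= params[name] <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return params

params = load_params(EXAMPLE_CONFIG)
print(params)
```

Validating the probabilities up front surfaces a mistyped value before the GA starts, rather than partway through a run.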

Output

norm_data.csv contains the normalized data
cluster_json contains the centroid of each cluster
result.csv contains the data, with each row labeled with its assigned cluster

Naraujo13/phd_clustering
