AML4DHGermanVecs

This repository contains code to train and evaluate German medical embeddings. Usage examples are provided in the Jupyter notebooks (1 Vectorization, 2 Evaluation).

Setup:

Define the relevant paths in config.json (see default.config.json for the default file).
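As a rough illustration of how such a configuration could be read, the sketch below loads config.json and falls back to default.config.json when no local file exists. The helper name `load_config` and the fallback behaviour are assumptions made for this example; the repository's own loading code and the keys inside the file may differ.

```python
import json
from pathlib import Path


def load_config(directory=".", name="config.json", fallback="default.config.json"):
    """Return the parsed configuration as a dict.

    Hypothetical helper: tries `name` first and falls back to `fallback`.
    The fallback order is an assumption, not documented repository behaviour.
    """
    base = Path(directory)
    for candidate in (base / name, base / fallback):
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as handle:
                return json.load(handle)
    raise FileNotFoundError(f"neither {name} nor {fallback} found in {base}")
```

Calling `load_config()` from the repository root would then return whatever paths are defined in the file.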

Extensions

To extend the processing pipeline do the following:

Adding further preprocessing methods

Extend vectorization/embeddings.py->Embeddings.sentence_data2vec(), inserting the new step before the call to Embeddings.calculate_vectors().
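A minimal sketch of what an additional preprocessing step might look like, assuming the sentences are available as lists of tokens. The function `preprocess_sentences` and the tiny stop word set are hypothetical and only illustrate the idea; the actual data structures used inside `sentence_data2vec()` may differ.

```python
from typing import FrozenSet, Iterable, List

# Illustrative stop word set only; a real pipeline would use a fuller list.
GERMAN_STOPWORDS: FrozenSet[str] = frozenset({"der", "die", "das", "und", "mit"})


def preprocess_sentences(
    sentences: Iterable[List[str]],
    stopwords: FrozenSet[str] = GERMAN_STOPWORDS,
    lowercase: bool = True,
) -> List[List[str]]:
    """Lowercase tokens (optional), drop stop words, and skip empty sentences."""
    cleaned = []
    for tokens in sentences:
        if lowercase:
            tokens = [token.lower() for token in tokens]
        kept = [token for token in tokens if token.lower() not in stopwords]
        if kept:  # sentences that consist only of stop words are dropped
            cleaned.append(kept)
    return cleaned
```

The result of such a function would be handed on to the vectorization step in place of the raw sentences.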

Adding further vectorization algorithms

Extend vectorization/embeddings.py->Embeddings.calculate_vectors() with a new embedding algorithm that accepts sentences as input.
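To illustrate the "sentences in, vectors out" contract, here is a deliberately simple stand-in algorithm that builds co-occurrence count vectors from tokenized sentences using only the standard library. The function `cooccurrence_vectors` is hypothetical and is not one of the algorithms shipped with the repository; how a new algorithm is selected inside `calculate_vectors()` is not shown here because the source does not specify it.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def cooccurrence_vectors(
    sentences: Sequence[Sequence[str]], window: int = 2
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Return (token -> count vector, vocabulary).

    Each vector has one dimension per vocabulary entry and counts how often
    that entry appears within `window` positions of the token.
    """
    vocab = sorted({token for tokens in sentences for token in tokens})
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for position, token in enumerate(tokens):
            start = max(0, position - window)
            stop = min(len(tokens), position + window + 1)
            for other in range(start, stop):
                if other != position:
                    counts[token][tokens[other]] += 1
    vectors = {token: [counts[token][context] for context in vocab] for token in vocab}
    return vectors, vocab
```

Any real embedding method with the same input shape (a collection of tokenized sentences) could take the place of this toy function.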

Adding further resources

Add new resources to resource/other_resources.py or resource/UMLS.py by inheriting from the abstract class Evaluator and implementing its abstract methods.
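The pattern of inheriting from an abstract class can be sketched as follows. Note that the `Evaluator` class below is only a stand-in defined for this example: its single abstract method `load_terms` is an assumption, and the real abstract methods must be taken from the repository's own class definition.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Evaluator(ABC):
    """Stand-in for the repository's abstract class (assumed shape)."""

    @abstractmethod
    def load_terms(self) -> Dict[str, List[str]]:
        """Return a mapping from concept identifier to its terms (hypothetical)."""


class DictionaryResource(Evaluator):
    """Hypothetical resource backed by an in-memory dictionary."""

    def __init__(self, concept2terms: Dict[str, List[str]]):
        # Copy the input so later changes by the caller do not leak in.
        self._concept2terms = {c: list(terms) for c, terms in concept2terms.items()}

    def load_terms(self) -> Dict[str, List[str]]:
        return dict(self._concept2terms)
```

Python refuses to instantiate a subclass that leaves an abstract method unimplemented, so a missing method surfaces as a `TypeError` as soon as the resource is created.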

Adding further benchmarks

Add new benchmarks to benchmarking/benchmarks by inheriting from the abstract class Benchmark and implementing its abstract methods. Use the constructor to define the resource files the benchmark needs, such as knowledge bases; evaluate_embeddings.py or any other calling code passes them in as a list.
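A minimal sketch of a custom benchmark, under stated assumptions: the `Benchmark` base class below is a stand-in written for this example, its constructor signature and the abstract method `evaluate` are hypothetical, and resources are modelled as plain sets of terms. The only detail taken from the description above is that the resources arrive in the constructor as a list.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Set


class Benchmark(ABC):
    """Stand-in for the repository's abstract Benchmark class (assumed shape)."""

    def __init__(self, embeddings: Dict[str, Sequence[float]], resources: List[Set[str]]):
        self.embeddings = embeddings
        self.resources = resources  # passed in as a list by the caller

    @abstractmethod
    def evaluate(self) -> float:
        """Return the benchmark score (hypothetical method name)."""


class VocabularyCoverage(Benchmark):
    """Hypothetical benchmark: share of resource terms that have a vector."""

    def evaluate(self) -> float:
        terms = set().union(*self.resources)
        if not terms:
            return 0.0
        covered = sum(1 for term in terms if term in self.embeddings)
        return covered / len(terms)
```

A caller would construct the benchmark with the trained vectors and the list of resources, for example `VocabularyCoverage(vectors, [umls_terms, other_terms]).evaluate()`, where both term sets are placeholders.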

LasseKohlmeyer/AML4DHGermanVecs
