GitHub - naiaden/VacCor: The title is not directed to you, so take no offence

Processing the vacancies: The vacancies are stored in a single XML file. Because processing it takes a long time, we want to distribute the work. This script walks through the XML file efficiently and splits it into individual vacancies, which are then passed on to multiple threads.
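The split-and-distribute idea can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the element name `vacancy`, the helper names and the word-count "processing" step are all assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def iter_vacancies(source, tag="vacancy"):
    """Stream the XML and yield each vacancy as a standalone XML string.

    The tag name "vacancy" is an assumption; the real file may differ.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # drop the element's children to keep memory low


def process_vacancy(xml_fragment):
    """Hypothetical per-vacancy work: count whitespace-separated words."""
    elem = ET.fromstring(xml_fragment)
    text = " ".join(elem.itertext())
    return len(text.split())


def process_all(source, workers=4):
    """Hand the individual vacancies to a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_vacancy, iter_vacancies(source)))


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<vacancies>"
        b"<vacancy><title>Data engineer</title><body>Python and SQL</body></vacancy>"
        b"<vacancy><title>Nurse</title></vacancy>"
        b"</vacancies>"
    )
    print(process_all(sample))
```

Note that `pool.map` submits every vacancy up front, so for a very large file a bounded queue between the parser and the workers would be a safer choice.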

Processing the Twente News Corpus: Since ucto is a memory hog on large input files, we tokenise the individual files separately. These are then concatenated into one large file. Kind of like the reverse of the process for the vacancies :-) We can do this because the background corpus is aggregated into type counts, ignoring any other information.
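A rough sketch of the tokenise-then-concatenate flow, under stated assumptions: the `tokenise` function below is a plain whitespace split standing in for ucto (it does not call ucto), and all function names are hypothetical. It only illustrates why the order and origin of the files stop mattering once everything is reduced to type counts.

```python
import tempfile
from collections import Counter
from pathlib import Path


def tokenise(text):
    """Stand-in for ucto: a plain whitespace split (assumption, not ucto)."""
    return text.split()


def build_background(input_paths, output_path):
    """Tokenise each file on its own, then append it to one large file."""
    with open(output_path, "w", encoding="utf-8") as out:
        for path in input_paths:
            tokens = tokenise(Path(path).read_text(encoding="utf-8"))
            out.write(" ".join(tokens) + "\n")


def type_counts(corpus_path):
    """Aggregate the concatenated corpus into type counts."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            counts.update(line.split())
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "a.txt"
        second = Path(tmp) / "b.txt"
        first.write_text("de kat zit", encoding="utf-8")
        second.write_text("de hond", encoding="utf-8")
        merged = Path(tmp) / "background.txt"
        build_background([first, second], merged)
        print(type_counts(merged).most_common(3))
```

Only one file is held in memory at a time, which is the point of tokenising per file rather than feeding the tokeniser one huge input.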

Files (8 commits):
README.md
TwNC.illegal_files
pronew.py
provac.py
