Text Categorizer - experiments

Text Categorizer is a tool, available at https://github.com/LuisVilarBarbosa/TextCategorizer/, that implements a configurable pipeline of methods for training models that predict the categories of textual data.

This repository contains side projects that use a minimal version of the text-categorization code to test tools that could be added to Text Categorizer.

Getting Started

These instructions will get you a copy of the projects up and running on your local machine.

The different projects are designed to be used natively, but can easily be used with Docker.

Prerequisites

To execute natively, a machine with 64-bit Anaconda3 or 64-bit Miniconda3 installed is required.
To execute the experiment "2020-03-09_02_ELMo_by_sentence" natively on Linux, g++ and make must also be installed.
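
As a quick way to confirm the prerequisites above, the following minimal sketch checks whether the tools are reachable on PATH. The helper name missing_tools is hypothetical and not part of any experiment; it also assumes that conda is exposed on PATH, which on Windows is normally only true inside an Anaconda prompt.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# conda is needed for every experiment; g++ and make are only needed
# for "2020-03-09_02_ELMo_by_sentence" on Linux.
missing = missing_tools(["conda", "g++", "make"])
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All prerequisite tools were found on PATH.")
```

A tool reported as missing may still be installed somewhere outside PATH, so treat the output as a hint rather than a definitive check.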

Installing/Updating

The following instructions explain how to install or update the dependencies required to execute the projects.

To install natively, open a shell (an Anaconda prompt is recommended on Windows and Bash is recommended on Linux) and type the following commands:

cd <path-to-experiment-folder>
conda env create --file environment.yml

To update natively, open a shell (an Anaconda prompt is recommended on Windows and Bash is recommended on Linux) and type the following commands:

cd <path-to-experiment-folder>
conda env create --file environment.yml --force

Executing

The following instructions explain how to execute the projects.

To execute natively, open a shell (an Anaconda prompt is recommended on Windows and Bash is recommended on Linux) and type the following commands:

cd <path-to-experiment-folder>
conda activate text-categorizer
python <Python-file>

Running python <Python-file> on its own prints the usage parameters that must be supplied to execute the code.

These parameters are relatively simple to understand, but reading through the code is recommended to understand how each parameter behaves.
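
To illustrate what a usage message of this kind looks like, here is a minimal sketch built with argparse. It is not taken from the experiments: the script name experiment.py and the parameters dataset_path, text_column and label_column are hypothetical, and the real scripts may parse their arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical experiment script."""
    parser = argparse.ArgumentParser(
        prog="experiment.py",  # hypothetical script name
        description="Hypothetical text categorization experiment.",
    )
    # All three parameters are illustrative assumptions, not the real ones.
    parser.add_argument("dataset_path", help="file containing the texts and their categories")
    parser.add_argument("text_column", help="name of the column that holds the text")
    parser.add_argument("label_column", help="name of the column that holds the category")
    return parser


parser = build_parser()

# A script built this way prints this usage line (and exits with an error)
# when it is called without the required parameters.
print(parser.format_usage())

# Parsing an explicit argument list shows how the values become attributes.
args = parser.parse_args(["data.csv", "text", "category"])
print(args.dataset_path, args.text_column, args.label_column)
```

For the actual parameters of each experiment, rely on the usage message printed by the script itself.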

Authors

Luís Barbosa - LuisVilarBarbosa

Acknowledgments

The layout of this README was inspired by https://github.com/LuisVilarBarbosa/TextCategorizer/blob/5bad65078999edde5312915c9654b2f5d910c288/README.md.

Development Notes

In general, the projects have been tested on Windows 10 and Ubuntu, but some projects might only work on Linux.
Pickle is used to dump and load data to and from files. It was the fastest of the tested protocols, but it is considered insecure: loading a pickle file can execute arbitrary code, so only load pickle files that come from a source you trust.
Some projects might assume that an Internet connection is available.
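
The Pickle note above can be sketched as a dump/load round trip. This is a minimal illustration and not the code used in the experiments: the helper names dump_data and load_data, the file name features.pkl and the use of pickle.HIGHEST_PROTOCOL are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def dump_data(obj, path):
    """Serialize `obj` to `path` with pickle."""
    with open(path, "wb") as file:
        # HIGHEST_PROTOCOL is an assumption; the experiments may use another protocol.
        pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)


def load_data(path):
    """Deserialize the object stored at `path`."""
    # Unpickling can run arbitrary code: only call this on files you created or trust.
    with open(path, "rb") as file:
        return pickle.load(file)


# Round trip through a temporary file that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "features.pkl"  # hypothetical file name
    original = {"texts": ["first document", "second document"], "labels": [0, 1]}
    dump_data(original, path)
    restored = load_data(path)

print(restored == original)
```

If the data ever has to come from an untrusted source, a format that cannot carry code, such as JSON, is a safer choice, at the cost of supporting fewer Python types.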
