CS-TextNormalization

We build a pipeline to clean noisy code-switched text found online.

Getting the repo

git clone --recursive https://github.com/sumeet-iitg/CS-TextNormalization.git

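-- Don't omit the '--recursive' flag; it pulls the required submodules. If you have already cloned without it, run 'git submodule update --init --recursive' inside the repository.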

Components of the Normalization Pipeline

DataManagement: This folder contains the abstractions that make up the pipeline. When you add a new implementation of a tool for the pipeline, make sure it follows one of the abstractions in this folder. Feel free to add new abstractions to this folder. Some of the existing abstractions are:
languageUtils.py: Classes for language-specific Identifiers, Lexicons and SpellCheckers.
dataloader.py: Classes for loading a corpus, either monolingual or multilingual.
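
As a rough illustration of what following an abstraction could look like, the sketch below defines a base class for a spell checker and one toy implementation of it. The names `SpellChecker`, `LexiconSpellChecker`, and `correct` are hypothetical and chosen only for this example; they are not taken from languageUtils.py, so check the actual classes in DataManagement before writing a new tool.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SpellChecker(ABC):
    """Hypothetical abstraction: one spell checker per language."""

    def __init__(self, language: str) -> None:
        self.language = language

    @abstractmethod
    def correct(self, word: str) -> str:
        """Return the corrected form of `word`."""


class LexiconSpellChecker(SpellChecker):
    """Toy implementation that looks up known misspellings in a dict."""

    def __init__(self, language: str, corrections: Dict[str, str]) -> None:
        super().__init__(language)
        self.corrections = corrections

    def correct(self, word: str) -> str:
        # Fall back to the input when the word is not in the lexicon.
        return self.corrections.get(word.lower(), word)


# Made-up lexicon entries, for illustration only.
checker = LexiconSpellChecker("english", {"gud": "good", "nite": "night"})
normalized = [checker.correct(w) for w in "gud nite everyone".split()]
print(" ".join(normalized))
```

Because every implementation exposes the same `correct` method, the rest of a pipeline can swap one spell checker for another without changing the calling code.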

Requirements

Usage

You can use this pipeline end to end, or run the individual components within it. For example:

python main.py "source_tanglish.txt" "english,telugu"
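
The command above passes two positional arguments. The sketch below shows one way such arguments could be parsed, assuming the first is the path to the input text file and the second is a comma-separated list of the languages present in it. This is an illustration only, not the actual contents of main.py, and the function name `parse_cli` is hypothetical.

```python
import argparse
from typing import List, Optional, Tuple


def parse_cli(argv: Optional[List[str]] = None) -> Tuple[str, List[str]]:
    """Parse a source file path and a comma-separated language list."""
    parser = argparse.ArgumentParser(
        description="Normalize noisy code-switched text (illustrative sketch)."
    )
    parser.add_argument("source", help="path to the input text file")
    parser.add_argument("languages", help="comma-separated languages, e.g. english,telugu")
    args = parser.parse_args(argv)

    # Split the language list and drop empty entries such as a trailing comma.
    languages = [
        lang.strip().lower()
        for lang in args.languages.split(",")
        if lang.strip()
    ]
    return args.source, languages


source, languages = parse_cli(["source_tanglish.txt", "english,telugu"])
print(source, languages)
```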
