yuta1125tp/natural-language-preprocessings

 
 


Natural Language Pre-processing

This repository contains recipes for natural language pre-processing.

The recipes are as follows:

  • Data cleaner
  • Word normalization
  • Stopwords remover
  • Tokenizer
  • Word Vector
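As a rough illustration of how the first four recipes fit together, here is a minimal sketch that uses only the Python standard library. The function names, the stopword list, and the regex tokenizer are assumptions made for this example and are not taken from the repository's code. Japanese text such as the livedoor news corpus would normally need a morphological analyzer (for example MeCab or Janome) in place of the regex tokenizer.

```python
import html
import re
import unicodedata

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of"}


def clean_text(text):
    """Data cleaner: strip HTML tags, unescape entities, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text):
    """Word normalization: NFKC-normalize, lowercase, replace digit runs with 0."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    return re.sub(r"\d+", "0", text)


def tokenize(text):
    """Tokenizer: split on runs of word characters (a stand-in for MeCab/Janome)."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Stopwords remover: drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def preprocess(text):
    """Run the four steps in order and return a list of tokens."""
    return remove_stopwords(tokenize(normalize(clean_text(text))))
```

Calling `preprocess` on a raw string returns a token list that could then be passed to a word-vector model or a classifier.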

Install

To install the required modules, run:

$ pip install -r requirements.txt

Setup

First, download the livedoor news corpus and extract it. To download the corpus, execute the following commands:

$ cd src/data
$ python make_dataset.py
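For readers curious about what a download-and-extract step of this kind involves, here is a hedged sketch. It is not the contents of make_dataset.py. The URL, the file paths, and the function names are assumptions made for this example, so check the corpus's official download page before relying on them.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed download location of the livedoor news corpus archive (not confirmed by this README).
CORPUS_URL = "https://www.rondhuit.com/download/ldcc-20140209.tar.gz"


def download(url, dest):
    """Download url to dest unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive, out_dir):
    """Extract a .tar.gz archive into out_dir; return the output directory."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(out_dir))
    return out_dir


# Example usage (hypothetical paths):
# archive = download(CORPUS_URL, "data/raw/ldcc-20140209.tar.gz")
# extract(archive, "data/raw")
```

Skipping the download when the archive already exists keeps repeated runs cheap.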

Now you are ready for classification!

Start jupyter notebook:

$ jupyter notebook

Then open and run notebooks/document_classification.ipynb.

Good NLP Life!

Licence

MIT

Author

Hironsan
