jgabriellima/portuguese-nlp

 
 

portuguese-nlp

NLP work on Brazilian Portuguese newswire text

You can browse the dataset online and view the annotations on Drive.

The dataset contains x newswire articles collected between 1994 and 2016. Because the articles are in HTML format, preprocessing first strips the tags and then renames each file so that its date path becomes part of the file name, for example:

folca/data/2005/01/01/19.html --> folca/parsed-data/2005_01_01_19.html

and collect them all in one folder.
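
The cleaning and renaming steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the helper names `strip_tags`, `flat_name`, and `preprocess` are hypothetical, the input encoding is assumed to be UTF-8, and the tag stripping uses only the standard-library `html.parser` (so text inside `<script>` or `<style>` elements is not filtered out).

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_text):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Join text nodes with a space, then collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


def flat_name(src, data_root="folca/data"):
    """Map <data_root>/2005/01/01/19.html to 2005_01_01_19.html."""
    rel = Path(src).relative_to(data_root)
    return "_".join(rel.parts)


def preprocess(data_root="folca/data", out_dir="folca/parsed-data"):
    """Clean every article under data_root and collect the results in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(data_root).rglob("*.html")):
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        raw = src.read_text(encoding="utf-8", errors="replace")
        (out / flat_name(src, data_root)).write_text(strip_tags(raw), encoding="utf-8")
```

Calling `preprocess()` from the directory that contains `folca/` would write one cleaned file per article into `folca/parsed-data/`, keeping the `.html` extension as in the example mapping above.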

Languages

  • Python 93.4%
  • Shell 6.6%