camilothorne/nasslli2016: Code for my NASSLLI 2016 tutorial on corpus analysis with open source tools

NASSLLI 2016 Tutorial

This site contains part of the materials we will be using during the tutorial on Corpus Statistics with Open Source Tools at NASSLLI 2016. The tutorial will be interactive: basic analytical concepts and techniques will be illustrated on the datasets listed below. It assumes that you bring a laptop and have a Git version control client installed.

REMARK: This site already contains a case study in corpus analysis that we will discuss together. At the end of the tutorial, the notes, slides and some extra sample code will be uploaded to this repository.

The course will rely on two pillars: (1) the R statistical analysis environment and (2) the Python scripting language. A companion tool for R is the RStudio IDE. For Python, you can use the IDE of your choice (e.g., Eclipse with the PyDev plugin). During the tutorial, I will help you install and set up most of the required tools and resources, albeit only for Linux environments. Below, I list the main requirements and references. Additional (but minor) libraries and references will be mentioned as we go.
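Since the Python side of the setup is easy to get wrong, a short script can confirm that the Python libraries listed under A. Software are importable before the tutorial starts. This is a minimal sketch, not part of the repository: `check_modules` and `REQUIRED` are hypothetical names, and the module names are the standard import names of NumPy, Matplotlib, SciPy, NLTK and Gensim. It only reports what is installed; it does not install anything.

```python
from __future__ import print_function

import importlib

# Hypothetical list: standard import names of the Python libraries under "A. Software".
REQUIRED = ["numpy", "matplotlib", "scipy", "nltk", "gensim"]


def check_modules(names):
    """Return {module name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Not installed (or broken install): mark as missing.
            report[name] = None
            continue
        # Most scientific packages expose __version__; fall back to "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in sorted(check_modules(REQUIRED).items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Running it prints one line per library, which can then be compared against the minimum versions in the list. `importlib.import_module` is available in both Python 2.7 and Python 3, matching the "Python 2.7+" requirement.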

A. Software:

- R 2.0+, with libraries:
  - languageR (English datasets)
  - infotheo (Shannon entropy)
  - xlsx (to read/write Excel .xls and .xlsx files)
- RStudio 0.99+ (IDE for R)
- Python 2.7+, with libraries:
  - NumPy 1.0+ (numerical computation)
  - Matplotlib 1.0+ (plotting)
  - SciPy 1.0+ (basic statistics)
  - NLTK 2.0+ (NLP)
  - Gensim (word embeddings)
- Word2Vec models
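The infotheo entry refers to Shannon entropy, H = -Σ p(w) log p(w), where p(w) is the relative frequency of word type w. As a rough Python counterpart (an illustration, not code from the tutorial), the plug-in estimate can be computed with the standard library alone. `shannon_entropy` is a hypothetical helper name, and the logarithm base is a parameter (2, i.e. bits, by default here) because packages differ in the units they report.

```python
from __future__ import division, print_function

import math
from collections import Counter


def shannon_entropy(tokens, base=2.0):
    """Plug-in Shannon entropy of a token sequence.

    Probabilities are estimated as relative frequencies of each type;
    base=2 gives bits, base=math.e gives nats.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        p = c / total  # true division, also under Python 2.7
        h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    tokens = "the cat sat on the mat the end".split()
    print(Counter(tokens).most_common(1))  # [('the', 3)]
    print(round(shannon_entropy(tokens), 4))  # 2.4056
```

On the toy sentence, "the" accounts for 3 of 8 tokens and the other five types for 1 each, giving about 2.41 bits. The entropy is 0 when a single type occurs and maximal (log2 of the number of types) when all types are equally frequent.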

B. References:

Peter Dalgaard. Introductory Statistics with R. Springer, 2009.
Stefan T. Gries. "Useful statistics for corpus linguistics". In: A Mosaic of Corpus Linguistics: Selected Approaches, pp. 269-291. Peter Lang, 2010.
Chris Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
Steven Bird, Ewan Klein and Edward Loper. Natural Language Processing with Python. O'Reilly, 2009.
R. H. Baayen. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press, 2008.
