Natural-Language-Processing

1. Downloading NLTK

pip installation

pip install nltk

Downloading nltk's components

>>> import nltk
>>> nltk.download('all')

Terminologies related to NLP

example = "Hello Miss. Purva Singh! How are you? We are very excited to meet you !!!!"
example_words = ("draw", "drawing", "drew", "draws")

| S.No. | Terminology | Description | Python Library | Examples |
|---|---|---|---|---|
| 1. | Tokenizing | Tokenizing groups a character sequence into units. There are two types: sentence tokenizer and word tokenizer. | `sent_tokenize(example)`, `word_tokenize(example)` | Sentence tokenizer: Hello Miss. Purva Singh! How are you? We are very excited to meet you !!!! Word tokenizer: 'Hello', 'Miss', '.', 'Purva', 'Singh', '!', 'How', 'are', 'you', '?', 'We', 'are', 'very', 'excited', 'to', 'meet', 'you', '!', '!', '!', '!' |
| 2. | Corpora | Corpora are large collections of texts. | `import nltk.corpus` | Medical journals, presidential speeches, any collection of English-language text |
| 3. | Lexicon | A lexicon is a dictionary of words and their meanings. | | bull: to a financial investor, the first meaning of the word "bull" is someone who is confident about the market; bull is also an animal |
| 4. | Stop Words | Stop words are the extra words in a sentence that we do not need. They are filler words and, for data analysis, carry little value. | `from nltk.corpus import stopwords`, `set(stopwords.words("english"))` | After removing stop words: 'Hello', 'Miss', '.', 'Purva', 'Singh', '!', 'How', '?', 'We', 'excited', 'meet', '!', '!', '!', '!' |
| 5. | Stemming | Words can vary in form, for example by tense. Stemming normalizes these variations. | `from nltk.stem import PorterStemmer`, `ps = PorterStemmer()` | Stemming gives a set of root words. `ps.stem` takes one word at a time: `[ps.stem(w) for w in example_words]` gives `['draw', 'draw', 'drew', 'draw']` |
| 6. | Tagging | Part-of-speech tagging labels each word in a sentence as a noun, adjective, verb (including tense), and so on. | `part_of_speech_tag = nltk.pos_tag(tokenized_words)` | ('PRESIDENT', 'NNP'), ('members', 'NNS'), ('W.', 'NNP'), ('THE', 'DT'). NNP: proper noun; DT: determiner; NNS: plural noun |
| 7. | Chunking | Chunking groups words based on a regular expression over their part-of-speech tags. | `chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""` | (Chunk PRESIDENT/NNP GEORGE/NNP W./NNP BUSH/NNP) (Chunk ADDRESS/NNP) (Chunk A/NNP JOINT/NNP SESSION/NNP) |
| 8. | Chinking | Chinking excludes words from a chunk. It is written with outward-facing curly braces: `}chinking regexp{`. | `chunkGram = r"""Chunk: {<.*>+}` followed, on the next line of the grammar, by `}<VB.?><NNP>+<NN>?{"""` | (Chunk THE/NNP CONGRESS/NNP ON/NNP THE/NNP STATE/NNP) |
| 9. | Named entity recognition | The main idea behind named entity recognition is to chunk "entities" such as people, places, things, locations, monetary figures, and more. | `named_entity = nltk.ne_chunk(tagged)` | ORGANIZATION: Caplan and Gold, WHO. PERSON: Purva Singh. LOCATION: Bhilai, Bangalore. DATE: June, 2019-06-29. TIME: two fifty a m, 1:30 p.m. |
| 10. | Lemmatizing | Lemmatizing is similar to stemming, but every word it produces is an actual word. The `lemmatize` method takes an optional parameter `pos` (part of speech), which defaults to noun. | `from nltk.stem import WordNetLemmatizer`, `lemmatizer = WordNetLemmatizer()`, `lemmatizer.lemmatize("pretty")` | `lemmatizer.lemmatize("pretty")` gives pretty; `lemmatizer.lemmatize("drawing", pos='a')` gives drawing; `lemmatizer.lemmatize("better", pos='a')` gives good; `lemmatizer.lemmatize("better")` gives better |
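
Stemming and chunking can be tried without downloading any NLTK data, because the Porter stemmer and the regular-expression chunker are pure Python. The sketch below stems `example_words` one word at a time and runs a chunk grammar over a hand-tagged token list. The token list is an assumption made for illustration (in particular the `'S` and `BEFORE` entries and their tags); in practice the tags would come from `nltk.pos_tag`.

```python
from nltk import RegexpParser
from nltk.stem import PorterStemmer

# Stemming: the stemmer works on a single word, so apply it to each one.
example_words = ("draw", "drawing", "drew", "draws")
ps = PorterStemmer()
stems = [ps.stem(word) for word in example_words]
print(stems)

# Chunking: hand-written (word, tag) pairs stand in for nltk.pos_tag output.
# This list is an illustrative assumption, not real tagger output.
tagged = [
    ("PRESIDENT", "NNP"), ("GEORGE", "NNP"), ("W.", "NNP"), ("BUSH", "NNP"),
    ("'S", "POS"), ("ADDRESS", "NNP"), ("BEFORE", "IN"),
    ("A", "NNP"), ("JOINT", "NNP"), ("SESSION", "NNP"),
]

# Optional adverbs and verbs, then one or more proper nouns, then an optional noun.
chunk_grammar = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
tree = RegexpParser(chunk_grammar).parse(tagged)

# Collect the words of every subtree labelled "Chunk".
chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in tree.subtrees(filter=lambda t: t.label() == "Chunk")
]
print(chunks)
```

Tokens whose tags do not fit the pattern (here `'S` and `BEFORE`) stay outside any chunk, which is why the proper-noun runs end up as separate groups.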

Sentiment Analysis

Prerequisites for performing sentiment analysis using the Twitter API:

Create a Twitter developer account.

Create an app by filling in all the required details.

The confirmation email sometimes lands in your spam folder.

After creating the app, open the Keys and tokens section to find your consumer key, consumer secret, access token, and access token secret.
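
One way to keep these four values out of your source code is to read them from environment variables. The sketch below is a minimal example using only the standard library; the variable names and the function `load_twitter_credentials` are hypothetical and can be renamed to suit your setup.

```python
import os

# Hypothetical environment variable names (an assumption, not a Twitter convention).
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_VARS}
```

Calling `load_twitter_credentials()` at startup fails early with a clear message if a key was not exported, instead of failing later with an authentication error.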

Twitter's Stream API giving 401

One reason the Stream API returns 401 is that your Twitter account's time zone and your Ubuntu machine's time zone are not in sync.

To check the current time zone in Ubuntu, run the `date` command.

To check the time zone of your Twitter account, follow these steps:

Go to Twitter.

Click on your profile -> Settings and privacy -> Timezone.

Set the time zone to match your Ubuntu machine's time zone.
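
As a cross-check alongside the `date` command, a short Python sketch can print the machine's time zone name and UTC offset, which you can then compare with the value shown in your Twitter settings. It uses only the standard library; the function name `local_timezone_info` is illustrative.

```python
from datetime import datetime, timezone


def local_timezone_info():
    """Return (time zone name, UTC offset) for this machine's local time."""
    now = datetime.now().astimezone()  # attach the system's local time zone
    return now.tzname(), now.utcoffset()


name, offset = local_timezone_info()
print(f"Local time zone: {name} (UTC offset {offset})")
print("Current UTC time:", datetime.now(timezone.utc).isoformat(timespec="seconds"))
```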


purvasingh96/Natural-Language-Processing
