
purvasingh96/Natural-Language-Processing


Natural-Language-Processing

1. Downloading NLTK

  1. Install with pip:
pip install nltk
  2. Download NLTK's data packages (corpora, tokenizer models, taggers, etc.):
>>> import nltk
>>> nltk.download('all')

Terminology related to NLP

The examples below use the following inputs:

example = "Hello Miss. Purva Singh! How are you? We are very excited to meet you !!!!"
example_words = ("draw", "drawing", "drew", "draws")

Each entry below gives the term, a description, the relevant Python (NLTK) usage, and example output.
1. Tokenizing
Description: Tokenizing splits a character sequence into smaller units called tokens. There are two types: 1. sentence tokenizers, which split text into sentences, and 2. word tokenizers, which split text into words and punctuation marks.
Python library: from nltk.tokenize import sent_tokenize, word_tokenize
sent_tokenize(example)
word_tokenize(example)
Example output:
SENTENCE TOKENIZER -
Hello Miss.
Purva Singh!
How are you?
We are very excited to meet you !!!!
WORD TOKENIZER - ['Hello', 'Miss', '.', 'Purva', 'Singh', '!', 'How', 'are', 'you', '?', 'We', 'are', 'very', 'excited', 'to', 'meet', 'you', '!', '!', '!', '!']
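To show roughly what the two tokenizers do, here is a naive sketch that uses only Python's re module. It is an approximation, not NLTK's Punkt sentence tokenizer or its word tokenizer, and the function names naive_sent_tokenize and naive_word_tokenize are hypothetical. On the example string it happens to reproduce the output listed above. However, it would wrongly split after abbreviations such as "Dr." because, unlike Punkt, it has no knowledge of abbreviations.

```python
import re
from typing import List

example = "Hello Miss. Purva Singh! How are you? We are very excited to meet you !!!!"

def naive_sent_tokenize(text: str) -> List[str]:
    # Hypothetical helper: split on whitespace that follows '.', '!' or '?'.
    # Unlike NLTK's Punkt model, it knows nothing about abbreviations.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_word_tokenize(text: str) -> List[str]:
    # Hypothetical helper: a token is a run of word characters or a single
    # punctuation mark, so "!!!!" becomes four separate "!" tokens.
    return re.findall(r"\w+|[^\w\s]", text)

for sentence in naive_sent_tokenize(example):
    print(sentence)
print(naive_word_tokenize(example))
```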
2. Corpora
Description: A corpus (plural: corpora) is a large collection of texts.
Python library: import nltk.corpus
Examples: medical journals, presidential speeches, any body of English-language text
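3. Lexicon
Description: A lexicon is a dictionary of words and their meanings. The meaning a lexicon assigns to a word depends on its domain.
Examples:
bull - to a financial investor, the primary meaning of "bull" is someone who is confident about the market
bull - in everyday English, it is also an animal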
4. Stop Words
Description: Stop words are common filler words in a sentence (for example "are", "to", "very") that usually carry little meaning for data analysis, so they are often removed.
Python library: from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
filtered = [w for w in word_tokenize(example) if w not in stop_words]
Example output: ['Hello', 'Miss', '.', 'Purva', 'Singh', '!', 'How', '?', 'We', 'excited', 'meet', '!', '!', '!', '!']
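The filtering step can be sketched without downloading NLTK data. The STOP_WORDS set below is a small hand-picked subset of English stop words (an assumption for illustration, since NLTK's list is much longer), and remove_stop_words is a hypothetical helper. The sketch also explains why 'How' and 'We' remain in the output above. The comparison is case-sensitive, and NLTK's stop-word list is all lowercase.

```python
from typing import Iterable, List, Set

# Assumed, illustrative subset of English stop words (NLTK's full list is longer).
STOP_WORDS: Set[str] = {"are", "you", "very", "to", "how", "we"}

# Word tokens of the example sentence, as listed for the word tokenizer.
tokens = ['Hello', 'Miss', '.', 'Purva', 'Singh', '!', 'How', 'are', 'you', '?',
          'We', 'are', 'very', 'excited', 'to', 'meet', 'you', '!', '!', '!', '!']

def remove_stop_words(words: Iterable[str], case_sensitive: bool = True) -> List[str]:
    # Hypothetical helper: keep a word only if it is not in the stop-word set.
    # With case_sensitive=False, 'How' is compared as 'how' and is removed too.
    if case_sensitive:
        return [w for w in words if w not in STOP_WORDS]
    return [w for w in words if w.lower() not in STOP_WORDS]

print(remove_stop_words(tokens))
print(remove_stop_words(tokens, case_sensitive=False))
```

Lowercasing before the comparison is usually what you want. It is shown here as an option only to explain the original output.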
5. Stemming
Description: Words often appear in several variations, for example due to tense. Stemming normalizes these variations by reducing each word to its root form (stem).
Python library: from nltk.stem import PorterStemmer
ps = PorterStemmer()
[ps.stem(w) for w in example_words]
Example output: ['draw', 'draw', 'drew', 'draw']
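Stemmers such as Porter's strip suffixes using rules and never consult a dictionary. The toy function below, toy_stem, is hypothetical and far simpler than PorterStemmer. It strips only a few suffixes, which is enough to show why the irregular past tense "drew" is left unchanged: no suffix rule maps it to "draw".

```python
from typing import List

# Assumed, tiny subset of stemming rules, checked in this order.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word: str) -> str:
    # Hypothetical toy stemmer: strip the first matching suffix.
    for suffix in SUFFIXES:
        # Require a stem of at least 3 letters so short words like "is" are kept.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

example_words = ("draw", "drawing", "drew", "draws")
stems: List[str] = [toy_stem(w) for w in example_words]
print(stems)  # ['draw', 'draw', 'drew', 'draw']
```

Mapping "drew" to "draw" requires a dictionary of irregular forms, which is what a lemmatizer provides.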
6. Tagging
Description: Part-of-speech tagging labels each word in a sentence with its part of speech (noun, adjective, verb, etc.). Some tags also encode details such as tense or number.
Python library: part_of_speech_tag = nltk.pos_tag(tokenized_words)
Example output: [('PRESIDENT', 'NNP'), ('members', 'NNS'), ('W.', 'NNP'), ('THE', 'DT')]

1. NNP - proper noun, singular
2. DT - determiner
3. NNS - noun, plural
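nltk.pos_tag uses a trained statistical tagger. The sketch below is much cruder: a rule-based tagger built from hand-written regular expressions. The rules and the regex_tag name are assumptions for illustration. NLTK ships a comparable class, nltk.tag.RegexpTagger. On the four tokens above, these rules happen to produce the same tags.

```python
import re
from typing import List, Tuple

# Assumed rules, tried in order; the first regex matching the whole token wins.
RULES: List[Tuple[str, str]] = [
    (r"(?:the|The|THE)", "DT"),    # determiner
    (r"[A-Z][A-Za-z.]*", "NNP"),   # capitalized word or initial -> proper noun
    (r"[a-z]+s", "NNS"),           # lowercase word ending in "s" -> plural noun
    (r".*", "NN"),                 # fallback: singular noun
]

def regex_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    # Hypothetical helper: tag each token with the first matching rule.
    tagged = []
    for token in tokens:
        for pattern, tag in RULES:
            if re.fullmatch(pattern, token):
                tagged.append((token, tag))
                break
    return tagged

print(regex_tag(["PRESIDENT", "members", "W.", "THE"]))
```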
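7. Chunking
Description: Chunking groups words into phrases ("chunks") by matching a regular expression against their part-of-speech tags.
Python library:
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
chunkParser = nltk.RegexpParser(chunkGram)
chunked = chunkParser.parse(part_of_speech_tag)
Example output:
(Chunk PRESIDENT/NNP GEORGE/NNP W./NNP BUSH/NNP)
(Chunk ADDRESS/NNP)
(Chunk A/NNP JOINT/NNP SESSION/NNP)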
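A regular-expression chunker works by matching a pattern against the sequence of tags. The sketch below does this with Python's re module. It encodes the tags as one string, runs an ordinary regex, and maps each match back to its words. The helper regex_chunk is hypothetical. The pattern uses plain Python regex syntax rather than NLTK's tag-pattern syntax: <NNP>+<NN>? becomes (?:<NNP>)+(?:<NN>)?, and the tag wildcard . becomes [^<>]. The input is hand-tagged for illustration.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

def regex_chunk(tagged: Tagged, pattern: str) -> List[List[str]]:
    # Hypothetical helper. Encode the tags as one string, e.g. "<NNP><NNP><POS>".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in re.finditer(pattern, tag_string):
        # Each token contributes exactly one "<", so counting them converts
        # character offsets back into token indices.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        if end > start:
            chunks.append([word for word, _ in tagged[start:end]])
    return chunks

# Python-regex equivalent of the NLTK tag pattern <RB.?>*<VB.?>*<NNP>+<NN>?
CHUNK_PATTERN = r"(?:<RB[^<>]?>)*(?:<VB[^<>]?>)*(?:<NNP>)+(?:<NN>)?"

# Hand-tagged tokens (illustrative, not real tagger output).
tagged = [("PRESIDENT", "NNP"), ("GEORGE", "NNP"), ("W.", "NNP"), ("BUSH", "NNP"),
          ("'S", "POS"), ("ADDRESS", "NNP")]
print(regex_chunk(tagged, CHUNK_PATTERN))
```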
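8. Chinking
Description: Chinking removes (excludes) words from a chunk. A chink pattern is written between inverted curly braces: }<chink pattern>{. The chunk rule and the chink rule go on separate lines of the grammar.
Python library: chunk everything, then chink out verbs, prepositions, determiners and "to":
chunkGram = r"""Chunk: {<.*>+}
                       }<VB.?|IN|DT|TO>+{"""
chunkParser = nltk.RegexpParser(chunkGram)
Example output:
(Chunk THE/NNP CONGRESS/NNP ON/NNP THE/NNP STATE/NNP)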
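Chinking can be sketched as "chunk everything, then cut the sequence wherever a chink tag appears". The helper chink is hypothetical. The chink tags (verbs VB.?, IN, DT and TO) are an assumed example pattern, and the input is hand-tagged. When the chink pattern is a repeated set of single tags, such as }<VB.?|IN|DT|TO>+{, cutting at each matching token yields the same chunks as removing maximal runs of those tags.

```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Assumed chink tags: any verb tag (VB, VBD, VBZ, ...), IN, DT or TO.
CHINK_TAG = re.compile(r"VB.?|IN|DT|TO")

def chink(tagged: Tagged) -> List[List[str]]:
    # Hypothetical helper: start with everything in one chunk and cut at chinks.
    chunks, current = [], []
    for word, tag in tagged:
        if CHINK_TAG.fullmatch(tag):
            # A chink token ends the current chunk and is itself excluded.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks

# Hand-tagged tokens (illustrative, not real tagger output).
tagged = [("THE", "NNP"), ("CONGRESS", "NNP"), ("ON", "NNP"), ("THE", "NNP"),
          ("STATE", "NNP"), ("OF", "IN"), ("THE", "DT"), ("UNION", "NNP")]
print(chink(tagged))  # [['THE', 'CONGRESS', 'ON', 'THE', 'STATE'], ['UNION']]
```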
9. Named Entity Recognition
Description: Named entity recognition chunks "entities" such as people, places, things, locations, monetary figures, and more.
Python library: named_entity = nltk.ne_chunk(tagged), where tagged is the output of nltk.pos_tag
Examples of entity types:
ORGANIZATION - Caplan and Gold, WHO
PERSON - Purva Singh
LOCATION - Bhilai, Bangalore
DATE - June, 2019-06-29
TIME - two fifty a m, 1:30 p.m.
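nltk.ne_chunk relies on a trained classifier and returns a tree. A much simpler technique, dictionary (gazetteer) lookup, is sketched below to show the shape of the task: find spans of tokens and attach an entity label to each. The GAZETTEER entries reuse names from the list above. gazetteer_ner and the sample inputs are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: token sequences mapped to entity labels.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Purva", "Singh"): "PERSON",
    ("Bhilai",): "LOCATION",
    ("Bangalore",): "LOCATION",
    ("WHO",): "ORGANIZATION",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def gazetteer_ner(tokens: List[str]) -> List[Tuple[str, str]]:
    # Hypothetical helper: greedy longest-match lookup, left to right.
    entities, i = [], 0
    while i < len(tokens):
        # Try longer spans first, so multi-token entries win at each position.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entity starts here; move to the next token
    return entities

print(gazetteer_ner("Hello Miss . Purva Singh !".split()))
print(gazetteer_ner("The WHO office in Bangalore".split()))
```

Unlike a trained model, a gazetteer cannot recognize names it has never seen.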
10. Lemmatizing
Description: Lemmatizing is similar to stemming, but unlike stemming, every word it produces is an actual word (a lemma). The lemmatize() method takes an optional "pos" (part of speech) parameter, which defaults to noun.
Python library:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("pretty"))             # pretty
print(lemmatizer.lemmatize("drawing", pos='a'))   # drawing
print(lemmatizer.lemmatize("better", pos='a'))    # good
print(lemmatizer.lemmatize("better"))             # better
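WordNet-based lemmatization looks words up in a dictionary. Irregular forms come from exception lists, and suffix-stripping rules are accepted only when the result is a real word for the given part of speech. The toy version below mimics that idea with a tiny hand-made vocabulary. EXCEPTIONS, SUFFIX_RULES, VOCAB and toy_lemmatize are all hypothetical. The sketch also shows why the pos argument changes the result.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical irregular forms, looked up first (like WordNet's exception lists).
EXCEPTIONS: Dict[Tuple[str, str], str] = {("better", "a"): "good", ("drew", "v"): "draw"}

# Hypothetical suffix rules per part of speech: (suffix to remove, replacement).
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "n": [("s", "")],
    "v": [("ing", ""), ("s", ""), ("ed", "")],
    "a": [("er", ""), ("est", "")],
}

# Tiny stand-in for the dictionary of valid lemmas per part of speech.
VOCAB: Dict[str, Set[str]] = {
    "n": {"draw", "drawing", "better"},
    "v": {"draw"},
    "a": {"good", "pretty"},
}

def toy_lemmatize(word: str, pos: str = "n") -> str:
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    if word in VOCAB[pos]:
        return word
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if it is a known lemma.
            if candidate in VOCAB[pos]:
                return candidate
    return word  # unknown words are returned unchanged

for word, pos in [("better", "a"), ("better", "n"), ("drew", "v"), ("drawing", "v"), ("drawing", "a")]:
    print(word, pos, "->", toy_lemmatize(word, pos))
```

Unlike the suffix-stripping stemmer, this approach maps "drew" to "draw" when it is told the word is a verb.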

Sentiment Analysis

Prerequisites for performing sentiment analysis using the Twitter API:

  1. Create a Twitter developer account.
  2. Create an app by filling in all the required details.
  3. If the confirmation email does not arrive, check your spam folder.
  4. After creating the app, find your consumer key, consumer secret, access token and access token secret under the "Keys and tokens" section.
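Rather than pasting the four keys into a script, one option is to read them from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on), the TwitterCredentials class and load_credentials are hypothetical. Pass the resulting values to whichever Twitter client library you use.

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class TwitterCredentials:
    # Hypothetical container for the four values from "Keys and tokens".
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str

# Hypothetical environment variable names, one per credential field.
ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}

def load_credentials(env: Mapping[str, str] = os.environ) -> TwitterCredentials:
    # Fail early with a clear message instead of getting a 401 later.
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return TwitterCredentials(**{field: env[name] for field, name in ENV_VARS.items()})
```

Keeping the keys out of source files also keeps them out of version control.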

Twitter's Stream API returning 401

  1. One possible reason for the Stream API returning 401 is that your Twitter account's time zone and your Ubuntu machine's time zone are not in sync.
  2. To check the current time zone on Ubuntu, run the date command:
date
  3. To check the time zone of your Twitter account, follow these steps:
  • Go to Twitter.
  • Click your profile -> Settings and privacy -> Time zone.
  • Set the time zone to match your Ubuntu machine's time zone.
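Twitter API requests signed with OAuth 1.0a include a timestamp, so a related thing to check is whether the machine's clock itself (not just its time zone) is accurate. The hypothetical helper clock_skew_seconds compares the local UTC time with the value of a server's HTTP "Date" response header. You supply that header value yourself. The demo generates one locally, so the skew it prints is close to zero.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def clock_skew_seconds(http_date: str) -> float:
    """Hypothetical helper: seconds the local clock is ahead (+) of a server.

    http_date is the value of a server's HTTP "Date" response header,
    e.g. "Sat, 29 Jun 2019 10:00:00 GMT" (a timezone-aware UTC time).
    """
    server_time = parsedate_to_datetime(http_date)
    local_time = datetime.now(timezone.utc)
    return (local_time - server_time).total_seconds()

# Demo with a header generated locally (truncated to whole seconds).
header = format_datetime(datetime.now(timezone.utc), usegmt=True)
print(round(clock_skew_seconds(header)))
```

A skew of more than a few minutes against a real server response suggests fixing the system clock, for example by enabling time synchronization.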