Web-Mining-Practice

Here is a description of each Python file:

extract_text_from_rotten_tomatoes.py

 Created a script named extract_text_from_rotten_tomatoes.py

 This script defines a function that accepts the URL of a movie on Rotten Tomatoes. It then creates a 
 text file that includes the following information for each review on the first 2 review pages for 
 the movie:

 - the name of the critic.

 - the rating: 'rotten', 'fresh', or 'NA' if the review doesn't have a rating.

 - the source of the review (e.g., 'New York Daily News'), or 'NA' if the review doesn't have a source.

 - the text of the review, or 'NA' if the review doesn't have text.

 - the date of the review, or 'NA' if the review doesn't have a date.

 The file includes one line for each review. The reviews in the file appear in the same 
 order as they do on the website. The 5 values for each review are written in 
 the order listed above and are separated by a TAB.
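The output format described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it skips fetching and parsing the review pages and assumes each review has already been collected into a dict. The names `FIELDS`, `format_review`, and `write_reviews` are hypothetical, as is the choice to write 'NA' for a missing critic name and to collapse whitespace inside a field so that a stray TAB or newline cannot break the line format.

```python
# Field order matches the list above: critic, rating, source, text, date.
FIELDS = ("critic", "rating", "source", "text", "date")


def format_review(review):
    """Return one TAB-separated line for a review dict (hypothetical helper)."""
    values = []
    for field in FIELDS:
        value = review.get(field)
        # Collapse internal whitespace so TABs/newlines in a field
        # cannot be mistaken for separators.
        value = " ".join(str(value).split()) if value else ""
        # Missing or empty fields are written as 'NA'.
        values.append(value or "NA")
    return "\t".join(values)


def write_reviews(reviews, path):
    """Write one line per review, preserving the order of `reviews`."""
    with open(path, "w", encoding="utf-8") as out:
        for review in reviews:
            out.write(format_review(review) + "\n")
```

Calling `write_reviews(reviews, "reviews.txt")` with the reviews in page order would then produce a file with one line per review.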

webcounter.py

Created a script called webcounter.py

- The script defines a function run() with 3 parameters: a link to a web page and two words, w1 and w2.

- The function returns a set of all the words in the webpage that have a higher frequency than w1 but a 
  lower frequency than w2.

- Case is ignored.

- All non-letter characters are removed before counting.

- Stopwords are ignored.
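The counting rules above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it works on text that has already been downloaded (the page fetch is left out), the function name `words_between` is hypothetical, the stopword list is a tiny placeholder, and non-letter characters are replaced with spaces rather than deleted outright.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; the list used by the
# original script is not shown in this README.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def words_between(text, w1, w2, stopwords=STOPWORDS):
    """Return the words whose frequency is above w1's and below w2's."""
    # Ignore case, then turn every non-letter character into a space.
    cleaned = re.sub(r"[^a-z]", " ", text.lower())
    # Count the remaining words, skipping stopwords.
    counts = Counter(word for word in cleaned.split() if word not in stopwords)
    # A word that never occurs has frequency 0 in a Counter.
    low, high = counts[w1.lower()], counts[w2.lower()]
    return {word for word, n in counts.items() if low < n < high}
```

Both comparisons are strict, so words tied with w1 or w2 are excluded from the result.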

getngrams.py

My script defines the following function:

processSentence(sentence, posLex, negLex, tagger): The parameters of this function are a sentence (a 
string), a set of positive words, a set of negative words, and a POS tagger. The function returns a 
list of all the 4-grams in the sentence that have the following structure:

not <any word> <pos/neg word> <noun>

For example: not a good idea
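The pattern match can be sketched as follows. This is a sketch under assumptions rather than the script's implementation: the tagger is assumed to be a callable that takes a list of tokens and returns (word, tag) pairs with Penn Treebank-style tags (the same shape that `nltk.pos_tag` returns), nouns are assumed to be any tag starting with `NN`, and the sentence is tokenized by a plain whitespace split. `toy_tagger` is a hypothetical stand-in so the example runs without an NLP library.

```python
def processSentence(sentence, posLex, negLex, tagger):
    """Return 4-grams of the form: not <any word> <pos/neg word> <noun>."""
    words = sentence.lower().split()
    tagged = tagger(words)  # assumed to return [(word, tag), ...]
    matches = []
    # Slide a window of four tagged tokens across the sentence.
    for i in range(len(tagged) - 3):
        (w1, _), (w2, _), (w3, _), (w4, tag4) = tagged[i:i + 4]
        is_sentiment_word = w3 in posLex or w3 in negLex
        if w1 == "not" and is_sentiment_word and tag4.startswith("NN"):
            matches.append(" ".join((w1, w2, w3, w4)))
    return matches


def toy_tagger(words):
    """Hypothetical stand-in tagger: tags a few known nouns as NN."""
    nouns = {"idea", "movie"}
    return [(word, "NN" if word in nouns else "XX") for word in words]


print(processSentence("This is not a good idea", {"good"}, {"bad"}, toy_tagger))
```

Because the sketch splits on whitespace only, a noun followed directly by punctuation (such as "idea.") would not match; a real tokenizer would handle that case.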

sonalijohari/Web-Mining-Practice
