
A few short programs that perform web mining operations.


sonalijohari/Web-Mining-Practice


Web-Mining-Practice

Here is a description of each Python file:

  1. extract_text_from_rotten_tomatoes.py
 Created a script named extract_text_from_rotten_tomatoes.py

 This script defines a function that accepts the URL of a movie page on Rotten Tomatoes. It then creates a
 text file containing the following information for each review on the first 2 review pages of
 the movie:

 - the name of the critic

 - the rating: 'rotten', 'fresh', or 'NA' if the review doesn't have a rating.

 - the source of the review (e.g. 'New York Daily News'), or 'NA' if the review doesn't have a source.

 - the text of the review, or 'NA' if the review doesn't have text.

 - the date of the review, or 'NA' if the review doesn't have a date.

 The file contains one line per review, and the reviews appear in the same
 order as they do on the website. The 5 values for each review are written in
 the order listed above, separated by TABs.
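The output format above (five TAB-separated fields per review, 'NA' for anything missing) can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the scraping step has already produced one dict per review, and the dict keys in `FIELDS` plus the helpers `review_to_line` and `write_reviews` are hypothetical names. The Rotten Tomatoes HTML parsing itself is not shown because its markup is not described here.

```python
# Hypothetical field names, in the order the file must list them.
FIELDS = ("critic", "rating", "source", "text", "date")


def review_to_line(review):
    """Turn one review dict into a TAB-separated line, using 'NA' for gaps."""
    values = []
    for field in FIELDS:
        # Collapse internal whitespace (including tabs and newlines) so each
        # review stays on one line with exactly four TAB separators.
        value = " ".join(str(review.get(field) or "").split())
        if field == "rating":
            # Only 'rotten' and 'fresh' are valid ratings; anything else is NA.
            value = value.lower()
            if value not in ("rotten", "fresh"):
                value = ""
        values.append(value or "NA")
    return "\t".join(values)


def write_reviews(reviews, path):
    """Write the reviews in page order, one line per review."""
    with open(path, "w", encoding="utf-8") as out:
        for review in reviews:
            out.write(review_to_line(review) + "\n")
```

Collapsing whitespace inside each value is a design choice of this sketch: a review text containing a newline or TAB would otherwise break the one-line-per-review, five-column layout.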
  2. webcounter.py
Created a script called webcounter.py

- The script defines a function run() with 3 parameters: a link to a webpage and two words, w1 and w2.

- The function returns the set of all words in the webpage whose frequency is higher than that of w1 but
  lower than that of w2.

- Ignores case.

- Removes all non-letter characters before counting.

- Ignores stopwords.
  3. getngrams.py
My script defines the following function:

processSentence(sentence,posLex,negLex,tagger): The parameters of this function are a sentence (a
string), a set of positive words, a set of negative words, and a POS tagger. The function returns a
list of all the 4-grams in the sentence that have the following structure:

not <any word> <pos/neg word> <noun>

For example: not a good idea
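One way to implement `processSentence` is sketched below. Several details are assumptions not stated above: the tagger is taken to expose `.tag(tokens)` returning `(word, tag)` pairs with Penn Treebank tags (as NLTK taggers do), so a noun is any tag starting with `NN`; the lexicons are assumed to be lower-case; tokenization is a simple regex; and each 4-gram is returned as a space-joined string.

```python
import re


def processSentence(sentence, posLex, negLex, tagger):
    """Return every 4-gram of the form: not <any word> <pos/neg word> <noun>."""
    # Simple regex tokenizer (an assumption); original casing is kept because
    # POS taggers usually do better on properly cased input.
    tokens = re.findall(r"[A-Za-z']+", sentence)
    tagged = tagger.tag(tokens)  # assumed: [(word, tag), ...]
    result = []
    for i in range(len(tagged) - 3):
        (w1, _), (w2, _), (w3, _), (w4, t4) = tagged[i:i + 4]
        is_opinion = w3.lower() in posLex or w3.lower() in negLex
        if w1.lower() == "not" and is_opinion and t4.startswith("NN"):
            result.append(" ".join([w1, w2, w3, w4]))
    return result
```

Any object with a compatible `.tag()` method works as `tagger`, for instance an NLTK `PerceptronTagger` instance (which requires its model data to be downloaded first).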
