GitHub - arnavn101/WebXplore: Web Scraping and Crawling Tools

WebXplore (v1.0.3)

WebXplore offers a multitude of tools for web scraping and crawling, and for running computations on the scraped text to determine sentiment values or the tone of the author.

This package helps in retrieving information from these sources:

Google Search: Get links from any Google search query.
Website Text: Use an intelligent parser to strip all the HTML tags from webpage contents.
Twitter: Given a word or phrase, get related tweets.
Reddit: Get the hottest posts given the subreddit and a key phrase.
NewsAPI: Retrieve news articles for a given topic or phrase.

Installation

$ pip install webxplore

or clone the repository.

$ git clone https://github.com/arnavn101/WebXplore.git

Getting Started

The steps below show how to use webxplore.

1. Get Links from Google Search

from webxplore.web_searcher import SearchWeb

search_query = SearchWeb('Artificial Intelligence', 5)
print(search_query.returnListLinks())

2. Scrape a Website

from webxplore.web_scraper import ScrapeWebsite

scrape_query = ScrapeWebsite('https://en.wikipedia.org/wiki/Artificial_intelligence')
print(scrape_query.return_article())
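
For intuition about what stripping HTML tags involves, here is a minimal sketch that uses only Python's standard library. It is not WebXplore's implementation; the names `TextExtractor` and `strip_tags` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping tags and script/style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_tags(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_tags("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

A real scraper typically also has to handle malformed markup, navigation menus, and encoding detection, which this sketch ignores.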

3. Get Sentiments from Text

from webxplore.utils.sentiment import RetrieveSentiments

sentiment_analyzer = RetrieveSentiments('This is a good situation.')
print(sentiment_analyzer.returnFinalSentiment())
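
Steps 1 to 3 can be chained: search for links, scrape each one, and score the text. The sketch below shows one way to structure that loop so a single failing page does not abort the run. The function `analyze_links` and its parameters are hypothetical; the WebXplore calls are passed in as plain callables, so nothing here depends on their exact return types.

```python
from typing import Callable, Iterable, List, Tuple


def analyze_links(
    links: Iterable[str],
    fetch_text: Callable[[str], str],
    score_text: Callable[[str], object],
) -> List[Tuple[str, object]]:
    """Fetch each link's text and score it, skipping links that fail."""
    results = []
    for link in links:
        try:
            text = fetch_text(link)
        except Exception:
            # Scraping arbitrary pages can fail; skip the link and continue.
            continue
        if not text:
            continue  # Nothing to score for an empty page.
        results.append((link, score_text(text)))
    return results
```

With WebXplore, `links` could come from `SearchWeb('Artificial Intelligence', 5).returnListLinks()`, `fetch_text` could be `lambda url: ScrapeWebsite(url).return_article()`, and `score_text` could be `lambda text: RetrieveSentiments(text).returnFinalSentiment()`. This assumes `returnListLinks()` yields URL strings, which the examples above suggest but do not state.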

4. Get Summary of the Text

from webxplore.utils.summarizer import SummarizeText

textSummarizer = SummarizeText('He feels very scared. He wants to protect himself.', 1)
print(textSummarizer.returnFinalSummary())

5. Get Tone of the Text (for each sentence)

from webxplore.utils.analyzer import ToneAnalysis

textTone = ToneAnalysis('Laugh and the world laughs with you. ' +
                        'Weep and you weep alone.', "watsonApiKey")
print(textTone.returnTone())

6. Use the News API to Get the Latest Articles

from webxplore.search.news import RetrieveNewsArticle

newsArticles = RetrieveNewsArticle('Politics', 5, 'newsApiKey')
print(newsArticles.return_articleSentences())

7. Get Posts from a SubReddit

from webxplore.search.reddit import CrawlSubReddit

redditPosts = CrawlSubReddit('stocks', 'amazon', 10, 'RedditClientId',
                             'RedditClientSecret', 'RedditUserAgent')
print(redditPosts.return_listSentences())

8. Get Tweets That Contain a Keyword

from webxplore.search.twitter import CrawlTwitter

retrieveTweets = CrawlTwitter('tesla', 10, 'TwitterConsumerKey', 'TwitterConsumerSecret',
                              'TwitterAccountKey', 'TwitterAccountSecret')
print(retrieveTweets.return_tweets())
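
Steps 5 to 8 take API credentials as plain string arguments. One common way to keep them out of source code is to read them from environment variables. The sketch below assumes that approach; the helper `require_env` and the variable name `NEWS_API_KEY` are hypothetical and not part of WebXplore.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value
```

The key for step 6 could then be passed as `RetrieveNewsArticle('Politics', 5, require_env('NEWS_API_KEY'))`, with the variable exported in the shell beforehand.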

Contributions

Contributions to this repository are welcome. Please create a pull request and ensure that it passes all the CI tests.


License
