def get_page(self, title, cache=True, beautiful=False, **kwargs):
    '''Fetch the wikitext (or rendered html) of a Wikipedia page.

    title     -- Page title to fetch.
    cache     [True]  -- Serve/store the page via self.cache so we are not
                         abusing Wikipedia.
    beautiful [False] -- Return a BeautifulSoup of the rendered html
                         (prop='text') instead of the raw wikitext -- useful
                         for tables.
    **kwargs  -- Extra/overriding parameters for the API request.

    Example URL:
        http://en.wikipedia.org/w/api.php?action=parse&prop=wikitext&
        page=List_of_Nobel_laureates_in_Physics&format=json&section=1

    NOTE(review): the cache is keyed on title only, so a wikitext result and
    an html (beautiful) result for the same title share one slot -- whichever
    was fetched first is returned for both. Confirm this is acceptable.
    '''
    # Fast path: serve from the cache.
    if title in self.cache and cache:
        if beautiful:
            return BeautifulSoup(self.cache[title])
        else:
            return self.cache[title]

    # For BeautifulSoup output ask the API for rendered html, not wikitext.
    if beautiful:
        kwargs['prop'] = 'text'

    # URL parameters for the Wikipedia parse API.
    params = dict(
        action='parse',
        prop='wikitext',
        page=title.encode('utf-8'),
        format='json',
    )
    params.update(kwargs)

    request = self.session.get(URL, params=params)
    print('Loaded: {}'.format(request.url))
    page = json.loads(request.content)['parse'][params['prop']]['*']

    # Handle redirects nicely: extract the target title and recurse.
    # NOTE(review): the redirected page is cached under the *target* title,
    # not the title originally requested -- a repeat request for the original
    # title re-fetches the redirect stub.
    if '#REDIRECT' in page:
        if beautiful:
            code = BeautifulSoup(page)
            # BUG FIX: attrs must be a dict, not a set, for .find() to
            # match the redirect target's <span class="redirectText">.
            newtitle = code.find('span', {'class': 'redirectText'}).find('a').get('title')
        else:
            code = mwparserfromhell.parse(page)
            newtitle = u'{}'.format(code.filter_wikilinks()[0].title)
        kwargs.update(dict(cache=cache, beautiful=beautiful))
        return self.get_page(newtitle, **kwargs)

    # Save to the cache for the future.
    if cache:
        self.cache[title] = page

    # Output the raw wikitext/html string, or make it beautiful.
    if beautiful:
        return BeautifulSoup(page)
    else:
        return page