Python Document Examples

Programming language: Python

Namespace/Package: readability_lxml.readability

Class/Type: Document

Examples at hotexamples.com: 5

Python Document - 5 examples found. These are the top-rated real-world examples of readability_lxml.readability.Document extracted from open source projects.

Frequently used methods

summary (3)

summary_with_metadata (2)

Example #1

File: test_readability.py Project: ZoeyYoung/python-readability

def test_basic(self):
    html = load_regression_data('basic-multi-page.html')
    urldict = self._make_basic_urldict()
    fetcher = urlfetch.MockUrlFetch(urldict)
    options = {
        'url': 'http://basic.com/article.html',
        'multipage': True,
        'urlfetch': fetcher
    }
    doc = Document(html, **options)
    res = doc.summary_with_metadata()

    self.assertIn('Page 2', res.html, 'Should find the page 2 heading')
    self.assertIn('Page 3', res.html, 'Should find the page 3 heading')

    expected_html = load_regression_data('basic-multi-page-expected.html')
    diff_html = htmldiff(expected_html, res.html)
    diff_doc = document_fromstring(diff_html)

    insertions = diff_doc.xpath('//ins')
    deletions = diff_doc.xpath('//del')

    if len(insertions) != 0:
        for i in insertions:
            print('unexpected insertion: %s' % i.xpath('string()'))
        self.fail('readability result does not match expected')

    if len(deletions) != 0:
        for i in deletions:
            print('unexpected deletion: %s' % i.xpath('string()'))
        self.fail('readability result does not match expected')
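
The test above reports unexpected insertions and deletions between the expected and the actual HTML. As a rough illustration of that idea only, the sketch below uses the standard library's difflib in place of lxml's htmldiff, and compares whitespace-separated words rather than HTML structure. The helper name diff_tokens and the sample strings are assumptions made for this example; they do not come from either project.

```python
import difflib


def diff_tokens(expected, actual):
    """Return (insertions, deletions) between two strings, word by word.

    Hypothetical helper: a stdlib stand-in for the htmldiff-based check,
    not the method used by the original test.
    """
    expected_words = expected.split()
    actual_words = actual.split()
    matcher = difflib.SequenceMatcher(
        None, expected_words, actual_words, autojunk=False)

    insertions, deletions = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' counts as both a deletion and an insertion.
        if tag in ('delete', 'replace'):
            deletions.append(' '.join(expected_words[i1:i2]))
        if tag in ('insert', 'replace'):
            insertions.append(' '.join(actual_words[j1:j2]))
    return insertions, deletions


if __name__ == '__main__':
    # Made-up sample strings, for illustration only.
    expected = '<p>Page 2 heading</p> <p>Page 3 heading</p>'
    actual = '<p>Page 2 heading</p> <p>Advertisement</p>'
    ins, dels = diff_tokens(expected, actual)
    for text in ins:
        print('unexpected insertion: %s' % text)
    for text in dels:
        print('unexpected deletion: %s' % text)
```

A word-level diff is cruder than a markup-aware one, so it is probably only useful for quick checks where pulling in lxml is not an option.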

Example #2

File: test_article_only.py Project: mitechie/python-readability

def test_si_sample(self):
    """Using the si sample, load article with only opening body element"""
    sample = load_sample('si-game.sample.html')
    doc = Document(
        sample,
        url='http://sportsillustrated.cnn.com/baseball/mlb/gameflash/2012/04/16/40630_preview.html')
    res = doc.summary()
    self.assertEqual('<html><body><div><div class', res[0:27])

Example #3

File: test_article_only.py Project: mitechie/python-readability

def test_si_sample_full_summary(self):
    """We should parse the doc and get a full summary with confidence"""
    sample = load_sample('si-game.sample.html')
    doc = Document(
        sample,
        url='http://sportsillustrated.cnn.com/baseball/mlb/gameflash/2012/04/16/40630_preview.html')
    res = doc.summary_with_metadata(enclose_with_html_tag=False)

    # The result object is expected to expose these four attributes.
    self.assertTrue(hasattr(res, 'html'),
                    'res should have an html attrib')
    self.assertTrue(hasattr(res, 'confidence'),
                    'res should have a confidence attrib')
    self.assertTrue(hasattr(res, 'title'),
                    'res should have a title attrib')
    self.assertTrue(hasattr(res, 'short_title'),
                    'res should have a short_title attrib')

    self.assertEqual('<div><div class="', res.html[0:17])
    self.assertTrue(res.confidence > 50,
                    'The confidence score should be larger than 50: ' + str(res.confidence))
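
The four hasattr checks in this test could be folded into one helper that lists whichever attributes are missing. The sketch below is a minimal version of that idea. The Summary namedtuple is a hypothetical stand-in for whatever summary_with_metadata() returns, and missing_attributes is a name invented for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for the result of summary_with_metadata();
# the field names mirror the attributes the test checks for.
Summary = namedtuple('Summary', 'html confidence title short_title')

EXPECTED_ATTRIBUTES = ('html', 'confidence', 'title', 'short_title')


def missing_attributes(result, names=EXPECTED_ATTRIBUTES):
    """Return the names from `names` that `result` does not expose."""
    return [name for name in names if not hasattr(result, name)]


if __name__ == '__main__':
    # Made-up values, for illustration only.
    res = Summary(html='<div><div class="article">...</div></div>',
                  confidence=80, title='A title', short_title='A title')
    print(missing_attributes(res))
```

Inside a test case, a single self.assertEqual([], missing_attributes(res)) would then report every missing attribute at once instead of stopping at the first failure.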

Example #4

File: test_article_only.py Project: mitechie/python-readability

def test_si_sample_html_partial(self):
    """Using the si sample, make sure we can get the article alone."""
    sample = load_sample('si-game.sample.html')
    doc = Document(
        sample,
        url='http://sportsillustrated.cnn.com/baseball/mlb/gameflash/2012/04/16/40630_preview.html')
    res = doc.summary(enclose_with_html_tag=False)
    self.assertEqual('<div><div class="', res[0:17])

Example #5

File: test_sample_articles.py Project: mitechie/python-readability

def process_article(article):
    sample = load_sample(article)
    doc = Document(sample)
    res = doc.summary()
    failed_msg = "Failed to process the article: " + article
    assert '<html><body><div><div class' == res[0:27], failed_msg
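
Several of the examples above compare a fixed-length slice of the output (res[0:27] or res[0:17]) against a literal prefix. One possible alternative, sketched below, is str.startswith, which avoids keeping the slice length in sync with the literal by hand. The constant and function names are assumptions made for this illustration; only the two prefix strings are taken from the examples.

```python
# Prefixes copied from the examples above.
FULL_DOCUMENT_PREFIX = '<html><body><div><div class'
ARTICLE_ONLY_PREFIX = '<div><div class="'


def is_full_document(html):
    """True if the summary starts like one wrapped in <html><body>."""
    return html.startswith(FULL_DOCUMENT_PREFIX)


def is_article_only(html):
    """True if the summary starts directly with the article markup."""
    return html.startswith(ARTICLE_ONLY_PREFIX)


if __name__ == '__main__':
    # Made-up output strings, for illustration only.
    print(is_full_document('<html><body><div><div class="a">x</div></div></body></html>'))
    print(is_article_only('<div><div class="a">x</div></div>'))
```

The trade-off is that a failing assertEqual on a slice shows the mismatching prefix in its message, while a boolean check needs an explicit message to be equally informative.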