Python Configuration.memoize_articles Examples

Programming language: Python

Namespace/package name: newspaper.configuration

Class/type: Configuration

Method/function: memoize_articles

Examples at hotexamples.com: 4

Python Configuration.memoize_articles - 4 examples found. These are the top-rated real-world Python examples of newspaper.configuration.Configuration.memoize_articles, extracted from open source projects.

Frequently used methods

Configuration(26)

language(6)

fetch_images(5)

follow_meta_refresh(4)

get_parser(4)

browser_user_agent(3)

memoize_articles(3)

keep_article_html(2)

MAX_AUTHORS(1)

MAX_TITLE(1)

MIN_WORD_COUNT(1)

is_memoize_articles(1)

verbose(1)

Example #1

    import newspaper
    from newspaper import news_pool
    from newspaper.configuration import Configuration

    # Excerpt: a method of a unittest.TestCase subclass.
    def test_download_works(self):
        config = Configuration()
        # Disable the article cache so each build returns every article found.
        config.memoize_articles = False
        slate_paper = newspaper.build('http://slate.com', config=config)
        tc_paper = newspaper.build('http://techcrunch.com', config=config)
        espn_paper = newspaper.build('http://espn.com', config=config)

        print('Slate has %d articles TC has %d articles ESPN has %d articles' %
              (slate_paper.size(), tc_paper.size(), espn_paper.size()))

        # Download all three sources concurrently, two threads per source.
        papers = [slate_paper, tc_paper, espn_paper]
        news_pool.set(papers, threads_per_source=2)

        # Block until every download thread has finished.
        news_pool.join()

        print('Downloaded Slate mthread len', len(slate_paper.articles[0].html))
        print('Downloaded ESPN mthread len', len(espn_paper.articles[-1].html))
        print('Downloaded TC mthread len', len(tc_paper.articles[1].html))

Example #2

File: unit_tests.py Project: Geekking/newspaper

    import newspaper
    from newspaper import news_pool
    from newspaper.configuration import Configuration

    # Excerpt: a method of a unittest.TestCase subclass.
    def test_download_works(self):
        config = Configuration()
        # Disable the article cache so each build returns every article found.
        config.memoize_articles = False
        slate_paper = newspaper.build('http://slate.com', config=config)
        tc_paper = newspaper.build('http://techcrunch.com', config=config)
        espn_paper = newspaper.build('http://espn.com', config=config)

        print('slate has %d articles tc has %d articles espn has %d articles'
              % (slate_paper.size(), tc_paper.size(), espn_paper.size()))

        # Download all three sources concurrently, two threads per source.
        papers = [slate_paper, tc_paper, espn_paper]
        news_pool.set(papers, threads_per_source=2)

        # Block until every download thread has finished.
        news_pool.join()

        print('Downloaded slate mthread len', len(slate_paper.articles[0].html))
        print('Downloaded espn mthread len', len(espn_paper.articles[-1].html))
        print('Downloaded tc mthread len', len(tc_paper.articles[1].html))
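
The pattern used in the tests above (several sources, each downloaded by a small number of threads, followed by a join) can be sketched with the standard library alone. This is only an illustration of the idea and not how newspaper's `news_pool` is implemented; `fake_download`, `download_sources`, and the sample URLs are hypothetical, and no network access takes place.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Stand-in for an HTTP fetch (hypothetical); returns fake HTML."""
    return "<html>%s</html>" % url


def download_sources(sources, threads_per_source=2):
    """Download every URL of every source, using one small pool per source.

    `sources` maps a source name to a list of article URLs.
    Returns a dict mapping each source name to a list of HTML strings,
    in the same order as the input URLs.
    """
    pools = {}
    futures = {}
    for name, urls in sources.items():
        pool = ThreadPoolExecutor(max_workers=threads_per_source)
        pools[name] = pool
        futures[name] = [pool.submit(fake_download, url) for url in urls]

    results = {}
    for name, pool in pools.items():
        # result() blocks until that download finishes (the "join" step).
        results[name] = [f.result() for f in futures[name]]
        pool.shutdown()
    return results


sources = {
    "slate": ["http://slate.com/a", "http://slate.com/b"],
    "espn": ["http://espn.com/x"],
}
html_by_source = download_sources(sources)
print(len(html_by_source["slate"]), len(html_by_source["espn"]))
```

Giving each source its own pool keeps a slow site from consuming the threads meant for the others, which is presumably the motivation behind a per-source thread count.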

Example #3

File: runner.py Project: vishalsingh8989/newslyServer

__author__ = "Vishal Jasrotia"
__copyright__ = ""
__credits__ = ["Vishal Jasrotia"]
__license__ = ""
__version__ = "1.0.0"
__maintainer__ = "Vishal Jasrotia"
__email__ = "*****@*****.**"
__status__ = "Testing"

from newsly.Builder import NewsBuilder
from newspaper.configuration import Configuration

if __name__ == "__main__":
    config = Configuration()
    config.memoize_articles = True  # True in production
    config.MAX_AUTHORS = 2
    config.MIN_WORD_COUNT = 300
    # config.MAX_SUMMARY = 900  # applies to the text, not the summary; do not use it
    # TODO: Have a separate ArticleConfig and SourceConfig extend this!

    builder = NewsBuilder(config)
    builder.build()
    builder.print_source_vs_article_url()
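
In newspaper, `memoize_articles` controls whether article URLs seen in earlier builds are remembered and skipped in later ones. The following is a minimal sketch of that general idea, assuming an in-memory set stands in for the library's cache; `UrlMemo` and `filter_new` are hypothetical names and not part of newspaper.

```python
class UrlMemo:
    """Remembers article URLs already returned (hypothetical helper)."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._seen = set()

    def filter_new(self, urls):
        """Return the URLs to process, skipping remembered ones if enabled."""
        if not self.enabled:
            # Memoization off: every URL is returned on every call.
            return list(urls)
        fresh = [u for u in urls if u not in self._seen]
        self._seen.update(fresh)
        return fresh


urls = ["http://example.com/a", "http://example.com/b"]

# With memoization on, a second pass yields only URLs not seen before.
memo = UrlMemo(enabled=True)
first = memo.filter_new(urls)
second = memo.filter_new(urls + ["http://example.com/c"])

# With memoization off, repeated passes yield the full list each time.
no_memo = UrlMemo(enabled=False)
repeat = no_memo.filter_new(urls)
repeat_again = no_memo.filter_new(urls)
```

This is why the test examples set the flag to `False` (repeated runs should see all articles) while a long-running production crawler may prefer `True`.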

Example #4

File: base_crawler.py Project: Jonoans/ScoutSpyder

    '.css',
    '.png',
    '.jpg',
    '.jpeg',
    '.pdf',
    '.ico',
    '.gif',
    '.m4a',
    '.woff2'
]
BLACKLIST_REGEX = [
    'http[s]?://(.*)signout(.*)'
]
NEWSPAPER_CONFIG = Configuration()
NEWSPAPER_CONFIG.fetch_images = False
NEWSPAPER_CONFIG.memoize_articles = False

class BaseCrawler:
    # Crawler Identifier
    crawler_id = 'com.base'

    # Rate limit configuration
    requests_per_sec = 1

    # robots.txt url
    robots_url = None

    # URLs of pages to start crawling from
    start_url = []
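
The excerpt does not show how the extension list and `BLACKLIST_REGEX` are applied. One plausible use is a URL filter along the following lines. This is a sketch under assumptions: `SKIPPED_EXTENSIONS` is a hypothetical name (the original list's name is cut off, and only the entries visible in the excerpt are included), and `should_skip` is not taken from the project.

```python
import re
from urllib.parse import urlparse

# Hypothetical name; contains only the entries visible in the excerpt.
SKIPPED_EXTENSIONS = [
    '.css', '.png', '.jpg', '.jpeg', '.pdf',
    '.ico', '.gif', '.m4a', '.woff2',
]
BLACKLIST_REGEX = [
    'http[s]?://(.*)signout(.*)'
]


def should_skip(url):
    """Return True if the URL has a skipped extension or matches a blacklist regex."""
    # Compare against the path only, so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if path.endswith(tuple(SKIPPED_EXTENSIONS)):
        return True
    return any(re.match(pattern, url) for pattern in BLACKLIST_REGEX)


candidates = [
    "https://example.com/static/site.css",
    "https://example.com/account/signout?next=/",
    "https://example.com/news/story.html",
]
to_crawl = [u for u in candidates if not should_skip(u)]
```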