Example #1
    def parse_source(self, existing_ids=None):
        article_urls = []
        feed_content = get_rss(self.VAL202_RSS_URL)
        for feed_entry in feed_content.entries:
            link = feed_entry["link"]
            guid = feed_entry["guid"]
            if existing_ids and get_sha_hash(guid) in existing_ids:
                logger.debug("Skipping %s", guid)
                continue

            published_date = time_to_datetime(feed_entry["published_parsed"])
            try:
                text = feed_entry["content"][0]["value"]
                # Strip HTML markup, keeping only the visible text
                soup = bs4.BeautifulSoup(text, "html.parser")
                text = soup.text
            except KeyError:
                # Entries without a content payload are skipped rather than
                # aborting and discarding everything collected so far
                continue

            title = feed_entry["title"]
            author = feed_entry.get("author", None)

            article_urls.append((link, {
                "guid": guid,
                "published": published_date,
                "title": title,
                "text": text,
                "author": author
            }))

        return article_urls
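
The examples all rely on a shared set of helpers defined elsewhere in the project: get_rss, get_hash, get_sha_hash, time_to_datetime, and a module-level logger. For reference, here is a minimal sketch of what they plausibly look like, assuming feedparser for fetching feeds and hashlib digests for the deduplication keys; the exact hash algorithms are an assumption, not something the examples confirm.

import calendar
import hashlib
import logging
from datetime import datetime, timezone

import feedparser

logger = logging.getLogger(__name__)

def get_rss(url):
    # Fetch and parse an RSS feed; feedparser exposes the items as .entries
    return feedparser.parse(url)

def get_sha_hash(value):
    # Digest used as the deduplication key (SHA-1 here is an assumption)
    return hashlib.sha1(value.encode("utf-8")).hexdigest()

def get_hash(value):
    # Second digest variant checked alongside the SHA one (MD5 is an assumption)
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def time_to_datetime(parsed_time):
    # feedparser returns published_parsed as a UTC time.struct_time
    return datetime.fromtimestamp(calendar.timegm(parsed_time), tz=timezone.utc)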
Example #2
    def parse_source(self, existing_ids=None):
        news = []
        feed_content = get_rss(self.FINANCE_RSS_URL)

        for feed_entry in feed_content.entries:
            link = feed_entry["link"]

            if existing_ids and get_sha_hash(link) in existing_ids:
                logger.debug("Skipping %s", link)
                continue

            published_date = time_to_datetime(feed_entry["published_parsed"])
            news.append((link, {"published": published_date}))

        return news
Example #3
    def parse_source(self, existing_ids=None):
        feed_content = get_rss(self.DELO_RSS_URL)
        article_urls = []

        for feed_entry in feed_content.entries:
            link = feed_entry["link"]

            # Check both hash variants of the link against the seen IDs
            if existing_ids and (get_hash(link) in existing_ids
                                 or get_sha_hash(link) in existing_ids):
                logger.debug("Skipping %s", link)
                continue

            published_date = time_to_datetime(feed_entry["published_parsed"])
            article_urls.append((link, {"published": published_date}))

        return article_urls
Example #4
    def parse_source(self, existing_ids=None):
        news = []
        for rss_feed in self.RTV_RSS_URLS:
            logger.debug("Parsing %s", rss_feed)
            feed_content = get_rss(rss_feed)
            for feed_entry in feed_content.entries:
                # Collect the article link from the feed entry
                link = feed_entry["link"]

                if existing_ids and (get_hash(link) in existing_ids
                                     or get_sha_hash(link) in existing_ids):
                    logger.debug("Skipping %s", link)
                    continue

                published_date = time_to_datetime(
                    feed_entry["published_parsed"])
                news.append((link, {"published": published_date}))

        return news
Example #5
    def parse_source(self, existing_ids=None):
        article_urls = []
        feed_content = get_rss(self.MONITOR_RSS_URL)
        for feed_entry in feed_content.entries:
            link = feed_entry["link"]
            guid = feed_entry["guid"]

            if existing_ids and get_sha_hash(guid) in existing_ids:
                logger.debug("Skipping %s", guid)
                continue

            published_date = time_to_datetime(feed_entry["published_parsed"])
            title = feed_entry["title"]

            article_urls.append((link, {
                "guid": guid,
                "title": title,
                "published": published_date
            }))

        return article_urls
Example #6
    def parse_source(self, existing_ids=None):
        news = []
        feed_content = get_rss(self.DNEVNIK_RSS_URL)

        # Collect at most 30 new entries per run
        max_counter = 30
        for feed_entry in feed_content.entries:
            link = feed_entry["link"]

            if existing_ids and (get_hash(link) in existing_ids
                                 or get_sha_hash(link) in existing_ids):
                logger.debug("Skipping %s", link)
                continue

            published_date = time_to_datetime(feed_entry["published_parsed"])
            title = feed_entry["title"]
            news.append((link, {"published": published_date, "title": title}))

            max_counter -= 1
            if max_counter <= 0:
                break

        return news
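
To show how these methods fit together, the hypothetical driver below feeds the returned (link, metadata) pairs back into existing_ids so the next run skips entries it has already seen. The Val202Source and DeloSource class names are illustrative assumptions, and hashing the GUID when present, falling back to the link, simply mirrors the checks in the examples above.

def crawl(sources, existing_ids):
    for source in sources:
        # "or []" guards against parsers that may return None
        results = source.parse_source(existing_ids) or []
        for link, meta in results:
            # Record the same hash the parsers test against, so this
            # entry is skipped on the next run
            existing_ids.add(get_sha_hash(meta.get("guid", link)))
            print(link, meta.get("published"))

seen = set()
crawl([Val202Source(), DeloSource()], seen)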