Python TranscriptScraper.get_first_day examples

Programming language: Python

Namespace/package: scraper.Transcript

Method/function: get_first_day

Examples on hotexamples.com: 1

Python TranscriptScraper.get_first_day: 1 example found. It is a real-world Python use of scraper.Transcript.TranscriptScraper.get_first_day, extracted from an open-source project.

Frequently used methods

extract_monologues(2)

extract_messages_from_monologues(1)

get_first_day(1)

get_next_day(1)

get_pager(1)
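
These method names suggest the crawl that Example #1 sets up. It starts from the page returned by `get_first_day`, follows the "next day" link that `get_next_day` presumably extracts, and keeps a record of visited URLs so that no page is fetched twice. The following is a minimal sketch of that bookkeeping with no network access. `crawl_days`, `next_url` and `fetch` are hypothetical stand-ins, because this page does not show the real signatures of `get_next_day` or of the HTTP call.

```python
from collections import deque
from typing import Callable, List, Optional


def crawl_days(first_url: str,
               next_url: Callable[[str], Optional[str]],
               fetch: Callable[[str], str]) -> List[str]:
    """Fetch transcript pages day by day, never fetching the same URL twice.

    Hypothetical stand-ins: `fetch` plays the role of an HTTP GET and
    `next_url` the role of TranscriptScraper.get_next_day (page -> next URL).
    """
    to_process = deque([first_url])  # URLs still waiting to be fetched
    processed = set()                # URLs already fetched
    visit_order: List[str] = []      # order in which pages were fetched
    while to_process:
        url = to_process.popleft()
        if url in processed:
            continue                 # already fetched: skip it
        processed.add(url)
        visit_order.append(url)
        page = fetch(url)
        nxt = next_url(page)         # None means there is no later day
        if nxt is not None and nxt not in processed:
            to_process.append(nxt)
    return visit_order


# Fake "site": each page body is simply the URL of the following day,
# and day 3 links back to day 1 to show that a cycle is not re-fetched.
pages = {"/day/1": "/day/2", "/day/2": "/day/3", "/day/3": "/day/1"}
visited = crawl_days("/day/1", next_url=lambda page: page, fetch=lambda url: pages[url])
print(visited)  # ['/day/1', '/day/2', '/day/3']
```

Using a set for the already-processed URLs gives constant-time membership checks on average. A list, as in the example's `processed_list`, also works, but each check then costs time proportional to the number of pages processed so far.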

Example #1

File: file_extractor.py  Project: cloud101/StackExchangeChatScraper

__author__ = 'lucas'
from scraper.Transcript import TranscriptScraper
import requests
from database.Elastic import ElasticManager
from tools.Logger import get_logger
from time import sleep
import os

logger = get_logger("scrape_dmz")
scraper = TranscriptScraper(151)
# keep a set of all URLs we still need to fetch and process
process_list = set()
process_list.add(scraper.get_first_day())
# keep a list of URLs which have already been processed so we do not fetch the same page twice
processed_list = list()
# identify the scraper to Stack Exchange with a custom User-Agent in case it causes load
headers = {
    'User-Agent': 'ChatExchangeScraper - contact Lucas Kauffman',
}


x = 0

try:
    # walk a local directory of saved .html pages
    for root, dirs, files in os.walk("/home/lucas/dmz"):
        for file in files:
            if file.endswith(".html"):
                with open(os.path.join(root, file)) as FILE:
                    response = FILE.read()
                    # a monologue can contain several messages
                    monologues = scraper.extract_monologues(response)
                    # (the rest of file_extractor.py, including the clause
                    # that closes this try block, is not shown in the example)
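
The loop above walks a local directory, reads every `.html` file and passes its contents to `extract_monologues`, noting that one monologue can contain several messages. That method's implementation is not shown on this page. Below is a stdlib-only stand-in, offered as a sketch. It assumes that a transcript marks each monologue as a `<div class="monologue ...">` with nested `<div class="message ...">` elements. Those class names, and the helpers `MonologueParser`, `parse_monologues` and `iter_html_files`, are illustrative assumptions, not the project's API.

```python
import os
from html.parser import HTMLParser
from typing import Iterator, List, Optional


class MonologueParser(HTMLParser):
    """Collect message text grouped by monologue.

    Assumption: each monologue is a <div class="monologue ..."> and each of
    its messages is a nested <div class="message ...">.
    """

    def __init__(self) -> None:
        super().__init__()
        self.monologues: List[List[str]] = []
        self._depth = 0                         # current <div> nesting depth
        self._mono_depth: Optional[int] = None  # depth of the open monologue div
        self._msg_depth: Optional[int] = None   # depth of the open message div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self._depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if "monologue" in classes:
            self._mono_depth = self._depth
            self.monologues.append([])
        elif "message" in classes and self._mono_depth is not None:
            self._msg_depth = self._depth
            self.monologues[-1].append("")

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self._depth == self._msg_depth:
            self._msg_depth = None              # message div closed
        if self._depth == self._mono_depth:
            self._mono_depth = None             # monologue div closed
        self._depth -= 1

    def handle_data(self, data):
        if self._msg_depth is not None:         # only keep text inside a message
            self.monologues[-1][-1] += data


def parse_monologues(html: str) -> List[List[str]]:
    """Return one list of stripped message strings per monologue."""
    parser = MonologueParser()
    parser.feed(html)
    parser.close()
    return [[msg.strip() for msg in mono] for mono in parser.monologues]


def iter_html_files(root_dir: str) -> Iterator[str]:
    """Yield the path of every .html file under root_dir, like the os.walk loop."""
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".html"):
                yield os.path.join(root, name)


sample = """
<div class="monologue user-1">
  <div class="signature">alice</div>
  <div class="messages">
    <div class="message"><div class="content">hi there</div></div>
    <div class="message"><div class="content">second message</div></div>
  </div>
</div>
<div class="monologue user-2">
  <div class="messages"><div class="message"><div class="content">from bob</div></div></div>
</div>
"""
print(parse_monologues(sample))  # [['hi there', 'second message'], ['from bob']]

# Applied to a directory (the path is a placeholder):
# for path in iter_html_files("/path/to/saved/pages"):
#     with open(path, encoding="utf-8") as fh:
#         monologues = parse_monologues(fh.read())
```

Tracking the `<div>` depth at which each monologue and message opened lets the parser tell when that element closes, even with further `<div>`s nested inside it. A dedicated HTML library would do the same job with CSS selectors, at the cost of a third-party dependency.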