Python CrawlerFactory.get_crawler usage examples

Programming language: Python

Namespace/Package: crawlers.CrawlerFactory

Class/Type: CrawlerFactory

Method/Function: get_crawler

Examples on hotexamples.com: 6

Python CrawlerFactory.get_crawler - 6 examples found. These are the top Python code examples for crawlers.CrawlerFactory.CrawlerFactory.get_crawler, taken from open source projects.
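
None of the examples on this page show the factory itself, so the following is only a minimal sketch of how a name-based factory like this could work. The crawler classes, the `_registry` dictionary, and the `ValueError` raised for unknown names are assumptions made for illustration. They are not taken from the pythia source.

```python
# Hypothetical sketch: the real pythia CrawlerFactory is not shown on this page.


class TopsyCrawler(object):
    """Stand-in crawler; only the "topsy" key comes from the examples."""


class TwitterCrawler(object):
    """Stand-in crawler; only the "twitter" key comes from the examples."""


class CrawlerFactory(object):
    # Assumed registry mapping the string passed to get_crawler to a class.
    _registry = {
        "topsy": TopsyCrawler,
        "twitter": TwitterCrawler,
    }

    def get_crawler(self, crawler_type):
        """Return a new crawler instance for the given type name."""
        try:
            crawler_class = self._registry[crawler_type]
        except KeyError:
            # Assumed behaviour: fail loudly on an unknown name.
            raise ValueError("Unknown crawler type: %r" % (crawler_type,))
        return crawler_class()


factory = CrawlerFactory()
crawler = factory.get_crawler("twitter")
print(type(crawler).__name__)
```

With a registry like this, callers only need the string key, which matches how the examples below request "topsy", "twitter", and "scrapy" crawlers.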

Example #1

File: test_tweets.py Project: nihaofuyue0617/pythia

'''
Created on 22 Jan 2012

@author: george
'''
import datetime
from crawlers.CrawlerFactory import CrawlerFactory
from database.model.tweets import TwoGroupsTweet
from mongoengine import *

f = CrawlerFactory()
t = f.get_crawler("topsy")

search_hashtags = "uk OR #uk OR #UK OR #usa OR #USA OR #US OR usa OR us"
t.search_for(search_hashtags)
# Month written as 1 rather than 01: a leading-zero literal is a syntax error in Python 3.
t.search_between(from_date=datetime.datetime(2011, 1, 23, 0, 0, 0),
                 to_date=datetime.datetime(2011, 1, 25, 0, 0, 0),
                 granularity_days=1,
                 granularity_hours=0,
                 granularity_mins=0)
t.retrieve_items_of_type(TwoGroupsTweet)
t.crawl()
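
The `search_between` call above passes a date range together with `granularity_days`, `granularity_hours`, and `granularity_mins`. One plausible reading is that the crawler splits the range into consecutive windows of that size and queries each one. The helper below is a sketch under that assumption. `split_into_windows` is a hypothetical name and is not part of pythia.

```python
import datetime


def split_into_windows(from_date, to_date, granularity_days=0,
                       granularity_hours=0, granularity_mins=0):
    """Split [from_date, to_date) into consecutive windows of the given size.

    Hypothetical helper: this is an assumption about what the granularity
    arguments mean, not the pythia implementation.
    """
    step = datetime.timedelta(days=granularity_days,
                              hours=granularity_hours,
                              minutes=granularity_mins)
    if step <= datetime.timedelta(0):
        raise ValueError("granularity must be positive")
    windows = []
    start = from_date
    while start < to_date:
        # The last window is clipped so it never runs past to_date.
        end = min(start + step, to_date)
        windows.append((start, end))
        start = end
    return windows


windows = split_into_windows(datetime.datetime(2011, 1, 23),
                             datetime.datetime(2011, 1, 25),
                             granularity_days=1)
for start, end in windows:
    print("%s -> %s" % (start, end))
```

With the dates from the example and a granularity of one day, this produces two windows, one per day.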

Example #2

File: crawlers_tests.py Project: nihaofuyue0617/pythia

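import unittest
from crawlers.CrawlerFactory import CrawlerFactory

# The excerpt shows only the method; the enclosing class name is assumed.
class CrawlersTests(unittest.TestCase):
    def test_construction_of_twitter_crawlers(self):
        factory = CrawlerFactory()
        t = factory.get_crawler("twitter")
        t.login()
        info = t.getUserInfoByScreenName("GeorgeEracleous")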

Example #3

File: training_authors.py Project: nihaofuyue0617/pythia

'''
Created on 22 Jan 2012

@author: george
'''
import datetime
from crawlers.CrawlerFactory import CrawlerFactory
from database.model.tweets import *
from database.model.agents import *
from mongoengine import *
import tools.utils
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
from database.warehouse import WarehouseServer

f = CrawlerFactory()
twitter = f.get_crawler("twitter")
#twitter.login()
ws = WarehouseServer()

from_date = datetime.datetime(2011, 1, 25, 0, 0, 0)
to_date = datetime.datetime(2011, 1, 26, 0, 0, 0)
items = ws.get_documents_by_date(from_date, to_date, limit=100)
# Collect the unique author screen names of the retrieved tweets.
screen_names = set(tweet.author_screen_name for tweet in items)
print(len(screen_names))
# A terrible hack to save the screen names of users who are mentioned in tweets
# but are not yet in the database. They'll be considered after all authors have
# been stored.
mentions_of_not_stored_users = []
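
The excerpt ends before `mentions_of_not_stored_users` is filled in. The sketch below shows one way the deduplication and the deferred-mentions idea from the comment could fit together. The `Tweet` namedtuple, the sample data, and the `@name` regular expression are assumptions made for illustration. The real documents come from `WarehouseServer`, and the original handling of mentions is not shown.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the documents returned by get_documents_by_date.
Tweet = namedtuple("Tweet", ["author_screen_name", "text"])

items = [
    Tweet("alice", "hello world"),
    Tweet("bob", "hi @carol and @alice"),
    Tweet("alice", "posting again"),
]

# Unique authors, as in the example above.
screen_names = set(tweet.author_screen_name for tweet in items)
print(len(screen_names))

# Assumed logic: remember mentioned users who are not among the authors,
# so they can be handled after all authors have been stored.
mentions_of_not_stored_users = []
for tweet in items:
    for mention in re.findall(r"@(\w+)", tweet.text):
        if (mention not in screen_names
                and mention not in mentions_of_not_stored_users):
            mentions_of_not_stored_users.append(mention)

print(mentions_of_not_stored_users)
```

Here "alice" is skipped because she is already an author, while "carol" is kept for later processing.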

Example #4

File: training_authors.py Project: aurora1625/pythia

'''
Created on 22 Jan 2012

@author: george
'''
import datetime
from crawlers.CrawlerFactory import CrawlerFactory
from database.model.tweets import *
from database.model.agents import *
from mongoengine import *
import tools.utils
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
from database.warehouse import WarehouseServer

f = CrawlerFactory()
twitter = f.get_crawler("twitter")
#twitter.login()
ws = WarehouseServer()

from_date = datetime.datetime(2011, 1, 25, 0, 0, 0)
to_date = datetime.datetime(2011, 1, 26, 0, 0, 0)
items = ws.get_documents_by_date(from_date, to_date, limit=100)
# Collect the unique author screen names of the retrieved tweets.
screen_names = set(tweet.author_screen_name for tweet in items)
print(len(screen_names))
# A terrible hack to save the screen names of users who are mentioned in tweets
# but are not yet in the database. They'll be considered after all authors have
# been stored.
mentions_of_not_stored_users = []

Example #5

File: annotate_authors.py Project: giorgosera/pythia-hackathon

'''
Created on 22 Jan 2012

@author: george
'''
from database.model.agents import TrainingAuthor
from crawlers.CrawlerFactory import CrawlerFactory

f = CrawlerFactory()
crawler = f.get_crawler("scrapy")

crawler.setup(user_type=TrainingAuthor)
crawler.crawl(store=True)

Example #6

File: crawlers_tests.py Project: aurora1625/pythia

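import unittest
from crawlers.CrawlerFactory import CrawlerFactory

# The excerpt shows only the method; the enclosing class name is assumed.
class CrawlersTests(unittest.TestCase):
    def test_construction_of_twitter_crawlers(self):
        factory = CrawlerFactory()
        t = factory.get_crawler("twitter")
        t.login()
        info = t.getUserInfoByScreenName("GeorgeEracleous")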