def __init__(self, *args, **kwargs):
    """Spider that scrapes a single listing URL.

    HACK: the process-based utils expect ``id`` to be a primary key, but we
    pass the listing URL in through it — if we passed a plain ``url`` kwarg,
    the process-based utils would not forward it on to the crawler process.

    Raises:
        KeyError: if ``id`` is missing from kwargs, or no ListingSource
            URL fuzzy-matches the given URL.
    """
    url = kwargs['id']
    # Extract the root domain. Still naive (no public-suffix handling);
    # consider https://github.com/john-kurkowski/tldextract for a real fix.
    netloc_split = urlparse(url).netloc.split(".")
    # Bug fix: the old check was `'www' in netloc_split`, which matched a
    # 'www' label *anywhere* in the host — e.g. 'api.www.example.com'
    # would pick index 1 ('www') as the domain. Only strip a leading
    # 'www', and guard against a single-label netloc.
    if netloc_split[0] == 'www' and len(netloc_split) > 1:
        domain = netloc_split[1]
    else:
        domain = netloc_split[0]
    # Narrow candidates in the DB by domain, then fuzzy-match to pick the
    # source whose configured URL is closest to the one we were given.
    listing_sources = ListingSource.objects.filter(url__icontains=domain).all()
    source_dict = {source.url: source for source in listing_sources}
    closest_url = fuzzy_search.get_closest_word(url, source_dict.keys())
    config = source_dict[closest_url].scraper_config
    self.scraper = config.scraper
    self.scrape_url = url
    self.ref_object = config
    super(IndividualListingSpider, self).__init__(*args, **kwargs)
def test_fuzzy_parser_gets_closest_source(target, sources, expected):
    """Fuzzy search should return the expected closest match for *target*."""
    assert fuzzy_search.get_closest_word(target, sources) == expected