Python Extractor Examples

Programming language: Python

Namespace/package name: content_extraction.extractor

Class/type: Extractor

Examples at hotexamples.com: 3

Python Extractor: 3 examples found. These are top-rated real-world Python examples of content_extraction.extractor.Extractor, extracted from open-source projects.

Frequently used methods

__init__(3)

Example #1

File: cleanup.py Project: ChinHui-Chen/newssip

 def __init__(self, max_duplicates=2, **kwargs):
     """Initialize cleanup model learner.
     
     Takes standard options of Extractor, plus:
      - max_duplicates: maximum number of (near) identical documents in the set
     """
     
     Extractor.__init__(self, **kwargs)
     self.max_duplicates = max_duplicates
     
     ## dictionary of HTML elements (paths and content) with counts 
     self.elements = dict()
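
The constructor above takes its own keyword argument and forwards the rest to the base class. The following is a minimal sketch of that pattern, not the project's actual code: the `Extractor` class here is a stand-in (the real content_extraction.extractor.Extractor is not shown in this listing), and the subclass name `DuplicateCounter` and the `encoding` option are hypothetical.

```python
class Extractor:
    """Stand-in base class; the real Extractor's options are not shown here."""

    def __init__(self, **kwargs):
        # Hypothetical behavior: remember whatever options were passed in.
        self.options = dict(kwargs)


class DuplicateCounter(Extractor):
    """Hypothetical subclass mirroring the constructor in the example."""

    def __init__(self, max_duplicates=2, **kwargs):
        # Forward the remaining keyword arguments to the base class.
        super().__init__(**kwargs)
        self.max_duplicates = max_duplicates
        # Maps HTML elements (paths and content) to their counts.
        self.elements = {}


# "encoding" is an illustrative option name, not a documented Extractor option.
learner = DuplicateCounter(max_duplicates=3, encoding="utf-8")
```

Because `max_duplicates` is named explicitly in the signature, it is consumed by the subclass and never reaches the base class through `**kwargs`.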

Example #2

File: cleanup.py Project: ChinHui-Chen/newssip

 def __init__(self, cleanup_model=None, cleanup_threshold=0.1, **kwargs):
     """Initialize cleanup model learner.
     
     Takes standard parameters of Extractor, plus:
      - cleanup_model: filename of the model to load, or model itself
      - cleanup_threshold: 0 means less conservative, 1 means more conservative
     """
    
     Extractor.__init__(self, **kwargs)
     self.cleanup_model = cleanup_model
     self.cleanup_threshold = cleanup_threshold
     
     assert self.cleanup_model, "PageCleaner extractor requires a cleanup model"
    
     self.load_model(self.cleanup_model)
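
The constructor above requires a model and accepts either a filename or the model itself. A minimal sketch of one way to handle that, under stated assumptions: the `Extractor` class is a stand-in, the `load_model` body is hypothetical (the project's real implementation is not shown here), and treating a string as the path to a pickled model is an assumption. The sketch raises `ValueError` instead of using `assert`, since assertions are skipped when Python runs with `-O`.

```python
import pickle


class Extractor:
    """Stand-in base class; the real Extractor is not shown in this listing."""

    def __init__(self, **kwargs):
        self.options = dict(kwargs)


class PageCleaner(Extractor):
    def __init__(self, cleanup_model=None, cleanup_threshold=0.1, **kwargs):
        super().__init__(**kwargs)
        # Unlike assert, an explicit check still runs under "python -O".
        if not cleanup_model:
            raise ValueError("PageCleaner extractor requires a cleanup model")
        self.cleanup_threshold = cleanup_threshold
        self.load_model(cleanup_model)

    def load_model(self, model):
        # Assumption: a string is a path to a pickled model;
        # anything else is treated as the model object itself.
        if isinstance(model, str):
            with open(model, "rb") as handle:
                model = pickle.load(handle)
        self.model = model


# The dictionary below is made-up illustrative data, not a real model format.
cleaner = PageCleaner(cleanup_model={"div.sidebar": 0.9}, cleanup_threshold=0.2)
```

Only unpickle model files from a trusted source, since `pickle.load` can execute arbitrary code.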

Example #3

File: magazinesrussru.py Project: ChinHui-Chen/newssip

 def __init__(self, **kwargs):
     Extractor.__init__(self, **kwargs)
     self.pages = {}
     self.index = None