def __init__(self, stemming_language=None, remove_stopwords=False,
             remove_html_markup=False, remove_latex_markup=False):
    """Enable verbose output, then delegate setup to the base tokenizer.

    :param stemming_language: stemming language passed to the base class
    :param remove_stopwords: stopword-removal flag passed to the base class
    :param remove_html_markup: HTML-stripping flag passed to the base class
    :param remove_latex_markup: LaTeX-stripping flag passed to the base class
    """
    # NOTE(review): verbosity level 3 — exact meaning depends on the
    # tokenizer framework; confirm against BibIndexDefaultTokenizer.
    # Assigned before the parent initialiser runs, as in the original.
    self.verbose = 3
    BibIndexDefaultTokenizer.__init__(self,
                                      stemming_language,
                                      remove_stopwords,
                                      remove_html_markup,
                                      remove_latex_markup)
def __init__(self, stemming_language=None, remove_stopwords=False,
             remove_html_markup=False, remove_latex_markup=False):
    """Forward all configuration options to the default tokenizer.

    :param stemming_language: stemming language passed to the base class
    :param remove_stopwords: stopword-removal flag passed to the base class
    :param remove_html_markup: HTML-stripping flag passed to the base class
    :param remove_latex_markup: LaTeX-stripping flag passed to the base class
    """
    BibIndexDefaultTokenizer.__init__(
        self,
        stemming_language,
        remove_stopwords,
        remove_html_markup,
        remove_latex_markup,
    )
def __init__(self, stemming_language=None, remove_stopwords=False,
             remove_html_markup=False, remove_latex_markup=False):
    """Initialise the tokenizer and precompile author-name helpers.

    :param stemming_language: stemming language passed to the base class
    :param remove_stopwords: stopword-removal flag passed to the base class
    :param remove_html_markup: HTML-stripping flag passed to the base class
    :param remove_latex_markup: LaTeX-stripping flag passed to the base class
    """
    BibIndexDefaultTokenizer.__init__(self, stemming_language,
                                      remove_stopwords,
                                      remove_html_markup,
                                      remove_latex_markup)
    # Raw strings: '\w' / '\.' in a plain literal are invalid escape
    # sequences (DeprecationWarning on Python 3); r'...' keeps the same
    # pattern text warning-free.
    # Matches a lone single-letter initial such as "J."
    self.single_initial_re = re.compile(r'^\w\.$')
    # Splits a name on dots, whitespace, or hyphens.
    self.split_on_re = re.compile(r'[\.\s-]')
    # lastname_stopwords describes terms which should not be used for
    # indexing, in multiple-word last names. These are purely
    # conjunctions, serving the same function as the American hyphen,
    # but using linguistic constructs.
    self.lastname_stopwords = {'y', 'of', 'and', 'de'}
def __init__(self, stemming_language=None, remove_stopwords=False,
             remove_html_markup=False, remove_latex_markup=False):
    """Initialise the tokenizer and precompile author-name helpers.

    :param stemming_language: stemming language passed to the base class
    :param remove_stopwords: stopword-removal flag passed to the base class
    :param remove_html_markup: HTML-stripping flag passed to the base class
    :param remove_latex_markup: LaTeX-stripping flag passed to the base class
    """
    BibIndexDefaultTokenizer.__init__(self, stemming_language,
                                      remove_stopwords,
                                      remove_html_markup,
                                      remove_latex_markup)
    # Raw strings: '\w' / '\.' in a plain literal are invalid escape
    # sequences (DeprecationWarning on Python 3); r'...' keeps the same
    # pattern text warning-free.
    # Matches a lone single-letter initial such as "J."
    self.single_initial_re = re.compile(r'^\w\.$')
    # Splits a name on dots, whitespace, or hyphens.
    self.split_on_re = re.compile(r'[\.\s-]')
    # lastname_stopwords describes terms which should not be used for
    # indexing, in multiple-word last names. These are purely
    # conjunctions, serving the same function as the American hyphen,
    # but using linguistic constructs.
    self.lastname_stopwords = {'y', 'of', 'and', 'de'}
def __init__(self, stemming_language=None, remove_stopwords=False,
             remove_html_markup=False, remove_latex_markup=False):
    """Initialisation: hand every option straight to the base tokenizer.

    :param stemming_language: stemming language passed to the base class
    :param remove_stopwords: stopword-removal flag passed to the base class
    :param remove_html_markup: HTML-stripping flag passed to the base class
    :param remove_latex_markup: LaTeX-stripping flag passed to the base class
    """
    BibIndexDefaultTokenizer.__init__(
        self, stemming_language, remove_stopwords,
        remove_html_markup, remove_latex_markup)