def customize_tokenizer(text, do_lower_case=False):
  """Split text into tokens, isolating CJK chars, punctuation, whitespace
  and control characters as standalone tokens.

  Args:
    text: input text; anything `tokenization.convert_to_unicode` accepts.
    do_lower_case: if True, lowercase the text before splitting.

  Returns:
    A list of token strings (the result of str.split on the spaced text).
  """
  # BasicTokenizer is instantiated only for its _is_chinese_char helper.
  tokenizer = tokenization.BasicTokenizer(do_lower_case=do_lower_case)
  text = tokenization.convert_to_unicode(text)
  # Collect pieces in a list and join once: per-char `temp_x += ...` is
  # quadratic on long inputs.
  pieces = []
  for c in text:
    if (tokenizer._is_chinese_char(ord(c))
        or tokenization._is_punctuation(c)
        or tokenization._is_whitespace(c)
        or tokenization._is_control(c)):
      pieces.append(" " + c + " ")
    else:
      pieces.append(c)
  temp_x = "".join(pieces)
  if do_lower_case:
    temp_x = temp_x.lower()
  return temp_x.split()  # so here we end up with a list of tokens
예제 #2
0
 def _joinTokens_orig(self, example):
     """Reassemble example.tokens0 into a single string, gluing "##"
     wordpiece continuations onto the preceding token.

     A "##" piece removes the space token(s) inserted before it, unless
     the token before that space ends in punctuation (the space is kept
     so punctuation stays detached).
     """
     pieces = []
     for piece in example.tokens0:
         if piece.startswith("##"):
             # Drop trailing space tokens, but stop if the token before
             # the space ends with a punctuation character.
             while pieces and pieces[-1] == " ":
                 if (len(pieces) > 1
                         and tokenizationOrig._is_punctuation(pieces[-2][-1])):
                     break
                 pieces.pop()
             piece = piece[2:]
         pieces.append(piece)
     return "".join(pieces)
예제 #3
0
 def test_is_punctuation(self):
     """_is_punctuation accepts punctuation chars, rejects letters/spaces."""
     for ch in [u"-", u"$", u"`", u"."]:
         self.assertTrue(tokenization._is_punctuation(ch))
     for ch in [u"A", u" "]:
         self.assertFalse(tokenization._is_punctuation(ch))
예제 #4
0
def customize_tokenizer(text, do_lower_case=True):
    """Split text into tokens, isolating CJK chars, punctuation,
    whitespace and control characters as standalone tokens.

    Args:
        text: input text; anything `tokenization.convert_to_unicode`
            accepts.
        do_lower_case: if True, lowercase the text before splitting.

    Returns:
        A list of token strings (str.split on the spaced-out text).
    """
    text = tokenization.convert_to_unicode(text)
    # Collect pieces in a list and join once: per-char `temp_x += ...`
    # is quadratic on long inputs.
    pieces = []
    for c in text:
        if (_is_chinese_char(ord(c))
                or tokenization._is_punctuation(c)
                or tokenization._is_whitespace(c)
                or tokenization._is_control(c)):
            pieces.append(" " + c + " ")
        else:
            pieces.append(c)
    temp_x = "".join(pieces)
    if do_lower_case:
        temp_x = temp_x.lower()
    return temp_x.split()
예제 #5
0
  def test_is_punctuation(self):
    """_is_punctuation accepts punctuation chars, rejects letters/spaces."""
    for ch in [u"-", u"$", u"`", u"."]:
      self.assertTrue(tokenization._is_punctuation(ch))

    for ch in [u"A", u" "]:
      self.assertFalse(tokenization._is_punctuation(ch))
예제 #6
0
def _is_chinese_or_punctuation(ch):
    """Return a truthy value when ch is a CJK character or punctuation.

    Short-circuits exactly like `a or b`: the CJK check's result is
    returned when truthy, otherwise the punctuation check's result.
    """
    result = _is_chinese_char(ch)
    if not result:
        result = _is_punctuation(ch)
    return result