Python split_camel_case Examples

Programming language: Python

Namespace/Package: crawler.text_utils

Method/Function: split_camel_case

Examples at hotexamples.com: 2

Python split_camel_case - 2 examples found. These are the top-rated real-world examples of crawler.text_utils.split_camel_case extracted from open source projects. You can rate examples to help us improve their quality.

Example #1

File: test_text_utils.py Project: neocortex/cuisifier

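from nose.tools import eq_  # assumption: eq_ is nose's equality assertion
from crawler.text_utils import split_camel_case


def test_split_camel_case():
    # A single leading lowercase letter (as in 'iBla') stays attached to its word.
    eq_(split_camel_case('BlaBla'), 'Bla Bla')
    eq_(split_camel_case('Bla'), 'Bla')
    eq_(split_camel_case('iBla'), 'iBla')
    eq_(split_camel_case('iBlaBla'), 'iBla Bla')
    eq_(split_camel_case('BlaBlaBlaaa'), 'Bla Bla Blaaa')
    eq_(split_camel_case('iBlaBla BlaaaBla'), 'iBla Bla Blaaa Bla')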
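The assertions above pin down the expected behavior fairly tightly: a space is inserted at each lowercase-to-uppercase boundary, except when the lowercase letter is the very first character of a word (so 'iBla' is left alone). The project's actual implementation is not shown on this page; the following is only a minimal sketch that happens to satisfy the same assertions. The name split_camel_case_sketch and the regular expression are assumptions made for illustration.

```python
import re

# Zero-width boundary: a lowercase letter that is itself preceded by another
# letter, followed by an uppercase letter. Requiring two letters before the
# boundary leaves a single leading lowercase letter (as in 'iBla') attached.
_CAMEL_BOUNDARY = re.compile(r'(?<=[A-Za-z][a-z])(?=[A-Z])')


def split_camel_case_sketch(text):
    """Hypothetical stand-in for split_camel_case, not the project's code."""
    return _CAMEL_BOUNDARY.sub(' ', text)


print(split_camel_case_sketch('iBlaBla BlaaaBla'))
```

This sketch only handles ASCII letters; words with accented or non-Latin characters would need a wider character class.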

Example #2

File: text_extraction.py Project: neocortex/cuisifier

import re
import string

# is_pdf, extract_text_pdf, extract_text_html, str2unicode, clean_text,
# split_camel_case, apply_stemmer and remove_repeated_long_strings are
# project helpers defined elsewhere; their imports are not shown on this page.


def extract_text(doc, title_weight=None, header_weights=None, use_pdf=True,
                 use_stemmer=False, ukkonen_len=0):
    """ Extracts cleaned text from an HTML or PDF. """
    if is_pdf(doc):
        if use_pdf:
            try:
                text = extract_text_pdf(doc)
            except Exception:
                # TODO: Nice error handling
                return ''
        else:
            return ''
    else:
        text = extract_text_html(
            str2unicode(doc), title_weight=title_weight,
            header_weights=header_weights)
    # Replace newlines etc.
    text = re.sub(r'\s+', ' ', text)
    # Replace Umlaute and apply 'unidecode'
    text = clean_text(text)
    # Remove punctuation
    # (string.maketrans is the Python 2 API; Python 3 uses str.maketrans)
    replace_punctuation = string.maketrans(
        string.punctuation, ' ' * len(string.punctuation))
    text = text.translate(replace_punctuation)
    # Remove multiple spaces
    text = re.sub(' +', ' ', text)
    # Strip
    text = text.strip()
    # Split camel case words
    text = ' '.join([split_camel_case(word) for word in text.split(' ')])
    # Remove digits
    text = ''.join([c for c in text if not c.isdigit()])
    # Remove single characters
    text = ' '.join([word for word in text.split(' ') if len(word) > 1])
    # Remove multiple spaces
    text = re.sub(' +', ' ', text)
    # Stem
    if use_stemmer:
        text = apply_stemmer(text)
    # Remove long repeated strings
    if ukkonen_len:
        text = remove_repeated_long_strings(text, ukkonen_len)
    return text.lower()
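The plain-string cleanup steps in extract_text (whitespace collapsing, punctuation replacement, digit removal, dropping single characters, lowercasing) can be tried in isolation. The sketch below is a Python 3 approximation of just those steps, assuming the input is already a decoded string. The name normalize_text is hypothetical, and the project-specific helpers (clean_text, split_camel_case, apply_stemmer, remove_repeated_long_strings) are deliberately left out.

```python
import re
import string

# Python 3 counterpart of the string.maketrans call: a table that maps every
# ASCII punctuation character to a single space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                                ' ' * len(string.punctuation))


def normalize_text(text):
    """Hypothetical helper covering only the plain-string cleanup steps."""
    # Replace newlines, tabs and other whitespace runs with one space
    text = re.sub(r'\s+', ' ', text)
    # Replace punctuation with spaces
    text = text.translate(_PUNCT_TO_SPACE)
    # Collapse repeated spaces and strip the ends
    text = re.sub(' +', ' ', text).strip()
    # Remove digits
    text = ''.join(c for c in text if not c.isdigit())
    # Drop empty and single-character tokens
    text = ' '.join(word for word in text.split(' ') if len(word) > 1)
    return text.lower()


print(normalize_text("Hello,\n world! 42 a b2b"))  # prints "hello world bb"
```

Note that digits are removed after tokens are formed, so a token such as "b2b" becomes "bb" rather than being dropped, which mirrors the ordering of the steps in extract_text.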