# Example #1
# Run the pipeline on a sample sentence, then split its tokens by stop-word status.
sentence = nlp("We will go to movie after the dinner")
print(sentence)

# Tokens that are NOT stop words.
notStopWords = [token.text for token in sentence if not token.is_stop]
print(notStopWords)

# Tokens that ARE stop words.
stopWords = [token.text for token in sentence if token.is_stop]
print(stopWords)

#Add & Remove a new Stop Word
import nltk
# NLTK's English stop-word list (a plain list, so append/remove work).
# NOTE(review): requires the 'stopwords' corpus to be downloaded — nltk.download('stopwords').
STOP_WORDS = nltk.corpus.stopwords.words('english')
STOP_WORDS.append('Test')

# Show the list grew by one.
print(len(STOP_WORDS))
print(STOP_WORDS)

import nltk

# Undo the addition made above.
STOP_WORDS.remove('Test')

# Show the list is back to its original size.
print(len(STOP_WORDS))
print(STOP_WORDS)

import spacy
# spaCy's built-in English stop words — a set, unlike NLTK's list;
# note this rebinds the STOP_WORDS name used by the NLTK section above.
from spacy.lang.en.stop_words import STOP_WORDS

# Sets use .add (not .append) to insert a custom stop word.
STOP_WORDS.add("Test")
from bs4 import BeautifulSoup
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
import nltk
from nltk.stem import WordNetLemmatizer
from textacy.preprocess import preprocess_text, replace_numbers, replace_phone_numbers, replace_urls
from gensim.utils import to_utf8, tokenize
from gensim.models.phrases import Phrases, Phraser

# Convert the spaCy stop-word set to a list and add web-related tokens.
STOP_WORDS = list(STOP_WORDS)
STOP_WORDS.extend(['http', 'www'])

def strip_html(text):
    """Strip any HTML markup from *text*, returning only the visible text."""
    return BeautifulSoup(text, "html.parser").get_text()

def clean_text(text):
    """Normalize raw text for downstream tokenization.

    Replaces newlines and common domain suffixes with spaces, strips HTML,
    expands contractions / lowercases / removes punctuation via textacy,
    then deletes URLs and numbers.
    """
    # BUGFIX: original had an unbalanced ')' here and used '/n' instead of '\n'.
    text = text.replace('\n', ' ').replace('.com', ' ').replace('.org', ' ').replace('.net', ' ')
    text = strip_html(text)
    # Remove contractions, accents, punctuation and currency symbols; lowercase.
    # BUGFIX: original line ended with a stray ", replace_with=' ')" fragment (SyntaxError).
    text = preprocess_text(text, fix_unicode=True, no_accents=True,
                           no_contractions=True, lowercase=True,
                           no_punct=True, no_currency_symbols=True)
    text = replace_urls(text, replace_with='')
    text = replace_numbers(text, replace_with='')
    return text

def tokenize_text(text):
    """Clean *text* and return its tokens as a list."""
    return list(tokenize(clean_text(text)))