removes numbers in complex format, but not if a . is followed as it introduces the end of a sentence. run after DateRemover() Examples: 15.10 Uhr OR 3,5 bis 4 stunden. OR 100 000 euro. OR 20?000 förderanträge OR um 2025/2030 OR OR abc 18.000. a OR abc. 18.000. a OR abc 18. a OR abc 7.8.14. a OR abc 7. 14. 18. a OR abc 1970er. a OR abc 20?()/&!%000. a OR abc 2,9-3,5. a OR abc . 18. a OR abc . 7.8.14. a OR abc . 7. 14. 18. a OR abc 1790er OR abc . 20?()/&!%000 a OR abc . 2,9-3,5 a OR abc 45, 59 a OR abc . 14 z OR abc 1. e OR abc v. 2 a """ string = re.sub('(?<!\w)(\d+)([\W\s]+|)|([\W\s]+)\d+', ' ', string) # TODO: check later # Alternative: ((\d+)(.|\s{1,3}|)\d+)(.|\s)(?! er) return string nlp = German() sbd = nlp.create_pipe('sentencizer') nlp.add_pipe(sbd) def Sentencizer(string, verbose=False): """ requires from importing language from spacy and loading of sentence boundary detection: from spacy.lang.de import German nlp = German() sbd = nlp.create_pipe('sentencizer') nlp.add_pipe(sbd) for some single strings nlp() cannot process (rare, e.g. 'nan'), exclude those; except pass solve later """ sents_list = [] try: doc = nlp(string)
def getSentences(text):
    """Split *text* into sentences with spaCy's rule-based sentencizer.

    Builds a fresh, model-free German pipeline on every call, so this is
    convenient but not fast for bulk use.

    Parameters
    ----------
    text : str
        Raw input text.

    Returns
    -------
    list of str
        The whitespace-stripped sentences, in document order.
    """
    pipeline = German()
    pipeline.add_pipe(pipeline.create_pipe('sentencizer'))
    parsed = pipeline(text)
    sentences = []
    for sentence in parsed.sents:
        # Span.string (spaCy 2) keeps trailing whitespace; strip it off.
        sentences.append(sentence.string.strip())
    return sentences
import spacy
from spacy.lang.de import German
import pandas as pd
import time

# Model-free German pipeline with only the rule-based sentence splitter.
nlp = German()
nlp.add_pipe(nlp.create_pipe('sentencizer'))
texts = pd.read_csv('../data/cleaned-text-dump.csv', low_memory=False)


def sentencizer(raw_text, nlp):
    """Split *raw_text* into stripped sentences using the given spaCy pipeline.

    Parameters
    ----------
    raw_text : str
        Text to segment.
    nlp : spacy.language.Language
        A pipeline containing a sentencizer component.

    Returns
    -------
    list of str
        Stripped sentence strings in document order.
    """
    doc = nlp(raw_text)
    sentences = [sent.string.strip() for sent in doc.sents]
    return(sentences)


def fix_wrong_splits(sentences):
    """Re-join sentences that the sentencizer split after an abbreviation.

    If a sentence ends with a known (mostly German/medical) abbreviation, or
    is suspiciously short (< 10 chars), it is merged with the following
    sentence. The list is modified in place and also returned for
    convenience.

    Parameters
    ----------
    sentences : list of str
        Sentence strings to repair; mutated in place.

    Returns
    -------
    list of str
        The same (repaired) list object.
    """
    # BUG FIX: the original tuple was missing two commas, so Python's
    # implicit string-literal concatenation silently fused
    # 'Vd.a.' 'i.v'  -> 'Vd.a.i.v'  and  'a.e.' 'I.' -> 'a.e.I.',
    # meaning 'Vd.a.', 'i.v', 'a.e.' and 'I.' never matched at all.
    abbreviations = (
        'Z.n.', 'V.a.', 'v.a.', 'Vd.a.', 'i.v', ' re.', ' li.',
        'und 4.', 'bds.', 'Bds.', 'Pat.', 'i.p.', 'i.P.', 'b.w.',
        'i.e.L.', ' pect.', 'Ggfs.', 'ggf.', 'Ggf.', 'z.B.', 'a.e.',
        'I.', 'II.', 'III.', 'IV.', 'V.', 'VI.', 'VII.', 'VIII.',
        'IX.', 'X.', 'XI.', 'XII.',
    )
    i = 0
    # NOTE(review): the bound len(sentences) - 2 leaves the last two
    # entries unexamined, and i advances even after a merge (the merged
    # sentence is not re-checked). Preserved as-is — confirm whether
    # that is intentional before tightening.
    while i < (len(sentences) - 2):
        if sentences[i].endswith(abbreviations):
            # Wrong split after an abbreviation: glue i and i+1 together.
            sentences[i:i + 2] = [' '.join(sentences[i:i + 2])]
        elif len(sentences[i]) < 10:
            # Very short fragments are almost always split artifacts.
            sentences[i:i + 2] = [' '.join(sentences[i:i + 2])]
        i += 1
    return sentences
from spacy.lang.de import German

# Bare German pipeline — no statistical model, just the language data.
pipeline = German()
# The rule-based sentencizer is the only component we need here.
splitter = pipeline.create_pipe("sentencizer")
pipeline.add_pipe(splitter)

# Segment a two-stanza hymn excerpt and print one sentence per line.
parsed = pipeline(u""" 14. Davon ich allzeit froehlich sei, Zu springen, singen immer frei Das rechte Susannine* schon, Mit Herzen Lust den suessen Ton. 15. Lob, Ehr sei Gott im hoechsten Thron, Der uns schenkt seinen ein'gen Sohn, Des freuen sich der Engel Schaar Und singen uns solch's neues Jahr. """)
for sentence in parsed.sents:
    print(sentence.text)
# Capital-city lookup table; `f` is a file handle opened earlier in the file.
CAPITALS = json.loads(f.read())

nlp = German()
matcher = PhraseMatcher(nlp.vocab)
country_patterns = list(nlp.pipe(COUNTRIES))
matcher.add("COUNTRY", None, *country_patterns)


def countries_component(doc):
    """Create an entity span labelled "LOC" for every matcher hit on *doc*."""
    hits = matcher(doc)
    spans = []
    for match_id, start, end in hits:
        spans.append(Span(doc, start, end, label="LOC"))
    doc.ents = spans
    return doc


# Add the component to the pipeline
nlp.add_pipe(countries_component)
print(nlp.pipe_names)


# Getter that resolves a span's text against the capitals lexicon;
# returns None for countries missing from CAPITALS.
def get_capital(span):
    return CAPITALS.get(span.text)


# Register the span extension "capital" backed by get_capital.
Span.set_extension("capital", getter=get_capital)

# Process the text, then print text, label and capital for each entity.
doc = nlp(
    "Tschechien könnte der Slowakei dabei helfen, ihren Luftraum zu schützen")
entity_rows = [(entity.text, entity.label_, entity._.capital)
               for entity in doc.ents]
print(entity_rows)