Example #1
from nltk.tokenize import PunktSentenceTokenizer

from forte.data.data_pack import DataPack
from forte.processors.base import PackProcessor
from ft.onto.base_ontology import Sentence


class NLTKSentenceSegmenter(PackProcessor):
    r"""A wrapper of the NLTK sentence tokenizer."""

    def __init__(self):
        super().__init__()
        self.sent_splitter = PunktSentenceTokenizer()

    def _process(self, input_pack: DataPack):
        # Create a Sentence annotation for every character span the
        # tokenizer finds in the pack's text.
        for begin, end in self.sent_splitter.span_tokenize(input_pack.text):
            Sentence(input_pack, begin, end)
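
A minimal usage sketch of running this processor in a Forte pipeline. The StringReader and the sample text are illustrative assumptions, and the Punkt model is assumed to be downloaded already (Example #2 below handles the download):

from forte.data.readers import StringReader
from forte.pipeline import Pipeline

pipeline = Pipeline[DataPack]()
pipeline.set_reader(StringReader())  # turns each input string into a DataPack
pipeline.add(NLTKSentenceSegmenter())
pipeline.initialize()

for pack in pipeline.process_dataset(["Forte is a toolkit. It builds NLP pipelines."]):
    for sentence in pack.get(Sentence):
        print(sentence.text)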
Example #2
from typing import Dict, Set

import nltk
from nltk.tokenize import PunktSentenceTokenizer

from forte.common.configuration import Config
from forte.common.resources import Resources
from forte.data.data_pack import DataPack
from forte.processors.base import PackProcessor
from ft.onto.base_ontology import Sentence


class NLTKSentenceSegmenter(PackProcessor):
    r"""A wrapper of the NLTK sentence tokenizer."""

    def __init__(self):
        super().__init__()
        self.sent_splitter = PunktSentenceTokenizer()

    def initialize(self, resources: Resources, configs: Config):
        super().initialize(resources, configs)
        # Fetch the Punkt model once, when the pipeline initializes.
        nltk.download("punkt")

    def _process(self, input_pack: DataPack):
        # Create a Sentence annotation for every character span the
        # tokenizer finds in the pack's text.
        for begin, end in self.sent_splitter.span_tokenize(input_pack.text):
            Sentence(input_pack, begin, end)

    def record(self, record_meta: Dict[str, Set[str]]):
        r"""Add the output type record of `NLTKSentenceSegmenter`, namely
        `ft.onto.base_ontology.Sentence`, to
        :attr:`forte.data.data_pack.Meta.record`.

        Args:
            record_meta: the field in the data pack that holds type records;
                it is filled in here for consistency checking.
        """
        record_meta["ft.onto.base_ontology.Sentence"] = set()
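
Because record declares the processor's output type, the pipeline can check type consistency between processors at runtime. A hedged sketch of switching that check on, assuming Forte's pipeline-level enforce_consistency toggle:

pipeline = Pipeline[DataPack]()
pipeline.set_reader(StringReader())
pipeline.add(NLTKSentenceSegmenter())
pipeline.enforce_consistency(True)  # verify declared type records during processing
pipeline.initialize()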