Python orthographic_syllabify Examples

Programming language: Python

Namespace/package name: indicnlp.syllable.syllabifier

Method/function: orthographic_syllabify

Examples at hotexamples.com: 3

Python orthographic_syllabify - 3 examples found. These are top-rated real-world Python examples of indicnlp.syllable.syllabifier.orthographic_syllabify, extracted from open source projects.

Example #1

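from indicnlp.syllable import syllabifier


def run_syllabify(args):
    # args.infile yields lines of text, args.outfile is writable, args.lang is a language code
    for line in args.infile:
        # Syllabify each space-separated word and join its syllables with spaces
        new_line = ' '.join([
            ' '.join(syllabifier.orthographic_syllabify(w, args.lang))
            for w in line.strip().split(' ')
        ])
        args.outfile.write(new_line + '\n')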
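
The same line-by-line pattern can be exercised without the library by passing the syllabifier in as a parameter. The following is a minimal sketch under stated assumptions, not code from the original project: `syllabify_stream` and `two_char_chunks` are hypothetical names, and `two_char_chunks` is only a stand-in that cuts a word into two-character pieces. It does not reproduce what `orthographic_syllabify` does.

```python
import io


def syllabify_stream(infile, outfile, syllabify):
    """Write each input line with every word split into space-separated pieces.

    `syllabify` is any callable that maps one word to a list of strings.
    """
    for line in infile:
        # split() with no argument also collapses repeated whitespace,
        # unlike the split(' ') used in the example above
        words = line.split()
        pieces = [" ".join(syllabify(w)) for w in words]
        outfile.write(" ".join(pieces) + "\n")


def two_char_chunks(word):
    # Hypothetical stand-in for testing only, NOT the indicnlp algorithm
    return [word[i:i + 2] for i in range(0, len(word), 2)]


if __name__ == "__main__":
    src = io.StringIO("abcd ef\nghi\n")
    dst = io.StringIO()
    syllabify_stream(src, dst, two_char_chunks)
    print(dst.getvalue(), end="")
```

With the library installed, the stand-in could presumably be replaced by `lambda w: syllabifier.orthographic_syllabify(w, lang)`, where `lang` is a language code such as `'hi'`.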

Example #2

File: feature_generation.py Project: suman101112/online-hate-speech-recog

    def other_features(self,tweet):
        """
        expects text, returns a feature vector, for english and hindi
        """

        if self.lang == 'en':
            sentiment = self.sentiment_analyzer.polarity_scores(tweet)
            words = self.preprocess(tweet) #Get text only
            # pdb.set_trace()
            syllables = textstat.syllable_count(words)
            num_chars = sum(len(w) for w in words)
            num_chars_total = len(tweet)
            num_terms = len(tweet.split())
            num_words = len(words.split())
            avg_syl = round(float((syllables+0.001))/float(num_words+0.001),4)
            num_unique_terms = len(set(words.split()))

            ###Modified FK grade, where avg words per sentence is just num words/1
            FKRA = round(float(0.39 * float(num_words)/1.0) + float(11.8 * avg_syl) - 15.59,1)
            ##Modified FRE score, where sentence fixed to 1
            FRE = round(206.835 - 1.015*(float(num_words)/1.0) - (84.6*float(avg_syl)),2)

            twitter_objs = self.count_twitter_objs(tweet)
            retweet = 0
            if "rt" in words:
                retweet = 1
            features = [FKRA, FRE,syllables, avg_syl, num_chars, num_chars_total, num_terms, num_words,
                        num_unique_terms, sentiment['neg'], sentiment['pos'], sentiment['neu'], sentiment['compound'],
                        twitter_objs[2], twitter_objs[1],
                        twitter_objs[0], retweet]
            #features = pandas.DataFrame(features)
            return features
        if self.lang == 'hi':
            sentiment = self.sentiment_analyzer.predict(tweet)
            words = self.preprocess(tweet)
            
            # Sum the syllables of every token; len() of the outer list would only count tokens
            syllables = sum(len(syllabifier.orthographic_syllabify(w, self.lang)) for w in hi_tokenizer(input=words, language_code=self.lang))
            # pdb.set_trace()
            num_chars = sum(len(w) for w in words)
            num_chars_total = len(tweet)
            num_terms = len(tweet.split())
            num_words = len(words.split())
            avg_syl = round(float((syllables+0.001))/float(num_words+0.001),4)
            num_unique_terms = len(set(words.split()))

            ###Modified FK grade, where avg words per sentence is just num words/1
            FKRA = round(float(0.39 * float(num_words)/1.0) + float(11.8 * avg_syl) - 15.59,1)
            ##Modified FRE score, where sentence fixed to 1
            FRE = round(206.835 - 1.015*(float(num_words)/1.0) - (84.6*float(avg_syl)),2)

            twitter_objs = self.count_twitter_objs(tweet)
            retweet = 0
            if "rt" in words:
                retweet = 1
            features = [FKRA, FRE,syllables, avg_syl, num_chars, num_chars_total, num_terms, num_words,
                        num_unique_terms, sentiment[2][0].tolist(), sentiment[2][2].tolist(), sentiment[2][1].tolist(), sentiment[2][1].tolist()-sentiment[2][0].tolist()+sentiment[2][1].tolist(),
                        twitter_objs[2], twitter_objs[1],
                        twitter_objs[0], retweet]
            #features = pandas.DataFrame(features)
            return features
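
Both branches above compute the same two readability scores with the sentence count fixed to 1. As a rough sketch, the arithmetic can be pulled into one helper. The name `modified_readability` is hypothetical and not part of the original project; the constants and the 0.001 smoothing term are copied from the code above.

```python
def modified_readability(num_syllables, num_words):
    """Return (FKRA, FRE) for a single-sentence text, as in the example above.

    The 0.001 added to numerator and denominator avoids division by zero
    when the text has no words.
    """
    avg_syl = round((num_syllables + 0.001) / (num_words + 0.001), 4)
    # Modified Flesch-Kincaid grade: words per sentence is num_words / 1
    fkra = round(0.39 * num_words + 11.8 * avg_syl - 15.59, 1)
    # Modified Flesch reading ease with the same single-sentence assumption
    fre = round(206.835 - 1.015 * num_words - 84.6 * avg_syl, 2)
    return fkra, fre


if __name__ == "__main__":
    print(modified_readability(12, 8))
```

A shared helper like this would let the English and Hindi branches differ only in how they count syllables and read sentiment scores.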

Example #3

File: syllabification.py Project: satti007/word2vec-for-Indian-Languages

from indicnlp.syllable import syllabifier


def getSyllables(word, lang):
    return syllabifier.orthographic_syllabify(word, lang)