Python Corpus.emails_as_string 예제들

프로그래밍 언어: Python

네임스페이스/패키지 이름: Corpus

클래스/타입: Corpus

메소드/함수: emails_as_string

hotexamples.com에서의 예제들: 2

Python Corpus.emails_as_string - 2개의 예제가 발견되었습니다. 이것들은 오픈소스 프로젝트에서 추출된 Python의 Corpus.Corpus.emails_as_string에 대한 실세계 최고 등급의 예제들입니다. 예제들을 평가하여 예제의 품질 향상에 도움을 줄 수 있습니다.

자주 사용되는 메소드들

보기 숨기기

Corpus(30)

find(5)

get_postag_set(4)

read(3)

__init__(2)

verificarPlagio(2)

add_source_document(2)

add_target_document(2)

get_file_name(2)

buildCorpus(2)

emails_as_string(2)

dump(2)

preprocess(2)

get_data(2)

read_ner(2)

outputWords(1)

pickledumpwords(1)

output_rules(1)

ner(1)

outputPOStags(1)

nettoyer_texte(1)

most_frequent_word_by_year(1)

most_frequent_word_by_month(1)

most_frequent_word_by_day(1)

most_frequent_word(1)

most_frequent_trigrams(1)

most_frequent_content_words(1)

picklegetwords(1)

read_label(1)

prepapre_to_matrix(1)

search_ambiguous(1)

vectoriserDocCorpus(1)

url_to_dir(1)

train_word2vec(1)

tag_words_with_most_likely_parses(1)

spanishTags(1)

set_lista_texto(1)

save_json(1)

process(1)

save(1)

results(1)

resetSentStats(1)

read_word2vec(1)

read_prediction(1)

load_json(1)

read_data(1)

most_frequent_bigrams(1)

get_instances(1)

lemmatiserCorpus(1)

calculSimilarite(1)

예제 #1

파일 보기

파일: filter.py 프로젝트: il-vladislav/SpamFilter

        def train(self,path_to_truth_dir):
                corpus = Corpus(path_to_truth_dir)
                #Read truth file
                truth = methods.read_classification_from_file(methods.add_slash(path_to_truth_dir)+"!truth.txt")
                #Make truth global
                self.truth = truth
                for fname, body in corpus.emails_as_string():
                        email_as_file = open(methods.add_slash(path_to_truth_dir) + fname,'r',encoding = 'utf-8')
                        #Read email with EMAIL parser
                        msg = email.message_from_file(email_as_file)
                        self.extract_senders_list(msg,fname)
                        self.check_subject(msg,fname)

                #Generate dict's
                methods.generate_file_from_dict(self.path_bl , self.black_list)
                methods.generate_file_from_dict(self.path_wl ,self.white_list)
                methods.generate_file_from_dict(self.path_ssl , self.spam_subject_list)
                methods.generate_file_from_dict(self.path_hsl ,self.ham_subject_list)

예제 #2

파일 보기

파일: filter.py 프로젝트: il-vladislav/SpamFilter

        def test(self, path_to_test_dir):
                predictions = {} #Predictions dict {fname:prediction}
                bs = Bayesian.Bayesian()
                corpus = Corpus(path_to_test_dir)
                #Read dict's (if test called before train)
                black_list_dict = methods.read_dict_from_file(self.path_bl)
                white_list_dict = methods.read_dict_from_file(self.path_wl)
                spam_subject_dict = methods.read_dict_from_file(self.path_ssl)
                ham_subject_dict = methods.read_dict_from_file(self.path_hsl)
                
                for fname, body in corpus.emails_as_string():
                        #Open email with parser
                        email_as_file = open(methods.add_slash(path_to_test_dir) + fname,'r',encoding = 'utf-8')
                        msg = email.message_from_file(email_as_file)

                        #Check if sender in a black list
                        if (self.extract_email_adress_from_text(msg['From']) in black_list_dict):
                                predictions[fname] = 'SPAM'
                        elif(self.extract_email_adress_from_text(msg['From']) in white_list_dict):
                        #Check if sender in a white list
                                predictions[fname] = 'OK'
                        #Check if subject in a black list
                        elif(self.extract_email_adress_from_text(msg['From']) in spam_subject_dict):
                             prediction[fname] = 'SPAM'
                        #Check if subject in a white list
                        elif(self.extract_email_adress_from_text(msg['From']) in ham_subject_dict):
                                prediction[fname] = 'OK'
                        #Run Bayesian checker
                        else:                
                                if (bs.bayesian_prediction(methods.get_text(msg))) > 0.485:
                                        predictions[fname] = 'SPAM'
                                else:
                                        predictions[fname] = 'OK'

                #Generate prediction file
                bf = BaseFilter(path_to_test_dir,predictions)
                bf.generate_prediction_file()