Exemplos de IntrinsicUtility.add_ground_truth_to_passages em Python

Linguagem de programação: Python

Espaço para nome / nome do pacote: plagcomps.shared.util

Classe / Tipo: IntrinsicUtility

Método / Função: add_ground_truth_to_passages

Exemplos em hotexamples.com: 2

IntrinsicUtility.add_ground_truth_to_passages em Python - 2 exemplos encontrados. Esses são os exemplos do mundo real mais bem avaliados de plagcomps.shared.util.IntrinsicUtility.add_ground_truth_to_passages em Python extraídos de projetos de código aberto. Você pode avaliar os exemplos para nos ajudar a melhorar a qualidade deles.

Métodos Frequentes

Exibir Ocultar

IntrinsicUtility(8)

get_n_training_files(6)

get_plagiarized_spans(2)

add_ground_truth_to_passages(1)

get_relative_training_set(1)

Métodos Frequentes

IntrinsicUtility (8)

get_n_training_files (6)

get_plagiarized_spans (2)

add_ground_truth_to_passages (1)

get_relative_training_set (1)

Exemplo n.º 1

0

Exibir arquivo

Arquivo: serialization.py Projeto: NoahCarnahan/plagcomps

def extract_and_serialize(txt_file, xml_file, out_file, atom_type='paragraph', cluster_method='kmeans', k=2): ''' Performs all of intrinsic (feature extraction, clustering etc.) and creates Passage objects for each passage in <txt_file>. Writes a CSV file out to <out_file> containing all the features of <txt_file> The CSV files can be read easily by R in order to create plots ''' f = file(txt_file, 'r') text = f.read() f.close() util = IntrinsicUtility() feature_names = [ 'average_word_length', 'average_sentence_length', 'stopword_percentage', 'punctuation_percentage', 'syntactic_complexity', 'avg_internal_word_freq_class', 'avg_external_word_freq_class' ] ext = FeatureExtractor(text) print 'Initialized extractor' # Note that passages don't know their ground truths yet passages = ext.get_passages(feature_names, atom_type) print 'Extracted passages' util.add_ground_truth_to_passages(passages, xml_file) feature_vecs = [p.features.values() for p in passages] # If just testing feature extraction, don't cluster passages if cluster_method != 'none': # Cluster the passages and set their confidences confidences = cluster(cluster_method, k, feature_vecs) for psg, conf in zip(passages, confidences): psg.set_plag_confidence(conf) f = file(out_file, 'wb') csv_writer = csv.writer(f) # Writes out the header for corresponding CSV csv_writer.writerow(IntrinsicPassage.serialization_header(feature_names)) for p in passages: csv_writer.writerow(p.to_list(feature_names)) f.close() print 'Finished writing', out_file

Exemplo n.º 2

0

Exibir arquivo

def extract_and_serialize(txt_file, xml_file, out_file, atom_type='paragraph', cluster_method='kmeans', k=2): ''' Performs all of intrinsic (feature extraction, clustering etc.) and creates Passage objects for each passage in <txt_file>. Writes a CSV file out to <out_file> containing all the features of <txt_file> The CSV files can be read easily by R in order to create plots ''' f = file(txt_file, 'r') text = f.read() f.close() util = IntrinsicUtility() feature_names = [ 'average_word_length', 'average_sentence_length', 'stopword_percentage', 'punctuation_percentage', 'syntactic_complexity', 'avg_internal_word_freq_class', 'avg_external_word_freq_class' ] ext = FeatureExtractor(text) print 'Initialized extractor' # Note that passages don't know their ground truths yet passages = ext.get_passages(feature_names, atom_type) print 'Extracted passages' util.add_ground_truth_to_passages(passages, xml_file) feature_vecs = [p.features.values() for p in passages] # If just testing feature extraction, don't cluster passages if cluster_method != 'none': # Cluster the passages and set their confidences confidences = cluster(cluster_method, k, feature_vecs) for psg, conf in zip(passages, confidences): psg.set_plag_confidence(conf) f = file(out_file, 'wb') csv_writer = csv.writer(f) # Writes out the header for corresponding CSV csv_writer.writerow(IntrinsicPassage.serialization_header(feature_names)) for p in passages: csv_writer.writerow(p.to_list(feature_names)) f.close() print 'Finished writing', out_file