Python TextPreprocessor Examples

Programming language: Python

Namespace / package name: predictocite.datasets.preprocessing

Class / type: TextPreprocessor

Examples on hotexamples.com: 5

Python TextPreprocessor: 5 examples found. These are the top-rated Python examples of predictocite.datasets.preprocessing.TextPreprocessor, extracted from open source projects.

Frequently used methods

split_data(4)

frequency_term_matrix(3)

bag_of_words(1)

Example #1

File: test_preprocessing_data.py Project: RobSullivan/predictocite

	def test_preprocessing_bag_of_words(self):
		"""
		bag_of_words should return a SciPy sparse matrix (scipy.sparse csr_matrix),
		so test for one of its attributes.
		"""
		
		preprocessor = TextPreprocessor(self.articles)
		x_train_counts = preprocessor.bag_of_words()
		self.assertTrue(hasattr(x_train_counts, 'shape'))
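
The test above only checks that the result has a `shape` attribute. As a rough illustration of what a bag-of-words count matrix contains, here is a minimal standard-library sketch. It is not the TextPreprocessor implementation, which is not shown on this page. The function name `bag_of_words_counts` and the sample documents are made up for illustration, and a real implementation would typically return a sparse matrix rather than nested lists.

```python
from collections import Counter


def bag_of_words_counts(documents):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # A sorted vocabulary gives every term a stable column index.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


docs = ["the cat sat", "the cat saw the dog"]
vocabulary, rows = bag_of_words_counts(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)        # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row corresponds to one document and each column to one vocabulary term, so the matrix shape here is 2 x 5.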

Example #2

File: test_preprocessing_data.py Project: RobSullivan/predictocite

	def test_create_frequency_term_matrix(self):
		"""
		Once the vocabulary is indexed, create the term-frequency matrix.
		"""
		preprocessor = TextPreprocessor(self.articles)
		split_data = preprocessor.split_data()
		# Index the vocabulary on the training split first.
		preprocessor.count_vect.fit_transform(split_data['train'])
		# Earlier variant: preprocessor.count_vect.transform(split_data['train'])
		frequency_term_matrix = preprocessor.frequency_term_matrix(split_data['train'])
		
		self.assertTrue(hasattr(frequency_term_matrix, 'transpose'))

Example #3

File: test_preprocessing_data.py Project: RobSullivan/predictocite

	def test_tfidf_weighting(self):
		preprocessor = TextPreprocessor(self.articles)
		split_data = preprocessor.split_data()
		term_freq_matrix = preprocessor.frequency_term_matrix(split_data['train'])

		# Calculate the idf weights for the term-frequency matrix with fit().
		preprocessor.tf_transformer.fit(term_freq_matrix)
		# Once calculated, transform term_freq_matrix
		# into the tf-idf weight matrix.
		tf_idf_matrix = preprocessor.tf_transformer.transform(term_freq_matrix)
		
		self.assertTrue(hasattr(tf_idf_matrix.todense(), 'shape'))
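
The two-step pattern in this test (fit to learn the idf weights, then transform to apply them) can be sketched without any library. The attribute name `tf_transformer` suggests scikit-learn's TfidfTransformer, but that is an assumption. The formula used below, idf(t) = ln((1 + n) / (1 + df(t))) + 1, is the smoothed variant that scikit-learn documents as its default; the helper names `fit_idf` and `transform_tfidf` are hypothetical, and row normalization is left out here.

```python
import math


def fit_idf(count_rows):
    """Learn one idf weight per column from a dense count matrix."""
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    idf = []
    for j in range(n_terms):
        # Document frequency: number of rows in which term j occurs.
        df = sum(1 for row in count_rows if row[j] > 0)
        # Smoothed idf (assumed to match scikit-learn's default setting).
        idf.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return idf


def transform_tfidf(count_rows, idf):
    """Scale each raw count by the idf weight of its column."""
    return [[count * weight for count, weight in zip(row, idf)]
            for row in count_rows]


counts = [[1, 0, 1], [1, 2, 0]]
idf = fit_idf(counts)
weighted = transform_tfidf(counts, idf)
print(idf)
print(weighted)
```

A term that occurs in every document (column 0 here) gets the minimum weight of 1.0, while rarer terms are weighted up.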

Example #4

File: test_preprocessing_data.py Project: RobSullivan/predictocite

	def test_term_frequency_features(self):
		"""
		tf-idf helper test
		The last step before classification
		"""
		#tfidf_transformer = TfidfTransformer()
		preprocessor = TextPreprocessor(self.articles)
		split_data = preprocessor.split_data()
		
		term_freq_matrix = preprocessor.frequency_term_matrix(split_data['train'])
		
		tfidf = preprocessor.tf_transformer.fit(term_freq_matrix)
		self.assertEqual(tfidf.norm, 'l2')
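
The assertion checks that the fitted transformer reports `norm == 'l2'`. Assuming `tf_transformer` behaves like scikit-learn's TfidfTransformer (where 'l2' is the default), this means each row of the weighted matrix is divided by its Euclidean length. A minimal sketch of that normalization step, with a hypothetical helper name `l2_normalize`:

```python
import math


def l2_normalize(rows):
    """Divide each row by its Euclidean norm; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        length = math.sqrt(sum(value * value for value in row))
        if length == 0.0:
            normalized.append(list(row))
        else:
            normalized.append([value / length for value in row])
    return normalized


unit_rows = l2_normalize([[3.0, 4.0], [0.0, 0.0]])
print(unit_rows)  # [[0.6, 0.8], [0.0, 0.0]]
```

After this step every non-empty document vector has length 1, so long and short documents become comparable before classification.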

Example #5

File: test_preprocessing_data.py Project: RobSullivan/predictocite

	def setUp(self):
		self.groups = ['one_to_five_citations']
		self.articles = fetch_citationgroups(self.groups)
		preprocessor = TextPreprocessor(self.articles)
		split_data = preprocessor.split_data()
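
The tests on this page index the result of `split_data()` with `'train'`, so it appears to return a dictionary of document subsets. How TextPreprocessor actually performs the split is not shown here. The following is a hypothetical stand-in to illustrate the idea: only the `'train'` key is confirmed by the examples, while the `'test'` key, the `train_fraction` parameter, and the `seed` parameter are assumptions.

```python
import random


def split_data(documents, train_fraction=0.8, seed=0):
    """Shuffle a copy of the documents and split it into two parts.

    Hypothetical stand-in: only the 'train' key is confirmed by the tests
    on this page; the 'test' key, the fraction, and the seed are assumptions.
    """
    shuffled = list(documents)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return {'train': shuffled[:cut], 'test': shuffled[cut:]}


all_docs = ["doc %d" % i for i in range(10)]
parts = split_data(all_docs)
print(len(parts['train']), len(parts['test']))  # 8 2
```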