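# Build a document-term count matrix from a small corpus
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix: one row per document

print(X.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vectorizer.get_feature_names_out())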

# Vectorize short movie reviews as word counts
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'I loved the movie!',
    'The acting was terrible.',
    'The plot was confusing.',
    'Great movie, would recommend.',
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # rows are reviews, columns are word counts

In this example, CountVectorizer converts the text reviews into a term-frequency matrix (one row per review, one column per vocabulary word), which can then be used as input to a sentiment analysis algorithm. The library used in both examples is scikit-learn (sklearn).
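To illustrate how the count matrix might feed a sentiment model, here is a minimal sketch. The `labels` list (1 = positive, 0 = negative), the choice of `MultinomialNB` as the classifier, and the `new_reviews` text are assumptions made for illustration; none of them come from the example above, and four reviews are far too few for a real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

corpus = [
    'I loved the movie!',
    'The acting was terrible.',
    'The plot was confusing.',
    'Great movie, would recommend.',
]
# Hypothetical sentiment labels for the four reviews: 1 = positive, 0 = negative
labels = [1, 0, 0, 1]

# Learn the vocabulary and build the count matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# Fit a simple classifier on the word counts (an assumed choice, not the only option)
clf = MultinomialNB()
clf.fit(X, labels)

# New text must be transformed with the SAME fitted vectorizer (transform, not fit_transform)
new_reviews = ['Terrible plot.']  # hypothetical unseen review
X_new = vectorizer.transform(new_reviews)
predictions = clf.predict(X_new)
print(predictions)
```

Note that unseen text goes through `transform` rather than `fit_transform`, so its columns line up with the vocabulary learned from the training reviews; words outside that vocabulary are silently ignored.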