A simple experiment to enrich twitter profiles with gender
-
Clone Repository
-
Install Requirements:
pip install -r requirements.txt
-
Run the file processor
python file_processor.py
—> It automatically downloads the dataset, builds the classifier, prints the test results and stores an enriched version of the dataset (csv file) under:
twitter_gender/data/users_enriched.csv
The last two columns are added:
- ‘prediction’: the prediction result ‘female’ or ‘male'
- ‘source’: If the final prediction has happened based on the name (’name) or on description + last tweet (’text’)
Yes.
from twitter_gender.file_processor import ProcessUsers
process_users = ProcessUsers()
labels = ['unknown', 'female', 'male']
gender = process_users.gender_classifier.get_gender_by_name('Johannes Erett')
print(labels[gender])
>> 'male'
gender = process_users.gender_classifier.get_gender_by_text_custom('I am a woman of great faith... in unicorns ❤️ ')
print(labels[gender])
>> 'female'
-
Currently the classifier is not stored, but alsways built and trained from scratch. I would recommend to pickle and re-load it.
-
Code needs better structure and more documentation.
Tested with Python 3.7.1