
FB_datalab

Python scripts + Facebook data

most_distinctive_links.py

Among the links published by Facebook users, picks out those that differentiate users the most (by age, gender, location, etc.).

The input data is a sparse matrix: the first few columns hold factor data (age, gender, etc.), and the remaining columns hold features (URLs). Each row represents one user.
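
For illustration, a minimal sketch of working with this layout. The factor-column count and the gender-column index below are hypothetical, and a simple usage-rate difference stands in for whatever statistic the script actually uses:

import numpy as np
from scipy.io import mmread

# Hypothetical layout: the first N_FACTORS columns hold factors
# (age, gender, ...); the remaining columns hold URL features.
N_FACTORS = 4
m = mmread('data/links/links_matrix.mtx.gz').tocsr()

factors = m[:, :N_FACTORS].toarray()   # few columns, safe to densify
features = m[:, N_FACTORS:]            # many URL columns, keep sparse

# Rank features by how much their usage rate differs between two
# groups of a factor (gender assumed to sit in factor column 1).
gender = factors[:, 1]
rate_a = np.asarray(features[gender == 0].mean(axis=0)).ravel()
rate_b = np.asarray(features[gender == 1].mean(axis=0)).ravel()
top10 = np.argsort(-np.abs(rate_a - rate_b))[:10]
print(top10)  # column indices of the 10 most differentiating links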

Produces charts like the one below:

(example chart)

most_distinctive_likes.py

Similar to most_distinctive_links.py, but works on Facebook Pages liked by particular users.

run_nai.py

Cross-validates various methods of node attribute inference.

Simplest use case:

from lib.nai.cv import NodeAttrInferCV

# Load the friendship graph and run the cross-validation.
z = NodeAttrInferCV('data/fb_friends_graph.gml.gz')
z.run()
z.simple_stats()  # print summary statistics

The simple_stats() method returns output similar to the following (the second table is the cumulative version of the first):

Shadowed per iteration: 251
Unshadowed: [251, 251, 248]
--------------------------------------------------------------------------------
iteration     1     2     3
diff                       
0          0.28  0.33  0.24
1          0.20  0.17  0.22
2          0.13  0.14  0.14
3          0.07  0.07  0.09
4          0.06  0.03  0.06

[5 rows x 3 columns]
--------------------------------------------------------------------------------
iteration     1     2     3
diff                       
0          0.28  0.33  0.24
1          0.48  0.50  0.46
2          0.61  0.64  0.60
3          0.68  0.71  0.69
4          0.74  0.74  0.75

[5 rows x 3 columns]
--------------------------------------------------------------------------------
Iteration: 1. Top 10 biggest errors: [41, 38, 29, 29, 27, 25, 25, 25, 24, 20].
Iteration: 2. Top 10 biggest errors: [46, 39, 36, 35, 32, 30, 29, 28, 28, 27].
Iteration: 3. Top 10 biggest errors: [35, 34, 32, 29, 25, 21, 19, 18, 17, 16].
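
For intuition, a common baseline for node attribute inference predicts a shadowed (hidden) attribute from a node's neighbours. A minimal sketch with networkx, assuming the graph's nodes carry a numeric 'age' attribute (the attribute name is an assumption; the package's actual methods may be more involved):

import networkx as nx

# networkx transparently decompresses .gz files.
g = nx.read_gml('data/fb_friends_graph.gml.gz')

def infer_from_neighbors(g, node, attr='age'):
    # Predict the shadowed attribute as the mean over neighbours
    # whose value is still visible.
    known = [g.nodes[n][attr] for n in g.neighbors(node)
             if attr in g.nodes[n]]
    return sum(known) / len(known) if known else None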

query_sparse_db.py

This package comes with a number of databases in SciPy's sparse matrix format. query_sparse_db.py provides examples of helper functions that make it easier to examine these databases.

For example, you can do this:

from lib.examine_sparse_db import ExamineSparseDB

a = ExamineSparseDB('data/links/links_matrix.mtx.gz',
                    'data/links/links_colnames.msg.gz',
                    'data/links/factors.msg.gz')

print(a.get_popular(10))

This returns the 10 most popular features:

[('youtube.com', 14980), ('Obrazki FB', 13843), ('facebook.com', 10432), ('demotywatory.pl', 7316), ('kwejk.pl', 6677), ('gazeta.pl', 5513), ('vimeo.com', 5321), ('onet.pl', 5031), ('wrzuta.pl', 4482), ('wp.pl', 3925)]

Conversely, calling:

print(a.get_popular(10, least=True))

returns the 10 least popular links:

[('548426_440524952689441_220491277_n.jpg', 16), ('523004_292705267495944_1723994932_n.jpg', 16), ('402728_357842164303281_1435223383_n.jpg', 15), ('christian-dogma.com', 15), ('youtube.com/watch?v=BVp8xWsteMo', 15), ('start.no', 15), ('lostbubblegame.com', 15), ('pato.pl', 15), ('takafaza.pl', 14), ('jakboczek.pl', 13)]
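
Popularity here is just the number of users who share a feature, i.e. a column-wise count over the sparse matrix. A minimal sketch of such a helper (hypothetical; ExamineSparseDB's actual implementation may differ):

import numpy as np

def get_popular(matrix, colnames, n=10, least=False):
    # Count, per feature column, how many users have a nonzero entry.
    counts = np.asarray((matrix > 0).sum(axis=0)).ravel()
    order = np.argsort(counts if least else -counts)
    return [(colnames[i], int(counts[i])) for i in order[:n]]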

run_estimations.py

An example of how to carry out cross-validation and prediction of particular factors from the available set of features.

To perform cross-validation for predicting 'age' from the links Facebook users post on their profiles, do the following:

from lib.estimate_factor import FactorEstimation

a = FactorEstimation('data/links/links_matrix.mtx.gz',
                     'data/links/links_colnames.msg.gz',
                     'data/links/factors.msg.gz')

a.cv('age',                        # factor to predict
     split_val=0,                  # value of the factor to predict
     shadow_func=lambda x: x > 66, # for users older than 66...
     shadow_to_val=0,              # shadow their age with 0 and predict it
     del_freq=450)                 # remove features if shared by 450 users

This produces results similar to the following:

4.14232033819 +/- 5.07028463257
4.3597933302 +/- 5.63557438023
3.99154532644 +/- 5.25908584676
4.04088345865 +/- 5.13718948072
4.0469924812 +/- 5.30592752445

These are the mean errors and standard deviations of the predictions over five iterations.
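
The shadowing scheme itself is simple: hide the true factor value for the selected users, fit on everyone else, and score the held-out predictions. A minimal sketch using scikit-learn's Ridge as a stand-in estimator (hypothetical; not necessarily what FactorEstimation uses):

import numpy as np
from sklearn.linear_model import Ridge

def shadow_and_score(features, age, shadow_func=lambda x: x > 66):
    # Shadow the selected users' ages and train on the rest.
    mask = shadow_func(age)
    model = Ridge().fit(features[~mask], age[~mask])
    errors = np.abs(model.predict(features[mask]) - age[mask])
    return errors.mean(), errors.std()  # printed as "mean +/- std" above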

run_emotions_detection.py

Generates tag clouds of words with positive/negative sentiment (mostly Polish words), such as:

(example tag cloud)

It also generates a list of the most distinctive features using NLTK's nltk.NaiveBayesClassifier.
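
The distinctive-features step can be reproduced with NLTK directly. A minimal sketch with made-up Polish training data:

import nltk

# Toy labelled data: (bag-of-words featureset, sentiment label).
train = [
    ({'świetnie': True, 'pięknie': True}, 'positive'),
    ({'okropnie': True, 'tragedia': True}, 'negative'),
    ({'super': True}, 'positive'),
    ({'beznadziejnie': True}, 'negative'),
]

classifier = nltk.NaiveBayesClassifier.train(train)
classifier.show_most_informative_features(10)  # the most distinctive words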
