Facebook Data Analysis

For this project, I helped Stephen Wolfram analyze a dataset of Facebook profiles (blog.stephenwolfram.com) that we collected from volunteers who had engaged with a previous project I worked on. We looked at the influence of the friendship paradox and at correlations between various demographic variables, performed topic modeling on people’s Facebook posts, and ran cluster analysis on their friend graphs.

The friendship paradox is the observation that, on average, your friends have more friends than you do, because popular people appear in more friend lists and are therefore overrepresented among friends. This can be seen in the following distribution, which shows that the friends of our users had more friends than the users themselves:

[Figure: distribution of friend counts for our users and for their friends]
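
As a rough illustration of why the paradox arises (this is not the code used for the analysis), here is a minimal sketch in Go. The `Graph` type, the function names, and the toy star-shaped network are all hypothetical. It compares the mean friend count per user with the mean friend count of a friend, where a user with k friends is counted k times.

```go
package main

import "fmt"

// Graph maps each user ID to the IDs of their friends. Friendship is
// assumed to be mutual: if b is in g[a], then a is in g[b].
type Graph map[string][]string

// MeanFriendCount returns the average number of friends per user.
func MeanFriendCount(g Graph) float64 {
	if len(g) == 0 {
		return 0
	}
	total := 0
	for _, friends := range g {
		total += len(friends)
	}
	return float64(total) / float64(len(g))
}

// MeanFriendOfFriendCount returns the average number of friends that a
// friend has, taken over every (user, friend) pair. A user with k friends
// appears in k pairs, so popular users are overrepresented.
func MeanFriendOfFriendCount(g Graph) float64 {
	pairs, total := 0, 0
	for _, friends := range g {
		for _, f := range friends {
			total += len(g[f])
			pairs++
		}
	}
	if pairs == 0 {
		return 0
	}
	return float64(total) / float64(pairs)
}

func main() {
	// Hypothetical toy network: one popular "hub" with four friends who
	// each know only the hub.
	g := Graph{
		"hub": {"a", "b", "c", "d"},
		"a":   {"hub"},
		"b":   {"hub"},
		"c":   {"hub"},
		"d":   {"hub"},
	}
	fmt.Printf("mean friends per user:     %.2f\n", MeanFriendCount(g))
	fmt.Printf("mean friends of a friend:  %.2f\n", MeanFriendOfFriendCount(g))
}
```

On this toy graph the mean friend count is 1.6, while the mean friend count of a friend is 2.5, because the hub is counted once for each of its four friendships.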

Other peculiarities of Facebook usage could be seen in the joint distribution of a user’s age and the median age of their friend network, which reveals that older users have large groups of younger friends, probably reflecting the presence of their children and grandchildren in their friend networks:

[Figure: joint distribution of user age and median age of friends]

Perhaps my favorite analysis involved topic modeling (work done with Etienne Bernard), where we estimated in fine detail how the probability of a wall post falling into each of many separate topics varies with age and gender:

[Figure: probability of wall-post topics by age and gender]

The blog post garnered some media attention (archive.nytimes.com) and led to some discussions with Facebook’s internal data-science team. I also did some follow-on work that was featured in Wired UK (www.wired.co.uk), which looked at migration rates that could be inferred from the difference between users’ declared hometowns and their current locations:

[Figure: migration inferred from declared hometown versus current location]
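
The basic idea of inferring migration from profile fields can be sketched in Go as follows. This is an assumption-laden illustration rather than the method behind the Wired UK piece: the `Profile` type and `MigrationRates` function are hypothetical, places are compared by exact string equality, and profiles missing either field are skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// Profile holds the two self-declared location fields (hypothetical type).
type Profile struct {
	Hometown string
	Current  string
}

// MigrationRates returns, for each hometown, the fraction of users from
// that hometown whose current location differs from it. Profiles missing
// either field are ignored.
func MigrationRates(profiles []Profile) map[string]float64 {
	total := map[string]int{}
	moved := map[string]int{}
	for _, p := range profiles {
		if p.Hometown == "" || p.Current == "" {
			continue
		}
		total[p.Hometown]++
		if p.Hometown != p.Current {
			moved[p.Hometown]++
		}
	}
	rates := make(map[string]float64, len(total))
	for town, n := range total {
		rates[town] = float64(moved[town]) / float64(n)
	}
	return rates
}

func main() {
	// Made-up example data, for illustration only.
	profiles := []Profile{
		{"Springfield", "Springfield"},
		{"Springfield", "Capital City"},
		{"Capital City", "Capital City"},
		{"", "Capital City"}, // no declared hometown: skipped
	}
	rates := MigrationRates(profiles)

	// Sort hometowns so the output order is stable.
	towns := make([]string, 0, len(rates))
	for town := range rates {
		towns = append(towns, town)
	}
	sort.Strings(towns)
	for _, town := range towns {
		fmt.Printf("%s: %.2f\n", town, rates[town])
	}
}
```

A real analysis would also need to normalize place names and resolve them to regions before comparing them, which this sketch leaves out.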

To analyze this dataset, I built a suite of tools in a combination of Mathematica and Go. These tools could build fine-grained multi-dimensional histograms whose axes represented age, gender, marital status, country of birth, and so on. Arbitrary projections and visualizations of this higher-dimensional data could then be computed easily and efficiently on an ad hoc basis, which let us ask the many questions behind the results you see above.
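
To make the histogram-and-projection idea concrete, here is a minimal sketch in Go. It is not the original tooling: the `Histogram` type, its methods, and the bin labels are hypothetical, and a sparse map keyed by joined bin labels is just one simple way to store the cells. Projecting onto a subset of axes sums the counts over every axis that is dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// sep joins bin labels into a map key; it is assumed never to occur
// inside a label.
const sep = "\x1f"

// Histogram is a sparse multi-dimensional histogram. Each cell is
// addressed by one bin label per axis and stores a count.
type Histogram struct {
	Axes  []string
	cells map[string]int
}

// NewHistogram creates an empty histogram over the named axes.
func NewHistogram(axes ...string) *Histogram {
	return &Histogram{Axes: axes, cells: map[string]int{}}
}

// Add increments the cell given by one bin label per axis, in axis order.
func (h *Histogram) Add(bins ...string) error {
	if len(bins) != len(h.Axes) {
		return fmt.Errorf("got %d bins, want %d", len(bins), len(h.Axes))
	}
	h.cells[strings.Join(bins, sep)]++
	return nil
}

// Count returns the count stored in one cell (0 if the cell is empty).
func (h *Histogram) Count(bins ...string) int {
	return h.cells[strings.Join(bins, sep)]
}

// Project returns a new histogram over the axes in keep, summing the
// counts over all other axes.
func (h *Histogram) Project(keep ...string) (*Histogram, error) {
	idx := make([]int, len(keep))
	for i, name := range keep {
		idx[i] = -1
		for j, axis := range h.Axes {
			if axis == name {
				idx[i] = j
				break
			}
		}
		if idx[i] < 0 {
			return nil, fmt.Errorf("unknown axis %q", name)
		}
	}
	out := NewHistogram(keep...)
	for key, n := range h.cells {
		bins := strings.Split(key, sep)
		kept := make([]string, len(idx))
		for i, j := range idx {
			kept[i] = bins[j]
		}
		out.cells[strings.Join(kept, sep)] += n
	}
	return out, nil
}

func main() {
	// Made-up records, for illustration only.
	h := NewHistogram("age", "gender", "marital status")
	h.Add("20-29", "F", "single")
	h.Add("20-29", "F", "married")
	h.Add("20-29", "M", "single")
	h.Add("30-39", "F", "married")

	byAgeGender, err := h.Project("age", "gender")
	if err != nil {
		panic(err)
	}
	fmt.Println("20-29, F:", byAgeGender.Count("20-29", "F"))
}
```

Because the full histogram is built once, any lower-dimensional view (age by gender, marital status alone, and so on) can be derived from it without going back to the raw profiles.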

Check out the full blog post (blog.stephenwolfram.com) for far more detail!