Monday, December 27, 2010

Shared: Facebook Research




What's on your mind?

Posted by Facebook Data Team on December 24, 2010 at 5:41
People use status updates to share what's on their minds, to tell others what they're doing, and to gather feedback from friends. The different ways people use status updates form some interesting patterns. In this study, we looked at the usage of words in different "word categories" in status updates. This led us to discover some patterns in how people use status updates differently, and how their friends interact with different status updates.

The Word Categories Explained

The word categories we used are from the LIWC dictionary. It provides 68 word categories corresponding to meaningful psychological and linguistic constructs, along with a list of the words belonging to each category. Words are categorized by their part of speech (pronouns, articles, past-tense verbs, etc.), their emotional content (positive emotions, negative emotions, sadness, anger, etc.), or the topic they relate to (school, work, religion, etc.). See [1] for a full list of word categories.

After removing identifiable information from the updates, we had our computers calculate the percentage of words in each status update that belong to each word category (so no human ever read your updates). In total, about one million updates were analyzed, all from US English speakers. As an example of how the words are counted, 22% of the words in the update below fall into the "prepositions" category ("to", "for", "since", and "in"), 11% into the "past tense verbs" category ("has" and "missed"), and 11% into the "inclusive" category ("in" and "this"). A few other categories have nonzero word counts.
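As a rough illustration of the counting step described above, the Python sketch below computes the share of an update's words that fall into each category. The tiny `CATEGORIES` dictionary, the tokenizing regular expression, and the function name `category_percentages` are assumptions made for illustration; they are not the LIWC dictionary or the code used in the study.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only.
# The real LIWC word lists are not reproduced here.
CATEGORIES = {
    "prepositions": {"to", "for", "since", "in"},
    "school": {"class", "homework", "exam"},
}


def category_percentages(update, categories=CATEGORIES):
    """Return {category: percent of the words in `update` that belong to it}."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", update.lower())
    if not words:
        return {name: 0.0 for name in categories}

    counts = Counter()
    for word in words:
        # A word may belong to several categories, so check every category.
        for name, vocabulary in categories.items():
            if word in vocabulary:
                counts[name] += 1

    return {name: 100.0 * counts[name] / len(words) for name in categories}
```

For instance, `category_percentages("Studying for the exam")` gives 25% for each of the two hypothetical categories, since one of the four words falls into each. Percentages need not sum to 100, because a word can count toward more than one category or toward none.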



Age and Popularity

The charts below show the correlations between use of words from a given word category and age/friend count of the user. The word categories are ordered by correlation, with word categories most positively correlated with age/friend count at the top, and word categories most negatively correlated with age/friend count at the bottom [2].


The chart on the left confirms the typical stereotypes about younger and older people. Younger people express more negative emotions (including anger) and swear more. They use more pronouns referring to themselves ("I", "my", etc.) and talk more about school. Older people write longer updates, use more prepositions and articles, and talk more about other people, including their family.

The word usage of more "popular" people also differs from that of people with fewer friends. People with more friends tend to use more of the pronoun "you" and other second-person pronouns. They write longer updates and use more words referring to music and sports. More "popular" people also talk less about their families, are less emotional overall, and use fewer past-tense verbs, present-tense verbs, and words related to time.
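Note [2] says the correlations in these charts are partial correlations. The sketch below shows the idea with a single control variable, using the standard first-order formula. The study controlled for several variables at once, so this is a simplification, and the function names and toy numbers are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs.

    Assumes no input is constant and that neither xs nor ys is perfectly
    correlated with zs (otherwise the denominator is zero).
    """
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up toy data: percent of words from one category, user age,
# and friend count (the control variable).
usage = [3, 3, 5, 9]
age = [28, 26, 34, 32]
friends = [100, 300, 500, 700]

raw = pearson(usage, age)
controlled = partial_correlation(usage, age, friends)
```

In this toy data the raw correlation between usage and age is clearly positive, but it vanishes once friend count is controlled for, because both variables simply rise with friend count. This is the kind of confounding effect that a partial correlation is meant to remove.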

Timing is Everything

It should be no surprise that people write about different topics at different times of the day. In the plot below we show how much more (or less) people use words from each given word category at each hour of the day, compared to the overall average [3]. Generally, people tend to talk about what they are (or should be) doing at a particular time of day. For example, words about sleep increase at night and peak in the early mornings, when people should actually be sleeping. Words about occupation and school increase in the mornings (perhaps while we're on our way to work/school). Words about social processes and leisure are low during the mornings (when people are either in school or working), but they increase as the day goes on.
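One way to express "how much more (or less) than the overall average" is a relative difference per hour, sketched below in Python. The source does not say whether a ratio or a plain difference was plotted, so the relative-difference definition, the function name, and the input layout are all assumptions.

```python
from collections import defaultdict


def relative_usage_by_hour(samples):
    """Compare each hour's mean usage of one word category to the overall mean.

    `samples` is an iterable of (hour, percent) pairs, one pair per update.
    Returns {hour: hourly_mean / overall_mean - 1}, so 0.5 means 50% above
    the overall average and -0.5 means 50% below it.
    """
    by_hour = defaultdict(list)
    for hour, percent in samples:
        by_hour[hour].append(percent)

    all_values = [p for values in by_hour.values() for p in values]
    if not all_values:
        return {}
    overall_mean = sum(all_values) / len(all_values)
    if overall_mean == 0:
        raise ValueError("category never used; relative usage is undefined")

    return {
        hour: (sum(values) / len(values)) / overall_mean - 1.0
        for hour, values in sorted(by_hour.items())
    }
```

With made-up samples `[(3, 4.0), (3, 2.0), (15, 1.0), (15, 1.0)]`, the overall mean is 2.0, so the function returns 0.5 for hour 3 and -0.5 for hour 15.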


Interestingly, the emotional content of status updates also varies depending on the time of the day. Positive emotional word use is higher in the mornings, when the corresponding usage of negative emotional words is low. Negative word use increases as the day goes on, as positivity decreases.


What Our Friends Like

Once we write a status update, it is no longer just about us; it's about our friends too, and how they interact with our updates. How do our friends react to different status updates? To answer this question, we looked at the correlation between the percentage of an update's words that fall into each word category and the number of likes and comments the update receives. Here are the word categories, ordered by correlation. Again, categories positively correlated with likes/comments are at the top, and those negatively correlated with likes/comments are at the bottom.


Unsurprisingly, status updates with more positive emotional words receive more likes, and those with more negative emotional words receive fewer likes. Slightly less intuitive is the fact that positive emotional updates receive fewer comments (perhaps there's nothing more to say) whereas negative emotional updates receive more comments (perhaps as a consolation).

People also prefer to like a religious update rather than comment on it (perhaps it's not a topic they wish to commit to). Status updates that use more pronouns receive more of both types of feedback, as do longer status updates. As for the one word category that correlates most negatively with both likes and comments? Sleep. [4]

"Birds of a Feather Flock Together"

The word "homophily" literally means "love of the same". It is the idea that people tend to associate with others similar to them. Homophily was apparent in one part of our analysis, where we looked at the correlation between how much a user uses words from a given word group in their status updates and how much words from each group are used in the friends' updates shown in that user's feed. The correlation plot below shows a clear diagonal line, meaning there is a positive correlation between how much you use words from a word group and how much your friends do. [5]


Here's a version of the correlation plot with all the word groups. Not all word groups are labeled, but the diagonal line is much more visible.


Notes

[1] http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/reprints/LIWC2001.pdf (pages 17-18) contains a list of word categories.
[2] Partial correlations are used to account for the effects due to time of year, friend count, and age where appropriate.
[3] "Hour of day" is in Eastern time, although the sample includes US English speakers in other time zones. Corrections for daylight saving time were in place.
[4] Yes, we did double check that the correlation between feedback and sleep words is negative even after taking into account the fact that people tend to post sleep-related status updates when their friends are, well, sleeping, and not on Facebook.
[5] Here, we did not account for effects of age and other demographic similarities.


Lisa Zhang, a data science intern at Facebook, wishes everyone a happy holiday season.
