Andreas Weigend | Social Data Revolution | Fall 2013
School of Information | University of California at Berkeley | INFO 290A-03

Class 2: November 5, 2013

Responsible for page setup:
Irina Lozhkina
Brian Bloomer
Rohan Salantry

Class materials:
Beyond Big Data: How personal data refineries change big decisions by Andreas Weigend

1. Objective of the Social Data Revolution class

In a fast-changing world, dialog is how we learn and move forward. Teamwork and knowledge sharing are ways to become more transparent and to learn how people think, act, and behave. If undergraduates at the University learn how to answer questions, graduate students learn how to ask them. Asking good questions is therefore the objective of the class.

In addition, the University's limited resources do not allow it to perform informative and reliable research at scale. Usually only companies like Google, Facebook, or Amazon have all the data and, hence, the capability to analyze it. So what is left? Asking good questions. At least we know how to ask good questions. Formulating what we are thinking is one of the goals of this class.

2. From the Golden Age of Algorithms to the Golden Age of Data

Transporting bits was at the core of the communication revolution. The golden age of algorithms occurred when data was very scarce, so the greatest value lay in designing algorithms that could wring as much knowledge as possible from the available data. Today, the amount of data available to us is growing exponentially, doubling approximately every 1.5 years. At the core of the social data revolution is the creation of new bits. The social data revolution is not about having more dead data: we now have live data, and we can change the conditions under which data is generated to produce new or different data. This lets us bring the scientific method to areas that were previously inaccessible to it due to a lack of data, particularly social data.
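The doubling claim can be made concrete with a little arithmetic. The sketch below assumes steady exponential growth with the 1.5-year doubling time quoted above; real data growth is not this smooth, so treat it as an illustration rather than a forecast. The function name `growth_factor` is hypothetical.

```python
def growth_factor(years: float, doubling_time: float = 1.5) -> float:
    """How many times larger the data volume is after `years`,
    assuming steady exponential growth with a fixed doubling time."""
    return 2 ** (years / doubling_time)

# Illustrative horizons in years; the 1.5-year doubling time is the figure quoted in the notes.
for years in (1.5, 3.0, 10.0, 15.0):
    print(f"after {years:>4} years: about {growth_factor(years):,.0f} times as much data")
```

At this rate, 15 years contain ten doublings, so the volume grows by a factor of 2^10 = 1024.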

3. Making the implicit explicit

The main goal of analyzing data is to make explicit what is implicit in the data. We use mathematically rigorous techniques such as machine learning, where we optimize cost functions and/or find patterns in datasets. One example of making implicit data explicit is measuring customer delight: how do we tell whether customers are delighted, and if they are, how delighted are they? There are many benefits to recognizing and interpreting implicit data. Companies such as Apple that pride themselves on understanding what their customers need rely on both implicit and explicit data. The advantages are manifold. In particular, companies avoid the higher economic cost of getting customers to provide explicit data, compared with collecting implicit data such as geolocation, which requires nothing of customers except carrying their phones.

Uncle Watch: a Chinese politician was photographed wearing many different expensive watches. Images on the web of him wearing these watches, which suggested wealth beyond what his official salary could explain, ended with him in prison. The watch, although a seemingly insignificant piece of information, was crucial.

4. Power of Explicit Social Data

Some types of explicit social data, such as meaningful content created by an individual, can have very powerful consequences. This was demonstrated in the case of Geoffrey F. Miller and a public tweet he created that criticized obese people. Because of the content in his tweet, he was censured by the University of New Mexico, his reputation was damaged, and his public persona was permanently altered.

5. Observing behavior changes behavior

Behavior is based on beliefs. There are different layers of beliefs:

1) Individuals

2) Connections/Pairs

3) Groups

Social data makes people more connected, so our behavior often depends on how others behave. We put ourselves in their shoes and act accordingly, and as a result our beliefs change.


1) Nextdoor App: a private social network for your neighborhood. When there is criminal activity, neighbors can share crime-related information with each other, which helps identify a criminal and predict his actions. The question: knowing that neighbors are using the Nextdoor App, HOW would the criminal change his behavior?

2) Google Maps: Google Maps provides the most efficient ways to reach a destination and sometimes offers several different routes from point A to point B. An individual's decision depends on data that other people create and share in real time about traffic and other conditions. The question: knowing how Google Maps works, how can it change the behavior of people on the road? How can someone use it to his or her advantage?

3) Looking at Facebook profiles: The question: how would people change their behavior if Facebook suddenly changed its policy so that people could see who visited their page? How much would people be willing to pay to see who viewed their profile?
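For the Google Maps example, the mechanism by which other drivers' data changes an individual's route can be sketched with a standard shortest-path search. The notes do not describe how Google Maps actually routes; this sketch simply assumes Dijkstra's algorithm over travel times, and the road network, node names, and minute values are all hypothetical.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's shortest-path search over travel times in minutes.
    `graph` maps each node to a dict of {neighbor: minutes}.
    Returns (total_minutes, path), or None if the goal is unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + cost, neighbor, path + [neighbor]))
    return None

# Hypothetical road network from A to B with two alternative routes.
roads = {
    "A": {"highway": 5, "side_street": 8},
    "highway": {"B": 5},
    "side_street": {"B": 6},
}
print(fastest_route(roads, "A", "B"))  # → (10, ['A', 'highway', 'B'])

# Other drivers report a jam on the highway in real time.
roads["highway"]["B"] = 20
print(fastest_route(roads, "A", "B"))  # → (14, ['A', 'side_street', 'B'])
```

The same feedback loop raises the behavioral question above: if many drivers receive the same suggestion, the side street's travel time would rise in turn, and the data they generate would change the next recommendation.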

6. Data Literacy

Data science has changed the decision-making process. Decision making now starts with a focus on the decision rather than the data, and the data is then evaluated in the context of the decision to be made. It is also important to understand whether the data makes sense: some correlations are present but do not make sense. The essential distinction is between correlations that are statistically significant and those that are relevant, and relevance comes from letting the decision drive the data selection. Many things are significant, but few are relevant. We do not always trust statistics and statistical significance, and we use common sense to eliminate irrelevant findings. For example, an NYU professor gave a lecture showing how he could statistically prove that coffee bean futures were impacting the price of the Yen.
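The coffee-and-Yen example can be reproduced in spirit with simulated data. Two random walks generated completely independently of each other often show large correlation coefficients, because both drift over time. The sketch below assumes simple ±1 random walks as stand-ins for price series; the names `coffee` and `yen` are illustrative labels, not real market data, and the helper functions are hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def random_walk(steps, rng):
    """Cumulative sum of independent +1/-1 steps: a stand-in for a drifting price series."""
    level, path = 0, []
    for _ in range(steps):
        level += rng.choice((-1, 1))
        path.append(level)
    return path

rng = random.Random(42)  # fixed seed so the run is reproducible

# Two series that have nothing to do with each other.
coffee = random_walk(250, rng)
yen = random_walk(250, rng)
print(f"correlation of two unrelated series: {pearson(coffee, yen):+.2f}")

# Repeat many times to see how often unrelated walks look strongly correlated.
trials = 1000
large = sum(
    abs(pearson(random_walk(250, rng), random_walk(250, rng))) > 0.5
    for _ in range(trials)
)
print(f"{large} of {trials} unrelated pairs have |r| > 0.5")
```

A correlation found this way may pass a naive significance test yet be irrelevant to any decision, which is why common sense has to filter the findings.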

7. Data Symmetry

There are many questions to explore surrounding the idea of data symmetry. How do we achieve a symmetry of knowledge between users and organizations regarding the generation and use of data? What mechanisms can be created so that individuals can understand how companies arrive at their data-based decisions? What mechanisms can be created so that individuals and society can inform and correct companies that use inaccurate data to arrive at decisions with real-world consequences?