Andreas Weigend | Social Data Revolution | Fall 2013
School of Information | University of California at Berkeley | INFO 290A-03

Class 1: October 29, 2013

Responsible for page setup:
Belle Peng
Shubham Goel
Jinze Gu
Priya Iyer

Class materials:
Part 1: 1029_154135.WAV (1.44 GB, 01h 30m 51s)
Part 2: 1029_173725.WAV (910.5 MB, 57m 21s)

The Social Data Revolution

The digital bread crumbs we leave behind affect the way we lead our lives. E.g., a developer gets hired by a company just because of the way he answered a pertinent question on Quora. These bread crumbs are changing the way we work too. E.g., the person responsible for the transcripts of this class is from Israel and works virtually using oDesk. This is very different from the traditional model in which a person sits in an office and is available at your convenience. These changes are becoming more prominent as we become able to collect and analyze data, and they are transforming the future of working environments.

The previous century saw innumerable advances in the physical sciences, attributable to advances in the technologies for observing physical processes. E.g., X-rays let us see through the body, Magnetic Resonance Imaging (MRI) lets us visualize, among other things, the brain's activity, and a high-energy collider allows us to study the interactions between sub-atomic particles to discover the operating principles of the universe. There are now technologies that make it possible to predict decisions one may make months from now. In this class we are going to look at the data that users create in many ways and the digital bread crumbs they leave behind. This kind of data is called implicit data.

In the following paragraphs, we will talk about some instances of how data can transform real life experiences that have been the same for years.

Picture this! In 2020, 7 years from now, all of us would be wearing smart devices like glasses and watches that tell us about the emotional state of a person. This could transform the classroom environment, which has stayed almost the same for the last thousand years. Such futuristic devices would let the instructor gauge how much of the course content the class understands and, based on that data, reorganize the class or take other actions to improve the class experience.

Over the decades, commerce has been tremendously transformed in terms of the roles and actions taken by vendors. It has moved from a shopkeeper who kept track of his inventory manually and recognized his customers by name, to the grocery store with automatic checkout and technology keeping track of inventory, to commerce where every click or search performed by the user is used to make better recommendations and sell similar products to that user.

Technology is now giving us a huge amount of data about people and the possibility of observing their social interactions. This looks like an opportunity for breakthroughs in the social sciences similar to the breakthroughs in the physical sciences in the last century. This phenomenon is what we call the Social Data Revolution.

Social Data in this context means the data shared by people. It can be:
  • Explicit: This is the kind of data that is provided by answering explicit questions. For example, the data we provide when we answer a questionnaire, review a product or rate a comment.
  • Implicit: This is the kind of data that is collected without answering an explicit question, by gathering the information naturally required for some activity. For instance, the searches and clicks we necessarily go through during a purchase on Amazon, and the data collected from them.
  • Social: This is the kind of data that we share voluntarily. For instance, data from social networking sites like Twitter or Facebook.
  • Contextual: This is the kind of data that comes from where you are. For instance, the sounds, the images or your location: any information that a GPS-enabled device can easily gather.

Mobile devices are two-way devices. Although we do not consider them as obtrusive as Google Glass or Street View cars, they still include a camera, a microphone and many other sensors that can be used to monitor a lot more data than most people think. Becoming aware of these possibilities could lead to a big change in our social behavior. Knowing that this data might be available to companies and other individuals would further modify our habits. This brings up hard questions related to the misinterpretation of data.

<Remainder of the first half of the class>
We need to provide ways to fix the data. We should also think about how to handle mistakes in the data. What if the data is correct but the interpretation is wrong? What should the independent variables be when computing dependent variables in order to build better models?
SoLoMo: Social Local Mobile is where the social life of data really happens!
What is the Social Life of Data? It's the data we share on social networks and the contextual data about our locations and activities; it's the social graph and how people are connected. This data can be used to recommend friends and improve connections. We can control the social constructs of ourselves: users control what they share on social media. Examples of technologies using data to inform decisions: PredPol, SFpark, etc.

Framework to think about Social Life of Data
What you want to optimize is a sum of different terms, which together form an equation. The equation also involves certain decay constants over time. There are always trade-offs on both sides, and the decision you make comes from balancing them. This has allowed anybody to experiment with data to understand how to balance these trade-offs. For instance, Andreas worked on understanding how to optimize customer satisfaction at Amazon. What annoys the customer? What makes the customer happy? What makes the customer buy more products?
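As a rough illustration of the framework above, here is a minimal sketch of an objective built as a sum of weighted terms, each discounted over time. The notes do not give the actual equation, so the exponential form of the decay, the function name decayed_score, and all weights, values and decay constants below are assumptions made purely for illustration.

```python
import math

def decayed_score(terms, now):
    """Sum weighted terms, each discounted by exponential decay in its age.

    Each term is a tuple (weight, value, timestamp, decay_constant).
    The exponential decay form is an assumption, not taken from the lecture.
    """
    total = 0.0
    for weight, value, timestamp, decay_constant in terms:
        age = now - timestamp
        total += weight * value * math.exp(-decay_constant * age)
    return total

# Hypothetical customer-satisfaction signals (made-up numbers).
terms = [
    (1.0, 5.0, 0.0, 0.1),   # positive signal observed at t=0, decays slowly
    (-2.0, 1.0, 8.0, 0.5),  # negative signal observed at t=8, decays quickly
]
score = decayed_score(terms, now=10.0)
print(round(score, 4))
```

The positive and negative weights stand for the two sides of the trade-off, and the decay constants control how quickly an old signal stops mattering. Changing them is one way to experiment with how the balance shifts.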

<Beginning of second half of the class>
Second Half of the Class - questions we are interested in finding out:
Can we always put a value to data?
  1. Economics of Data is important. It's not so much about the validity of the data as about what you want to do with it. You can spend all your life validating, and you'll be dead before you're done. Just think about what you want to do with the data, and focus on making sure it is good enough to answer the question you want to answer.
  2. Can you put a price on things? Or are there human-rights matters that we are hesitant to put a price on? Secondly, do we understand what the deal is?
  3. Trade-off - what does one do with data in insurance and pricing, both for the insurance company and for the people who are affected? There are laws in place to manage credit data, but that's easier to regulate than personal data. Second question - where is the boundary on sharing data, similar to privacy?
    • What is the social role of secrets?
    • Why do you share? What do you share? What do you not share?
  4. How reliable is data / data quality?
    • Story: a friend changes his birthday every day so that people wish him happy birthday every day. Facebook gave him a warning message that he could only change his birthday once more. Facebook wants more reliable data because it's more valuable.
    • What if LinkedIn allows you to see the changes over time?
    • Interesting to see how people change attributes of data, for example - 29 forever! There are more people aged 29 than 30. What does this do to the social ecosystem?
  5. Airbnb has more rooms on offer than Marriott. People staying there are vulnerable, as are the people offering their homes. Story: a friend who stayed in an Airbnb in SF called and said he was very uncomfortable. This shows the importance of background checks. Even having a mutual friend makes a huge difference.
    • Challenge: how do we feel about a webcam in the room that's being rented out? As the host renting out the room, I want to know my room is not being damaged. As the guest, I wouldn't be comfortable changing in the room.
  6. Mobility Space: you can park at the airport for free if you rent out your car while you're away. It's interesting and beneficial for car owners and renters.
  7. Expectations for the class:
    • Reading
    • Current stuff
    • What are 3 things that you think I don't know about but should know
    • Class participation
  8. How to enjoy the amount of data and accessibility of social data without falling victim to crime and abuse?
    • Trade-off: Matrix of good things, bad things, expected things (i.e. the first moment, the mean), and unexpected things (i.e. the variance). Insurance is to avoid unexpected bad things. The other side of the bay is looking for unexpected good things. Critically look at the social data revolution and make your own decisions.
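To make the expected versus unexpected distinction in the trade-off matrix concrete, here is a small sketch comparing two sets of outcomes that have the same mean but very different variance. The outcome lists are made-up numbers and the variable names are hypothetical; only the use of the mean and the variance comes from the notes.

```python
from statistics import mean, pvariance

# Two hypothetical sets of outcomes (made-up numbers for illustration).
steady_outcomes = [9, 10, 11, 10]   # little surprise, good or bad
risky_outcomes = [0, 0, 0, 40]      # mostly nothing, occasionally a big win

# "Expected things": the first moment (mean).
steady_mean = mean(steady_outcomes)
risky_mean = mean(risky_outcomes)

# "Unexpected things": the variance around that mean.
steady_var = pvariance(steady_outcomes)
risky_var = pvariance(risky_outcomes)

print(steady_mean, steady_var)
print(risky_mean, risky_var)
```

Both sets have the same expected outcome, so the mean alone cannot tell them apart. The variance is what separates someone trying to avoid unexpected bad things from someone looking for unexpected good things.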

<2nd half, 30:27>
