Andreas Weigend | Social Data Revolution | Fall 2013
School of Information | University of California at Berkeley | INFO 290A-03

Class 3: November 19, 2013


Responsible for Page Setup:

Alex Battaglino abattaglino@berkeley.edu
Renu Bora renubora@ischool.berkeley.edu
Yunah Ko yunah1227@berkeley.edu
Renata Sanchez de Lollano renatasdelollano@gmail.com


INTRODUCTION


What is the future of communication, of computation, and of commuting? What do computing and commuting have in common? These questions were raised in the presence of Paul O'Shaughnessy, an expert who also believes in the importance of knowing how to ask good questions. As a former manager of AT&T’s 7 billion messaging business, he learned from experience that asking the wrong questions leads to less valuable answers: AT&T never thought of developers as customers and never asked them what they expected of the service it was offering at the time. As a result, AT&T overlooked who its real customers were: “We had 300 million subscribers at the time and none of them were the subscribers of what we were offering.”


Paul_SDR.jpg


Paul O'Shaughnessy shared his insights in our Social Data Revolution class, which covered:

1. Summary of Class 2 of Social Data Revolution
2. Complex and Complicated Systems
3. The Connected World: What Does It Mean to be Connected for People and For Devices?
4. The Mechanisms and Media of Data Recording Create New Decisions and Trade-Offs
5. Context is Increasingly Relevant for Information Issues Such as Privacy, Meaning, and Value
6. How do We Measure Value and Analyze Risk?
7. The Internet of Things
8. Challenges in Understanding Security, Privacy, and Trust
9. Discussion: The Past and Future of Social Data: Computing, Connectivity, and Intelligence


For further information, the video of Social Data Revolution is available on YouTube.


1. SUMMARY OF CLASS 2

  • Learning to ask good questions is more important than just getting answers.

In the words of Paul O., “During my MBA, one professor used to tell us that in undergraduate you learn to answer questions, in graduate school you learn to make good questions.” Paul O. explained how AT&T failed to target its true customers because it never asked developers what they expected of its service; developers turned out to be the true customers of that service. In contrast, when Amazon faced a similar situation, it asked: “What can we offer to developers so that they can be successful?”
In the business world, it is extremely important to look ahead and ask what the landscape will look like ten years from now in order to plan accordingly. Failing to ask this type of question, and assuming that a business dominated by one company today will stay the same in the future, is the wrong approach. In the words of Paul O., “In my team I liked to have growers (in favor of growing the business) and cannibals (in favor of replacing it by something different). If you are not able to eat your own lunch, somebody else will. In fact, we do not buy messages from the main carriers anymore, they come with our phones.” Foreseeing these future scenarios is essential for a company to design and implement the right strategy.
  • From the Golden Age of Algorithms to the Golden Age of Data.

Transportation of energy led to the Industrial Revolution, and transportation of bits led to the information revolution. Whereas the first information revolution, which occurred in the 1990s, focused on access to data (people could access data from anywhere), the second information revolution focuses on the creation of data and metadata.

    • Paul O. illustrated this point with a system that UPS uses to monitor its trucks with vibration and temperature sensors: when the readings reach certain levels, the equipment needs to be replaced. For UPS, what matters is not the data itself but the analysis that goes on top of it. The company does not care about causality; it just needs to know that a truck is about to fail so that it can replace the truck in time and avoid having to redirect the packages.

    • Other examples are related to biometrics: voiceprints that can recognize a speaker by his or her voice; iris prints (which are under consideration for implementation in the UK); “buttprints”, which a car could use as a piece of information before moving ahead; and sensors that can tell how many heartbeats are inside a container, which could be useful for detecting human smuggling, for example. Companies are also studying how users type their passwords (including which fingers they use) and whether it is possible to tell that users have carried their phone for the last hour.

However, data creation raises an important issue: the data we generate can be used for good purposes, or for purposes that people do not mind, but it can also be used for bad purposes.
    • A main concern is that individuals are sometimes not even aware that they are creating data: most people are unaware that the new iPhone includes a micro Bluetooth radio that cannot be turned off. This allows any potential user of this data to track people’s location even though they have not chosen to turn their GPS on.
    • An example of the use of social data for good purposes is the ubiquity of Wi-Fi in buildings, which makes it possible to locate a specific person to within about 25 ft. in an emergency.

Why have algorithms become less relevant than data creation? Algorithms were designed to account for variable sample sizes, and they come from an era when we could not test a complete population. Nowadays, because we can collect data more accurately and store it much more cheaply than before, sample sizes have become much bigger. The algorithm is still important, but it no longer has to compensate for as much because the sample size has increased dramatically.

With the transition from the golden age of algorithms to the golden age of data, there has also been a transition from statistical sampling to continuous sampling. Sampling in the old sense has largely disappeared, since everyone can be considered a sample 24/7 (we are constantly creating data). Going back to the truck example: whereas UPS would previously have drawn conclusions from ten trucks, the falling costs of instrumentation and data storage now make continuous sampling feasible.

If we can get data from such big samples, why do we still collect data through surveys, for example? The answer is related to social acceptance. Even though using sensors or other tracking methods is cheaper, social acceptance of these methods of data collection is lower, and social acceptance plays an important role.
  • Making the implicit explicit:

There are two main ways of achieving this purpose:
    • Writing down an equation
    • Finding patterns through the analysis of data (in the truck example, if the temperature of the engine reaches a certain point, it is likely that the truck will break down).
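
The second approach can be illustrated with a small Python sketch. It is not UPS's actual system: the record layout, the truck IDs, the temperatures, and the "lowest temperature seen before a failure" rule are all hypothetical and chosen only to show how a pattern found in past data becomes an explicit rule.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One hypothetical sensor record for a truck (illustrative fields only)."""
    truck_id: str
    engine_temp_c: float
    failed_within_week: bool


def learn_threshold(history):
    """Make the implicit explicit: lowest temperature observed before a failure."""
    failure_temps = [r.engine_temp_c for r in history if r.failed_within_week]
    if not failure_temps:
        raise ValueError("no failures in history, cannot learn a threshold")
    return min(failure_temps)


def flag_trucks(current, threshold):
    """Return the IDs of trucks whose engine temperature has reached the threshold."""
    return [r.truck_id for r in current if r.engine_temp_c >= threshold]


# Made-up history: two trucks failed, two did not.
history = [
    Reading("T1", 88.0, False),
    Reading("T2", 104.5, True),
    Reading("T3", 91.0, False),
    Reading("T4", 110.0, True),
]
# Made-up current readings; the failure flag is unknown, so it is left False.
current = [Reading("T5", 95.0, False), Reading("T6", 106.0, False)]

threshold = learn_threshold(history)
flagged = flag_trucks(current, threshold)
print(threshold, flagged)
```

Note that the rule says nothing about why a hot engine fails. As in the UPS example, it only records the correlation that is useful for scheduling a replacement.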

  • Observing behavior changes behavior

We addressed some questions: when we see that someone has seen our message on Facebook chat, does that change our behavior? If someone knew all the pictures we look at on Facebook, would that change our behavior?

2. COMPLEX AND COMPLICATED COMMUNICATION SYSTEMS
In the 1980s, Paul O. was in charge of one of the nuclear power plants in the U.S. Back then, communication was both complex and complicated.
  • It was complicated because it consisted of many parts that mostly did not interact with each other.
  • It was complex because it entailed many interactions between people or entities.

This experience taught Paul O. how to deal with very large and complicated systems. He and his colleagues were responsible for knowing everything about the nuclear power plant and how it worked, including all the systems apart from the one they worked on. He learned to think about things in “large ways”, in terms of the way things act together. For example: what happens to the rest of the system if you push one of the pieces that make up such a large whole?

It turns out that changing one of the pieces in a complex system can change the overall system. For example, the data creation process of emails and letters is very similar, but a change in the delivery time constraint changed the entire communications system. The time needed to communicate in writing decreased tremendously with the appearance of email. With letters, the time people spent writing was much shorter than the delivery time. In contrast, the time needed to write an email is much longer than the time needed to deliver it. Because delivery takes so little time, we receive much more information every day, and that information is easy to share.

The example of email and letters also illustrates how the transition is driven by a decrease in communication costs. Even though the postage for a letter is not very significant, the cost of putting the letter together (buying a stamp, taking it to the post office) makes individuals evaluate whether it is really worth the trouble. In this sense, the letter vs. email example also relates to the transition to digital photography: nowadays we take more pictures because there is no cost to deleting them.

It is therefore important to ask ourselves: “What are the systematic effects?” and “What are the big trends that we need to make sure we are looking at as we start to make some of these technological changes?” According to Paul O., “When you start to think systematically, it is interesting to see when you introduce friction in one system what happens to another part of the system.”





3. THE CONNECTED WORLD: WHAT DOES IT MEAN TO BE CONNECTED FOR PEOPLE AND DEVICES?


As people and devices are increasingly connected to each other and sharing data, we are exposed to new kinds of decisions, trade-offs, risks, and questions about privacy, trust, and security. As the meanings of these terms change, many levels of understanding of them arise. Even experts who understand or model the terms must make difficult decisions and predictions in light of the changing landscape, the interdependencies among the terms, and the public's own perceptions of their meanings.



4. THE MECHANISMS AND MEDIA OF DATA RECORDING CREATE NEW DECISIONS AND TRADE-OFFS



The Time and Money Cost of Data and Information Use
With the economics of time and money (time taken for a letter, film for photography), the dominant terms have changed. For photos, the time taken to sort and select them now dominates over the old expense of actually taking them. Caterina Fake from Flickr says we’ve moved from event-based photography to ambient-based photography (from childbirths to almost anything in our daily lives).

Disadvantages to Data Capture

  • People can now use all the harvested data to find and connect a great deal of personal information about us, and use it for questionable goals (as in the Japanese internments in World War II).
  • We are using recording devices and cameras, sometimes capturing experiences rather than experiencing them. An “invisible hand” has us connected and sharing 24/7 instead of 9-5. Some of us turn our phones off periodically over the weekend.

Smart Data Capture in the Future?
We have a choice to turn devices on or off. When will devices intelligently turn themselves on or off?



5. CONTEXT IS INCREASINGLY RELEVANT FOR INFORMATION ISSUES SUCH AS PRIVACY, MEANING AND VALUE



What We Want, When We Want It
The who, the how, and the what of our information are all about context. The right time and place can turn an ad that would otherwise be junk mail into something useful, for example when we are in a store.


  • Information is increasingly being delivered to us with context sensitivity: giving us what we want, when we want it.
  • Information is even anticipating us, giving us things we want before we even know we want them!


Chilling Example of Unintentional Privacy Breach
An ad for diapers at Target was shown to a girl shopping with her father. He didn’t know she was pregnant. Targeting, data collection, and data use may go against privacy priorities. We neither know nor trust what happens to our data.

How Does One Control Data Use? (Data itself is neither good nor evil)

  • PRISM is useful for catching terrorists, but we have a 6,000-year history of governments abusing power.
  • Intel wants “neutral” technology, both to benefit customers and because its brand needs to maintain trust with customers (governments also need trust). Each chip has its own unique ID, much like a MAC address. Usually the chip ID is not tied to other data. For the internet of things, we need to know that someone hasn’t replaced a device with one that is a “bad actor.”


Overdependence on Data Systems?
Someone who sold big shoes on the internet was thriving, but then Google changed its search ranking and the business lost public visibility. As sales declined, the seller was stuck with a warehouse of unsold shoes. How can one depend on circumstances that keep changing?


Is Decision-Support Possible with Low Trust?
How can we make rational decisions about our data use and generation when we are lied to by companies and governments about what happens to the data? Data is shared between anti-terrorist and other agencies. Economics get jumbled.



6. HOW DO WE MEASURE AND ANALYZE RISK?


There are different ways!

The Power of 1% and Things that Spin

GE has a great white paper on the power of 1%. They have a category just for things that spin. $34 billion could be saved annually by airlines if all things that spin improved 1% in efficiency.

The Mean or the Tail?

How does one value catching a terrorist, or negatively value wrongfully accusing or prosecuting an innocent person?

  • One can focus on either the mean of the distribution or the tail.
  • Sampling with a focus on the mean is great for a customer’s average user experience, but not for, say, terrorism, where one doesn’t want a single individual to slip through (leading to either a catastrophic incident or someone being wrongfully punished).
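
The difference between the mean and the tail can be made concrete with a small Python sketch. The numbers are invented for illustration: 100 observations of some "severity" score, 99 of them ordinary and one extreme. Sampling every tenth observation is also just one hypothetical sampling scheme.

```python
from statistics import mean

# Invented data: 99 ordinary observations and one extreme event at index 57.
data = [1] * 100
data[57] = 1000

# Looking at the whole population: the mean barely hints at the extreme event,
# while the tail (here simply the maximum) shows it directly.
population_mean = mean(data)
population_max = max(data)

# Looking at a sample (every 10th observation): the extreme event is missed
# entirely, which is harmless for estimating an average but not for rare events.
sample = data[::10]
sample_mean = mean(sample)
sample_max = max(sample)

print(population_mean, population_max)
print(sample_mean, sample_max)
```

In this toy case the sample describes the typical experience well, but it contains no trace of the one observation that matters when the concern is the tail.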

Risk Analysis Tools (Charts with Two Axes)

  • One risk analysis tool uses two dimensions: the likelihood of occurrence vs. the consequence of occurrence, each rated on a scale from 1 to 5. The result is a 5x5 matrix.
    risk-table.jpg
  • In the first class, we made a similar graph with quadrants: Unexpected things vs expected things (Likelihood) as one dimension, and good vs. bad things (Impact) as the other dimension. This is 2x2 (binaries) instead of 5x5 (quintiles), but otherwise similar.
    risk-quadrants.jpg
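
A minimal Python sketch of the 5x5 tool follows. The lecture notes only say that likelihood and consequence are each rated from 1 to 5; multiplying the two ratings into a single score and the "low / medium / high" cutoffs are assumptions made here for illustration, not something stated in class.

```python
def risk_score(likelihood: int, consequence: int) -> int:
    """Combine two 1-5 ratings into one score (assumed rule: their product)."""
    for name, value in (("likelihood", likelihood), ("consequence", consequence)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * consequence


def risk_band(score: int) -> str:
    """Map a score to a band; the cutoffs 6 and 15 are hypothetical."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Build the full 5x5 matrix: rows are likelihood 1-5, columns are consequence 1-5.
matrix = [[risk_score(l, c) for c in range(1, 6)] for l in range(1, 6)]

for row in matrix:
    print(" ".join(f"{score:2d}" for score in row))

# A rare but catastrophic event (1, 5) and a frequent but minor one (5, 1)
# receive the same score under this rule.
print(risk_band(risk_score(1, 5)), risk_band(risk_score(5, 1)))
```

The last line shows a known limitation of product-based scoring: a rare catastrophic event and a frequent trivial one land in the same cell value, which ties back to the question of whether to focus on the mean or the tail.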


7. THE INTERNET OF THINGS


The internet of things has been around for 10 years. What it is and does varies.

Jeff Bezos and What Will Change vs. What Will be the Same

When someone asked Jeff Bezos what will be different in 10 years, he said, “I don’t care. I want to know what will be the same so I can invest and make a business that will last.” The same question applies to industrial companies: what are the macro-trends that will stay the same? Bezos: “I can guarantee customers won’t want higher prices, slower delivery, or less variety.” Identify what will stay the same, and solve that problem. Things will continue to move to the cloud. There is no reason to buy standing capacity (Citrix has no plans for a data center!).

Cloud Computing (Elastic Computing) vs. The Edge

  1. Elastic computing (storage, cycle time) will move increasingly into the cloud.
  2. Local computing resides at “The Edge.” One example is one’s car and commuting. The newest Ford generates 25 gigabytes of data per hour. This data will not move fully to the cloud, because that amount of data can’t be moved across the network in a way that is usable in real time. To drive safely, cars need responses within 2-3 milliseconds. A car will need many sensors and will talk to the road, signs, and other cars, so it will want to communicate with many endpoints. Intelligence has to move to the edge.
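
A back-of-the-envelope calculation in Python shows why these two numbers push intelligence to the edge. The 25 gigabytes per hour and the 2-3 millisecond figure come from the lecture; the decimal gigabyte, the assumption that the data is produced at a steady rate, and the figure of roughly 200 km per millisecond for a signal in optical fiber are assumptions added here.

```python
GIGABYTE = 10**9  # bytes; assumes decimal gigabytes


def sustained_mbps(gigabytes_per_hour: float) -> float:
    """Average uplink needed, in megabits per second, assuming a steady data rate."""
    bits_per_hour = gigabytes_per_hour * GIGABYTE * 8
    return bits_per_hour / 3600 / 10**6


def max_one_way_km(round_trip_ms: float, km_per_ms: float = 200.0) -> float:
    """Farthest a server could be if the whole time budget went to signal travel.

    km_per_ms defaults to about 200, an approximation for light in optical fiber.
    Processing and queuing time are ignored, so real distances would be shorter.
    """
    return (round_trip_ms / 2) * km_per_ms


car_mbps = sustained_mbps(25)   # one car, streaming everything it generates
reach_km = max_one_way_km(3)    # using the upper end of the 2-3 ms budget

print(f"{car_mbps:.1f} Mbit/s per car")
print(f"server within {reach_km:.0f} km at best")
```

Under these assumptions, a single car would need a continuous uplink of roughly 56 Mbit/s, and even a zero-delay server could be no more than a few hundred kilometers away, before counting any processing time.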

Security, Authenticity, and Integrity
Security, authentication, and integrity of these sensors, devices, and chips will be critical. GE ovens connect to the internet; hackers could have them self-clean and bring down the grid, so security and authorization are needed. A new group at Intel has been formed for this. Hardware chip-level security is critical: an ID burned into the chip that can’t be spoofed.




8. CHALLENGES IN UNDERSTANDING SECURITY, PRIVACY AND TRUST


What does it mean that we have some notion of security? There are many levels to security, and therefore, many explanations of security.

How Can One Trust Things?

  • ATMs: we don’t understand them, so some of us trust them and some don’t.
  • With the internet, even more than with ATMs, it is typically AFTER one uses a site or app such as Facebook that one starts wondering how secure and private it is, and what the implications are.

Explaining Complicated Security and Technology Practices and Degrees is Very Hard


  • People don’t have a guidepost, or reference point of “normal,” “secure,” or “very secure.”
  • With Facebook, at least there is a user interface to help imagine security breaches: someone seeing things they shouldn’t see.
  • Hacking an autonomous vehicle is much harder for people to visualize.

Hope in Trade-off Models?

There is in theory an equation for all of our priorities, with trade-offs as we try to balance priorities.

  • Ease of access is just one priority
  • Degree of security is another priority
  • Notice the difference between a Skype password and a bank password. There are trade-offs to manage.
  • For retinal patterns and biometrics, one doesn’t need to remember passwords, but the trade-off is that one can be tracked easily, since our bio-data is in databases.

The Economics of Trust

Other countries may be wary of US companies misusing private data for business advantages (illicit competitive intelligence!). Rental car tracking can be used for many types of surveillance and business purposes.

Pressures of the Internet of Things in Context

When one presses on one place, where does it bulge somewhere else? With the internet of things, we are going to find more frictions emerging, perhaps in unexpected places in the overall system.




9. THE PAST AND FUTURE OF COMPUTING, CONNECTIVITY AND INTELLIGENCE
What is computing, connectivity and intelligence at each of the points of the timeline shown below?
-30 years ← -10 years ← -3 years ←PRESENT→ +3 years → +10 years → +30 years

Present


In 2013, computers are in a state of transition from static to mobile. For the first time, one of the computer’s main functions is to maintain the connectedness that people rely on. For this reason, computers have become ubiquitous in our lives. Our entertainment, work, relationships, and personal lives depend at least somewhat on computing.
Intelligence has started to shift from answer-based to question-based. With Wikipedia at one’s fingertips, memorizing a vast array of facts is no longer useful. Intelligent people are those who can ask the right questions and visualize the right problems that, when solved, bring about substantive change.

3 Years Ago


The connectedness we are currently experiencing was in its infancy. Smartphones were not yet the norm, nor were our different devices connected with one another. Connectivity still had a significant human element in terms of maintenance. Our devices all performed different functions, whereas now they have huge overlaps in functionality.

3 Years From Now


The transition to computational ubiquity will be nearly complete. This will force us to rethink certain commonalities in our lives. The old model of work, where employees are hired by a company and kept on salary for many years, will start to become obsolete. Through new levels of connectedness, those who need work done will be able to find workers in real time. The identity of those providing the work can be verified to a previously unreachable depth thanks to the digital breadcrumbs we all leave.

10 Years From Now

Concepts will continue to change as they have until today, and new definitions of intelligence will keep being established. With their huge amounts of information, computers will be smarter than any human expert, but people will still find their own form of intelligence in creativity. Imagination and creativity, which produce solutions that do not follow existing paths, will become more and more important. Unlike in the past, non-linear ways of thinking will be increasingly necessary, as they lead to innovative decisions that computers cannot reach by simply calculating data.

30 Years Ago


The internet was not available to ordinary people, and digital devices were not widely distributed. Technology did not have as much impact on society, as computers were first built simply to perform difficult calculations in place of humans.

30 Years From Now


30 years from now, there will be not only technological growth but also a change in people's way of thinking, which can bring about a systemic change in the framework of society. Paul O. gave the example of the 3D printer, which prints out products that people have designed. Once 3D printers are widely distributed, stores will turn into factories of printers, and people will buy designs rather than products. This invention will shift the whole marketing system and purchasing process. As this example shows, new possibilities will keep emerging and will change the existing paradigm of society.


From the students’ perspective, what did Prof. Weigend and the guest speaker miss in the lecture?
The lecture could have focused more on the impact of technology on society. We can ask whether society is changing in the right way and whether people really want this technological growth. If we are changing emotionally and our way of thinking is shifting along with it, how can we measure those changes to check whether such rapid change over 30 years has ultimately been positive?

