Three Challenges

  • Volume - the size of the data.
  • Velocity - the latency of data processing relative to the growing demand for interactivity.
  • Variety - the diversity of sources, formats, quality, and structures that must be integrated.

Big Data is any data that is expensive to manage and hard to extract value from.

Michael Franklin, Professor of Computer Science at the University of California, Berkeley.

In Big Data, "big" is relative; the data does not have to be large. Often, what big data really means is difficult data: the point is not so much that it is really big as that it is challenging to work with.

Big Data History

The earliest notion - 1989:

The keepers of big data say they do it for the consumer's benefit. But data have a way of being used for purposes other than originally intended.

Erik Larson, 1989, Harper's Magazine

E-commerce, in particular, has exploded data management challenges along three dimensions: volumes, velocity and variety.

On Volume

The lower cost of e-channels enables an enterprise to offer its goods or services to more individuals or trading partners, and up to 10x the quantity of data about an individual transaction may be collected - thereby increasing the overall volume of data to be managed.

On Velocity:

E-commerce has also increased point-of-interaction (POI) speed, and consequently the pace data used to support interactions and generated by interactions.

On Variety:

Through 2003/4, no greater barrier to effective data management will exist than the variety of incompatible data formats, non-aligned data structures, and inconsistent data semantics.

Big Data ... and the Next Wave of InfraStress

John R. Mashey, former Chief Scientist, SGI

Disk capacities growing incredibly fast, disk latencies not keeping pace.

Big Data Now

... the necessity of grappling with Big Data, and the desirability of unlocking the information hidden within it, is now a key theme in all the sciences - arguably the key scientific theme of our times.

Francis X. Diebold, Paul F. and Warren S. Miller Professor of Economics, University of Pennsylvania

Where does big data come from?

  • "Data exhaust" from customers - companies track a lot of information about what customers do.
  • New and pervasive sensors - they give us visibility into data sources that we previously could not observe.
  • The ability to "keep everything" - disk capacities have gone up and the cost of storing a byte has gone down, so we can afford to keep nearly everything.


Car black boxes

Data Science, big data

More and more new cars are equipped with black boxes much like those in airliners. Their main purpose is forensics in the event of a crash, but they also record a lot of other information. Insurance companies offer similar devices that you can opt in to: you voluntarily plug one in to reduce your insurance rates, and it tracks your speed and other aspects of your driving habits. This technology would have been hard to imagine 20 or 30 years ago.



HydroSense is a pressure-based sensor that automatically determines water usage activity and flow down to the source (e.g., dishwasher, laundry, shower) from a single non-intrusive installation point.



ElectriSense is a single plug-in sensor that provides whole-home, device-level usage data. That is, using a single sensor plugged in anywhere in the home, ElectriSense can infer which electrical appliances are on and which are off. This data could be used for numerous applications, for example, providing homeowners with an itemized electricity bill that shows not only the total energy consumption but also breaks that total down on a per-appliance basis (TV consumed 20 kWh, lighting consumed 18 kWh, and so on).
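
To make the idea of an itemized bill concrete, here is a minimal Python sketch of the arithmetic behind a per-appliance breakdown. It is not ElectriSense's actual method or data format: the `intervals` list, the appliance names, the wattages, and the `itemize` helper are all hypothetical, and the sketch assumes the sensor has already inferred how long each appliance was on and at what average power.

```python
from collections import defaultdict

# Hypothetical inferred usage: (appliance, hours_on, average_power_watts).
# Illustrative values only; this is not real ElectriSense output.
intervals = [
    ("TV", 4.0, 100.0),
    ("Lighting", 6.0, 60.0),
    ("TV", 2.0, 100.0),
]

def itemize(intervals):
    """Sum energy per appliance in kWh, using kWh = hours * watts / 1000."""
    bill = defaultdict(float)
    for appliance, hours, watts in intervals:
        bill[appliance] += hours * watts / 1000.0
    return dict(bill)

bill = itemize(intervals)
total_kwh = sum(bill.values())

# Print one line per appliance, then the whole-home total.
for appliance, kwh in sorted(bill.items()):
    print(f"{appliance}: {kwh:.2f} kWh")
print(f"Total: {total_kwh:.2f} kWh")
```

The hard part in practice is inferring the on/off intervals from a single sensor; once those exist, the bill itself is a simple grouped sum like the one above.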