Introduction

This section introduces the four V challenges of Big Data: Volume, Variety, Velocity and Veracity.

The 4 V challenges

The 4 Vs are:

  • Volume - the size of the data.
  • Variety - the formats the data comes in.
  • Velocity - the speed at which the data streams in.
  • Veracity - how far the data can be trusted.

Volume - Data Size

Volume refers to the size of the data, and for big data we expect it to be large. How large is often hard to picture, but the following numbers give a sense of scale.

  • 40 zettabytes (1 zettabyte = 10^21 bytes) of data is predicted to be created by 2020.
  • 2.5 quintillion (10^18) bytes of data are created every day.
  • 6 billion (10^9) people have mobile phones.
  • 100 terabytes (1 terabyte = 10^12 bytes) of data is stored by most U.S. companies.
  • 966 petabytes (1 petabyte = 10^15 bytes) was the approximate storage size of the American manufacturing industry in 2009.
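
The units in this list are easier to compare once they are expressed on a common scale. The following is a minimal sketch, assuming decimal (SI) prefixes; the names PREFIXES and convert are hypothetical helpers introduced only for illustration.

```python
# Decimal (SI) byte prefixes as powers of ten (assumption: SI, not binary, units).
PREFIXES = {
    "byte": 1,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one byte unit to another."""
    return value * PREFIXES[from_unit] / PREFIXES[to_unit]


# Express the figures from the list in exabytes so they can be compared directly.
figures = {
    "predicted by 2020": (40, "zettabyte"),
    "created per day": (2.5 * 10**18, "byte"),
    "U.S. manufacturing, 2009": (966, "petabyte"),
}

for label, (value, unit) in figures.items():
    print(f"{label}: {convert(value, unit, 'exabyte'):g} exabytes")
```

Putting everything in exabytes shows, for example, that 2.5 quintillion bytes per day is the same quantity as 2.5 exabytes per day.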

Variety - Data Formats

Variety is about data formats. The following numbers give a sense of how many different kinds of sources produce data.

  • 150 exabytes (1 exabyte = 10^18 bytes) was the estimated size of health care data throughout the world in 2011.
  • More than 4 billion (10^9) hours of YouTube video are watched each month.
  • 30 billion pieces of content are shared every month on Facebook.
  • 200 million monthly active users send 400 million tweets every day.

Velocity - Data Streaming Speeds

Velocity is about data streaming speeds. The following numbers give a sense of how fast data arrives.

  • 1 terabyte (10^12 bytes) of trade information is exchanged during every trading session at the New York Stock Exchange.
  • Approximately 100 sensors are installed in modern cars to monitor fuel level, tire pressure, etc.
  • 18.9 billion network connections are predicted to exist by 2016.
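
To turn the stock exchange figure above into a streaming speed, the volume can be divided by the length of a session. The sketch below assumes a regular session of 6.5 hours (9:30 to 16:00) and an even spread of traffic over that time. Both are assumptions made for illustration, not figures from this article.

```python
# Volume exchanged per trading session, from the list above: 1 terabyte.
BYTES_PER_SESSION = 10**12

# Assumption: a regular session lasts 6.5 hours (9:30 to 16:00).
SESSION_HOURS = 6.5
SESSION_SECONDS = SESSION_HOURS * 3600

# Average throughput if the data were spread evenly over the session.
bytes_per_second = BYTES_PER_SESSION / SESSION_SECONDS
megabytes_per_second = bytes_per_second / 10**6

print(f"Average rate: {megabytes_per_second:.1f} MB/s")
```

Real traffic is bursty, so peak rates would be well above this average.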

Veracity - Data Trustworthiness

Veracity is about how far the data can be trusted. The following numbers show what poor data quality costs.

  • 1 in 3 business leaders has experienced trust issues with their data when trying to make a business decision.
  • $3.1 trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality.
