The 4 V Challenges of Big Data
Introduction
This section covers the four V challenges of Big Data: Volume, Variety, Velocity and Veracity.
The 4 V Challenges
The 4 Vs are:
- Volume - the size of the data.
- Variety - the formats of the data.
- Velocity - the speed at which data is streamed.
- Veracity - how much we can trust the data (data trustworthiness).
Volume - Data Size
Volume is about data size, and for Big Data we expect that size to be large. We may not know exactly how big a data set we are dealing with, but the following numbers give a better sense of the scale involved.
- 40 zettabytes (10^21) of data is predicted to be created by 2020.
- 2.5 quintillion bytes (10^18) of data are created every day.
- 6 billion (10^9) people have mobile phones.
- 100 terabytes (10^12) of data is stored by most U.S. companies.
- 966 petabytes (10^15) was the approximate storage size of the American manufacturing industry in 2009.
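The units above are decimal (SI) prefixes, and each step is a factor of 1,000. The small sketch below converts a raw byte count to the largest prefix that fits. `human_bytes` is a hypothetical helper written for this illustration and is not part of any library. It shows that 2.5 quintillion bytes is 2.5 exabytes, and that a full year at that daily rate comes to roughly 0.9 zettabytes.

```python
# Decimal (SI) byte prefixes: each step up is a factor of 10^3.
PREFIXES = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_bytes(n: float) -> str:
    """Hypothetical helper: express a byte count with the largest decimal prefix keeping the value >= 1."""
    i = 0
    while n >= 1000 and i < len(PREFIXES) - 1:
        n /= 1000
        i += 1
    return f"{n:g} {PREFIXES[i]}"

print(human_bytes(2.5e18))        # 2.5 EB   (2.5 quintillion bytes per day)
print(human_bytes(2.5e18 * 365))  # 912.5 EB (the same daily rate over a year)
print(human_bytes(100e12))        # 100 TB
print(human_bytes(40e21))         # 40 ZB
```

The conversion uses powers of 1,000, matching the (10^n) figures above. It does not use the binary powers of 1,024 that some operating systems use.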
Variety - Data Formats
Variety is about the many different formats that data comes in, such as the health records, videos, posts and tweets counted below. Here are some numbers that give a better sense of it:
- 150 exabytes (10^18) was the estimated size of health care data worldwide in 2011.
- More than 4 billion (10^9) hours of YouTube video are watched each month.
- 30 billion pieces of content are shared on Facebook every month.
- 200 million monthly active users send 400 million tweets every day.
Velocity - Data Streaming Speeds
Velocity is about the speed at which data is streamed. Here are some numbers that give a better sense of it:
- 1 terabyte (10^12) of trade information is exchanged during each trading session at the New York Stock Exchange.
- About 100 sensors are installed in modern cars to monitor fuel level, tire pressure, etc.
- 18.9 billion network connections are predicted to exist by 2016.
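To express the NYSE figure as a per-second rate, here is a small worked calculation. It assumes a 6.5-hour session (the regular 9:30 to 16:00 trading hours). That session length is not stated in the figures above, and the function name is illustrative only.

```python
# Assumption (not from the figures above): a 6.5-hour regular NYSE session.
SESSION_HOURS = 6.5
BYTES_PER_SESSION = 1e12  # 1 terabyte (10^12 bytes) per trading session

def average_rate_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average throughput in megabytes (10^6 bytes) per second over the given period."""
    return total_bytes / (hours * 3600) / 1e6

rate = average_rate_mb_per_s(BYTES_PER_SESSION, SESSION_HOURS)
print(f"{rate:.1f} MB/s")  # 42.7 MB/s
```

The result is only an average over the whole session. It says nothing about peak rates.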
Veracity - Data Trustworthness
Veracity is about how far we can trust our data:
- 1 in 3 business leaders have experienced trust issues with their data when making business decisions.
- $3.1 trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality.