The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let’s work through an example to show how this statistic is computed.
Let’s assume that we want to look at the relationship between two variables, height and self-esteem. Perhaps we have a hypothesis that how tall you are affects your self-esteem. Let’s say we collect some data from 20 individuals. Height is measured in inches. Self-esteem is measured as the average of ten items, each rated from 1 to 5, where higher scores mean higher self-esteem. Below is the data:
Now, let’s take a quick look at the histogram for each variable:
And, here are the descriptive statistics:
Calculating the Correlation
Now we are ready to compute the correlation value. The formula for the correlation is:
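Assuming the correlation meant here is the Pearson product-moment correlation coefficient r, its computational form, written with the same sums that are defined below, can be sketched as:

```latex
r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```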
Where N is the number of pairs of scores, ∑xy is the sum of the products of the paired scores, ∑x is the sum of the x scores, ∑y is the sum of the y scores, ∑x² is the sum of the squared x scores, and ∑y² is the sum of the squared y scores.
For the example above, N = 20, ∑xy = 4937.6, ∑x = 1308, ∑y = 75.1, ∑x² = 85912, and ∑y² = 285.45. Now, when we plug these values into the formula given above, we get the following:
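As a sketch of the arithmetic, assuming the standard computational form of Pearson's r and taking x as height and y as self-esteem, the substitution works out step by step as:

```latex
\begin{aligned}
r &= \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}
          {\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]
                 \left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}} \\[4pt]
  &= \frac{20(4937.6) - (1308)(75.1)}
          {\sqrt{\left[20(85912) - 1308^{2}\right]\left[20(285.45) - 75.1^{2}\right]}} \\[4pt]
  &= \frac{98752 - 98230.8}
          {\sqrt{\left[1718240 - 1710864\right]\left[5709 - 5640.01\right]}} \\[4pt]
  &= \frac{521.2}{\sqrt{7376 \times 68.99}}
   = \frac{521.2}{\sqrt{508870.24}}
   \approx \frac{521.2}{713.35}
   \approx 0.73
\end{aligned}
```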
The correlation for this case is 0.73, which is a fairly strong positive relationship: taller people in this sample tend to report higher self-esteem. So the data suggest there is a relationship between height and self-esteem, at least in this made-up data!
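The same calculation can be checked in a few lines of Python. This is only a minimal sketch: the function name `pearson_r_from_sums` and its parameter names are made up for illustration, and it works from the summary sums quoted above rather than from the raw data.

```python
import math


def pearson_r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Pearson's r from summary sums (hypothetical helper, not a library API)."""
    # Numerator: N * sum(xy) - sum(x) * sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: sqrt of the product of the two bracketed terms
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    return numerator / math.sqrt(x_term * y_term)


# Summary values from the height / self-esteem example
r = pearson_r_from_sums(
    n=20, sum_xy=4937.6, sum_x=1308, sum_y=75.1, sum_x2=85912, sum_y2=285.45
)
print(round(r, 2))  # prints 0.73
```

With the raw scores at hand, a library routine such as `numpy.corrcoef` would usually be the more convenient choice; working from the sums simply mirrors the hand calculation.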