Introduction

Example: early smoking research. A typical objection at the time: "My uncle smokes three packs a day and he's in perfectly good health."

Such a claim rests on a very limited sample that might not be representative of the population. We call such evidence anecdotal evidence. At the time, it was concluded that "smoking is a complex human behavior, by its nature difficult to study, confounded by human variability". Over time, researchers were able to examine larger samples of cases, in other words, more smokers. With data collected from these larger samples, trends showing the negative health impacts of smoking became much clearer.
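To see why a single case says so little, here is a minimal simulation sketch. It assumes a made-up population in which 30% of people have some outcome (the 30% figure, the population size, and the function names are all hypothetical, chosen only for illustration and not taken from any smoking study). It compares how much the estimated proportion varies when we sample one person versus 1,000 people.

```python
import random
import statistics

def sample_proportion(population, n, rng):
    """Proportion of 1s in a simple random sample of size n (without replacement)."""
    sample = rng.sample(population, n)
    return sum(sample) / n

def spread_of_estimates(population, n, repeats, rng):
    """Standard deviation of the sample proportion across repeated samples."""
    estimates = [sample_proportion(population, n, rng) for _ in range(repeats)]
    return statistics.pstdev(estimates)

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 people, 30% of whom have the outcome (coded 1).
# The 30% figure is invented for illustration, not a real statistic.
population = [1] * 3000 + [0] * 7000

true_proportion = sum(population) / len(population)
spread_small = spread_of_estimates(population, n=1, repeats=500, rng=rng)
spread_large = spread_of_estimates(population, n=1000, repeats=500, rng=rng)

print(f"true proportion: {true_proportion:.2f}")
print(f"spread of estimates, n=1:    {spread_small:.3f}")
print(f"spread of estimates, n=1000: {spread_large:.3f}")
```

With a sample of one, every estimate is either 0 or 1, so the "uncle" anecdote can land on either side regardless of the true proportion. With the larger sample, the estimates cluster tightly around the true value, which is the sense in which trends become clearer as samples grow.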

The goal of this course is to teach you to make sense of data using statistical tools in order to be able to explore relationships between variables and make informed decisions.

The first questions you should ask when faced with a new study or data set: what is the research question, what is the population of interest, and what is the sample? For example:

  • Research question: Are consumers of certain alcohol brands more likely to end up in the emergency room with injuries?
  • Population: Everyone
  • Sample: ER patients at the Johns Hopkins Hospital in Baltimore in the US
  • Generalize to: Residents of Baltimore

What will you learn from this course?


  • Population: How to define the population of interest.
  • Sample: Methods for taking samples from this population.
  • Design: How to design the study that can best answer a particular research question.
  • Scope: How to identify the scope of inference for a study, such as when we can make causal versus correlational statements, and when we can generalize our conclusions to the population at large.
  • Exploratory data analysis: Methods of exploratory data analysis, such as visualisation and summary statistics.
  • Inference: Methods of statistical inference.
