Course: Exploratory Data Analysis


Except where otherwise indicated, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.

To sign up to take the course online, please visit the Johns Hopkins Data Science Specialization.

Course Description

In this course you will learn the ideas of reproducible research and the reporting of statistical analyses. Topics covered include literate programming tools, evidence-based data analysis, and organizing data analyses. You will learn to write a document using R Markdown, integrate live R code into a literate statistical program, compile R Markdown documents using knitr and related tools, publish reproducible documents to the web, and organize a data analysis so that it is reproducible and accessible to others.


  • Structuring and organizing a data analysis
  • Markdown and R Markdown
  • knitr / RPubs
  • Reproducible research check list
  • Evidence-based data analysis
  • Case studies in air pollution epidemiology and high-throughput biology


There will be quizzes in Weeks 1 and 2. Both quizzes open on the first day of the course, but they have different due dates: the Week 1 Quiz is due at the end of the first week, and the Week 2 Quiz is due at the end of the second week.

Points and Scoring

  • Quiz 1: 20%
  • Quiz 2: 20%
  • Peer Assessment 1: 25%
  • Peer Assessment 2: 35%
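
As a rough illustration of how these weights combine, the sketch below computes a weighted total. It assumes each component is scored on a 0 to 100 scale, which this page does not state; the function name `final_score` and the example scores are hypothetical.

```python
# Weights (in percent) taken from the Points and Scoring list above.
WEIGHTS = {
    "Quiz 1": 20,
    "Quiz 2": 20,
    "Peer Assessment 1": 25,
    "Peer Assessment 2": 35,
}


def final_score(scores):
    """Return the weighted course score on a 0-100 scale.

    Assumes every component score is itself on a 0-100 scale
    (an assumption made for this sketch, not a stated course rule).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Multiply each score by its percentage weight, then divide by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, used only to show the arithmetic.
example_scores = {
    "Quiz 1": 90,
    "Quiz 2": 80,
    "Peer Assessment 1": 100,
    "Peer Assessment 2": 70,
}

print(final_score(example_scores))  # prints 83.5
```

The four weights sum to 100%, so a perfect score on every component gives a total of 100.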

Course Project

The plotting assignments will be assessed via peer assessment. In these assignments you will be asked to construct or reproduce certain plots. Your classmates will evaluate both the plot you produce and the code you write to construct it. Assignments evaluated via peer assessment will make use of your GitHub account.