ECEP VIRTUAL CAMPUS

# HarvardX: PH525.1x Data Analysis for Life Sciences 1: Statistics and R

## Course Description

An introduction to basic statistical concepts and R programming skills necessary for analyzing data in the life sciences. We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals. We will provide examples by programming in R in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. We will also introduce the basics of using R scripts to conduct reproducible research.

Topics:

• Distributions
• Inference
• Exploratory Data Analysis
• Non-parametric statistics

## Course Syllabus

Course content will be discussed on a weekly basis with the following schedule:

Week 1: Getting Started

• Using RStudio
• R programming skills
• Getting organized

Week 2: Random Variables, Probability Distributions, and the Central Limit Theorem

• Introduction to random variables
• Introduction to the null distribution
• Probability distributions
• The normal distribution

Week 3: Inference

• t-tests
• The Central Limit Theorem
• Association tests
• Monte Carlo methods
• Permutation tests
• Power

Week 4: Exploratory Data Analysis and Robust Summaries

• Exploratory data analysis
  • histogram
  • QQ-plot
  • boxplot
  • scatterplot
  • log transformation
• Robust summaries
  • Median, MAD and Spearman correlation
• Mann-Whitney-Wilcoxon test

## Suggested pre-requisites

• Basic programming skills. We will assume that learners are familiar with very basic programming concepts (variables, functions).
• Familiarity with the R language. The course will use R in order to demonstrate data analyses. In the first week, we will have a refresher on the commands in R which you will need to use in the following weeks, but this is not a comprehensive R course, and we will not go in depth on R syntax. Please see below for online R resources.

Starts August 18, 2015

### Course at a Glance

• 10 weeks of study
• 2-4 hours/week
• Language: English
• Subtitles: English & Vietnamese