TLDR; A write up of an assignment from Coursera’s Data Analysis course aiming to identify patterns of activity from accelerometer data using SVD and Randrom Forests in R.
I’ve been following Coursera’s “Data Analysis” course, taught by Jeff Leek (straight outta Hopkins!). It’s been interesting, having spent 7 years doing particle physics, most of the techniques are not new, but the jargon is. It has also highlighted some important differences in the methodology of Particle Physics and Bio-statistics, driven by our reliance on synthetic data (or conversely, driven by their lack of reliable Monte Carlo). Since the second assignment is over and done with, I thought I’d post a little write-up here.
The aim was to study the predictive power of data collected by the accelerometer and gyroscope of Samsung Galaxy S II smartphones, carried by a group of individuals performing various activities (walking, walking upstairs, walking downstairs, sitting, standing, and laying down). The dataset consists of 7352 samples (pre-processed by applying noise filters and by sampling the values in fixed time windows) of 562 features collected from 21 individuals. Of these 21 individuals, I randomly set aside three as a testing sample, and another four as a validation sample.