TLDR; A write up of an assignment from Coursera’s Data Analysis course aiming to identify patterns of activity from accelerometer data using SVD and Randrom Forests in R.
I’ve been following Coursera’s “Data Analysis” course, taught by Jeff Leek (straight outta Hopkins!). It’s been interesting, having spent 7 years doing particle physics, most of the techniques are not new, but the jargon is. It has also highlighted some important differences in the methodology of Particle Physics and Bio-statistics, driven by our reliance on synthetic data (or conversely, driven by their lack of reliable Monte Carlo). Since the second assignment is over and done with, I thought I’d post a little write-up here.
The aim was to study the predictive power of data collected by the accelerometer and gyroscope of Samsung Galaxy S II smartphones, carried by a group of individuals performing various activities (walking, walking upstairs, walking downstairs, sitting, standing, and laying down). The dataset consists of 7352 samples (pre-processed by applying noise filters and by sampling the values in fixed time windows) of 562 features collected from 21 individuals. Of these 21 individuals, I randomly set aside three as a testing sample, and another four as a validation sample.
TL;DR: I talk about some text frequency analysis I did on the arxiv.org corpus using python, mysql, and R to identify trends and spot interesting new physics results.
In one of my previous posts, I mentioned some optimization I had done on a word-frequency anlysis tool. I thought I’d say a bit more here about the tool (buzzArxiv), which I’ve put together using Python and R to find articles that are creating a lot of ‘buzz’.
For those who don’t know, arxiv.org is an online repository of scientific papers, categorized by field (experimental particle physics, astrophysics, condensed matter, etc…). Most of the pre-print articles posted to the arxiv eventually also get submitted to journals, but it’s usually on the arxiv that the word about new work gets disseminated in the community. The thing is, there’s just gobs of new material on there every day. The experimental, pheonomonolgy, and theory particle physics mailings can each have a dozen or so articles per day.
TL;DR: The academic job market is looking bleaker than ever, and name brand recognition counts for a lot.
- Step 1: College Student, study hard.
- Step 2: Graduate Student, study hard, do research.
- Step 3: Postdoc, work hard, more reasearch, prove your chops.
- Step 4: Proffesor, kick back and relax, you’re set for life.
Ah, if only.
The postings have begun for this year’s set of junior and tenure track faculty positions. This affects me directly, since I’m somewhere around step 3.5. The truth of the matter, is that the job market for faculty positions is amazingly competitive, with few offerings and a sea of prospective applicants. A while ago, stuck at home while both Laureline and Jenn were sick as dogs, I had a little fun with the information up on the HEP Rumor Mill. The HEP Rumor Mill is exactly what it sounds like, completely unverified rumors about who has been short listed or made an offer high energy particle physics faculty jobs. They’ve got one page per year, going back to the 2004-2005 job cycle. All it requires to get the data is writing quick and dirty unstructured data parser. Ithought I’d share some of what I found. Note that all of what comes out of this is highly suspect. The short-lists and offers are un-verified, the names of candidates are sometimes misspelled, and I even caught one of two instances of someone’s affiliation being improperly reported.