10 Tips and Tricks for Statistical Proofs

I’ve been taking probability theory this year, and I noticed that many proofs assume the reader already knows some commonly used “tricks.” If you aren’t familiar with them, it can be hard to follow the proofs in the textbook, let alone write one yourself. I felt like this was happening to me a lot, so in an effort to better familiarize myself, I’ve written down some useful tips and tricks, along with explanations and/or examples.

Should You Get a PhD (in biostatistics)?

Throughout the process of applying to graduate school, I felt unsure about whether getting a PhD was a good idea. I remember Googling “should I get a PhD” out of curiosity, just to see what I could find. Chances are, if you’re reading this post, you’re in a similar boat. Now that I’m a second-year PhD student in biostatistics, I have a better idea of whether being in a PhD program has been a good choice for me, and I’ll share what would’ve been useful for me to know back when I was applying.

My Favorite Books to Date

I sometimes get requests for book recommendations, so I decided to write a post on my favorite books. For each book, I give a very short description (< 3 sentences) of the book and why I enjoyed it. Hopefully, that’ll give you some idea of whether you want to read it too.

Sapiens by Yuval Noah Harari - Harari tells a captivating story about how humans became the dominant species on the planet, with sweeping and intuitive explanations of a wide range of topics.

Point Shape Options in ggplot

I’m familiar enough with ggplot that I can make a quick plot pretty easily in most cases. But when it comes to fine-tuning the various plot aesthetics, like adjusting the legend position or rotating axis tick labels, I always have to look them up. Today, I will be writing about one of these pesky things: looking up the point shape options for geom_point. The available documentation for this isn’t great, so I thought it would be worthwhile to write my own reference.

Submitting Parallel Jobs on a Cluster

Recently, I’ve been running simulations on our school’s computing cluster (JHPCE), which schedules jobs using the Open Grid Engine. Each simulation takes about half a day to complete, so I could run them sequentially and wait a week to get 14 simulation points. Or I could run them in parallel and get 14 simulation points in less than a day! In theory, running my simulations in parallel should be a very straightforward task.
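
The arithmetic behind that comparison can be sketched in a few lines of Python. This is only a rough illustration, not the actual cluster setup: the function name `wall_clock_days` and the `n_slots` parameter are hypothetical, and the sketch assumes every simulation takes exactly half a day and that the scheduler starts a job as soon as a slot is free.

```python
import math


def wall_clock_days(n_jobs: int, days_per_job: float, n_slots: int) -> float:
    """Estimate wall-clock time when at most n_slots jobs run at once.

    Assumes all jobs take the same time and run in full batches,
    which is a simplification of how a real scheduler behaves.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    n_batches = math.ceil(n_jobs / n_slots)  # number of rounds of jobs
    return n_batches * days_per_job


# 14 simulation points at roughly half a day each (figures from the post).
sequential = wall_clock_days(n_jobs=14, days_per_job=0.5, n_slots=1)
parallel = wall_clock_days(n_jobs=14, days_per_job=0.5, n_slots=14)

print(f"sequential: {sequential} days")  # one job at a time
print(f"parallel:   {parallel} days")    # all 14 jobs at once
```

With one slot the estimate is the full week mentioned above; with 14 slots it drops to the length of a single simulation. Anything in between depends on how many jobs the cluster will actually run at the same time.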