Advice for Prospective PhD Students (in biostatistics)

It’s application season for graduate admissions again! As a current PhD student, I thought I would share some advice for prospective students. I’d previously written on whether you should get a PhD. In this post, I will talk about things you can do to prepare for a biostatistics PhD and the application process. As with any advice I give on this blog, it is based on my personal experience – I was a math and statistics major in undergrad at UChicago and I’m now a biostatistics PhD student at Johns Hopkins – so your mileage may vary.

1. Talk to your professors

The biggest thing I didn’t do as an undergraduate that I wish I had done was talk to my statistics professors. This is advice I myself heard when I was in undergrad and didn’t follow, so let me try to frame it in a way that might be more convincing to my younger self.

The main reason you want to talk to your professors is so that they can write you good recommendation letters. But just because that’s the goal doesn’t mean your interactions have to feel forced or awkward. Any time you have a question about class material or a homework set is a perfect opportunity to go to your professor’s office hours. I remember when I was in undergrad, I felt like it was better to try to resolve these questions on my own or ask the TA because the professor seemed intimidating. But if your professors are anything like the professors in my current department, they will be friendly and more than happy to talk to you about statistics. In fact, you should think of it this way – your questions also benefit the professor because they give useful feedback on how well each topic in the class is being taught.

You don’t have to expect to get along fantastically with every professor you talk to, but if you talk at least once to each professor you take a class with, you will eventually find some you are more comfortable talking to. Then you can also ask for their insight about going to graduate school (e.g. whether they think it’s a good idea, where to apply, etc.), which will help you make more informed decisions come application time.

2. Take classes in math and statistics

The most essential classes for you to take as preparation for the biostatistics PhD curriculum[1] are (a) some statistics classes, so that you are familiar with the jargon used in the field and (b) real analysis and linear algebra, which are relevant math prerequisites for your PhD classes. Our program allows considerable flexibility for (b), in that students will be taught linear algebra again as part of the curriculum anyway and may opt to retake real analysis during the first year of their PhD. But when you are applying, you want to demonstrate to the admissions committee that you have the ability to pass the PhD curriculum and progress in the program, which mainly means taking these classes in math and statistics as an undergraduate.

Despite the “bio” in biostatistics, classes in biology are not really necessary as preparation. Biology is not part of the PhD curriculum at all. This is because there is a wide range of public health problems you may end up working on, so you will only be expected to develop an understanding of the relevant biology later on in your PhD, when you start doing research.

3. Learn how to code

I’ve framed the previous point in terms of what you can do to succeed in your PhD classes, but what’s arguably more important is to succeed in the research for your dissertation. For most people, I believe that the most important thing they can do to prepare for research while they’re still in college is to learn how to code. Coding is what I (and, I suspect, most of my peers) spend the majority of our research time on, so the better you are at it, the easier your life will be. While you’ll be required to take a few programming classes as a PhD student, a lot of coding skill comes from familiarity developed over time, so it’s to your advantage to start early. You can do this by taking some computer science classes in college or online. In particular, machine learning classes are good to take, because they often overlap in material with statistics classes, so you get the double benefit of learning things from a different perspective and building up some programming skills.

The most “useful” language to learn is R, since that’s almost certainly the language you will use as a biostatistics student. Now because R is sort of a niche language used by statisticians, it’s unlikely to be the language your computer science department will teach you. It’s likely going to be the language your statistics classes will use, but in my experience, teaching you R won’t be their focus, so you’ll probably have to learn it on your own. And that’s fine! I personally think it’s hard to learn a programming language from a class anyway and much of the learning really comes from you teaching yourself and constantly Googling things. The class is just there to force you to learn.

I still recommend that you take at least one computer science class because it’ll help you better understand the general, abstract ways in which programming languages work (e.g. control flow, data structures, etc.). Part of this understanding will also come from learning another programming language that’s not R. I recommend Python, since it’s commonly used by other people who do statistics and machine learning.[2] But any general-purpose programming language, i.e. whatever they teach in your introductory computer science class, will be helpful. Some knowledge of these fundamental concepts will help you write code that’s cleaner and more efficient.

Also, learn how to use git (version control)! I’m still surprised by how many people in academia don’t use it as part of their daily coding practices. I assure you that it’s definitely worth your time to learn and it’ll save you from a lot of trouble.

4. Try doing research in college

This is a more “optional” piece of advice than the first three points. Research opportunities like REUs (Research Experiences for Undergraduates) are intended as a way for undergraduates to see what research is like and whether it seems interesting to them. I think they are useful for that purpose. Having said that, there’s still a big difference between the experience of an REU participant and that of a PhD student, not necessarily because the scientific problems are more complex in graduate school, but because graduate school life is far more chaotic – you are constantly juggling multiple projects, as well as TA assignments, classes, and miscellaneous administrative tasks (e.g. see my post on a day in the life of a PhD student). I would say that it’s often the external factors in this chaos that make the program difficult for PhD students,[3] rather than any lack of intrinsic interest in scientific research, so REUs are limited in how much of that perspective they’ll provide.

I also think that it’s totally OK if you don’t do any research in undergrad. When I look back on what I’ve done so far, it’s easy to connect the dots and trace a path that “makes sense” for where I am now – e.g. I did research at X as an undergrad, which piqued my interest in Y, which led to my application to Z program. But that’s rarely how life works. At any point along the way, there were other things I considered, looked into, and tried, with varying levels of commitment. Part of finding the best path for you is to try different things and see what you like. If that means doing internships instead of REUs, I don’t want to discourage you from doing that. It may be the case that in order for you to determine that going to graduate school is a good choice, you have to try other options first.

5. Sort out your logistics

It goes without saying that, at the end of the day, the most important thing for admissions is that you are on top of the logistics – taking the required standardized tests, getting your application materials ready, and submitting them before the deadline. From what I remember about my applications four years ago, two things stand out to me. The first is that the GRE subject test in math, which a few statistics PhD programs require, is only offered a few times a year, so you should look up those dates and plan for it well ahead of time if you want to take it. The second is that the SOPHAS application, which many biostatistics PhD programs use, is very tedious and time-consuming to fill out.[4]

To read more posts where I talk about what getting a PhD is like, click here.

Update: I also recommend this excellent blog post by Simon Couch, which offers a more recent look into the application process and some common FAQ I have not covered.

  1. Specifically at Johns Hopkins, though I would guess it’s similar at other programs.

  2. As well as by computer scientists more generally.

  3. For more on this, I recommend Real Life by Brandon Taylor, which I found to be quite honestly representative of issues and conflicts faced by PhD students, especially minorities, even if it is dramatized as a fictional story.

  4. At least this was the case for me. They may have changed it!