My Preliminary Oral Exam

In my PhD program, we are required to take a “preliminary oral exam.” It’s the second major hurdle to getting your PhD, the first being the written comprehensive exam and the last being the thesis defense, and it typically takes place during your third year.

The structure of the oral exam is the following. First, I was asked to leave the room for a few minutes as the committee members decide what order they’re going to ask questions and other logistics.¹ Then I returned to the room to present on my ongoing/future research work for 15 minutes. In my presentation, I briefly mentioned the 4/5 projects I was working on, before focusing on one of them for the rest of the presentation. Then the committee members took turns asking questions for about 60 to 90 minutes. What these questions may be are defined quite broadly and therefore may vary depending on who’s on the student’s committee. The guidelines stated by the school are:

The purpose of this [preliminary] examination is to determine whether the student has both the ability and knowledge to undertake significant research in his/her general area of interest. Specifically, the examiners will be concerned with the student’s: (1) capacity for logical thinking; (2) breadth of knowledge in relevant areas; and (3) ability to develop and conduct research leading to a completed thesis.

In the interest of helping future students get a sense of what kinds of questions might be asked in their oral exam, I will try to recall the questions that I was asked by the committee² and add any comments I have in the footnotes.

Committee Member 1:

What are some advantages to using an unsupervised method versus a supervised method?
In your presentation, you mention the Bonferroni correction. Can you explain what that is?³
(follow-up) You mention that the Bonferroni correction is conservative. What would be a less conservative method?⁴
(follow-up) Do you think there would be benefits to controlling for the FDR in your method instead of the FWER?
How would you assess whether your model is correct or not?⁵
Can you explain why you would want to do cross-validation?
(follow-up) In the case of overfitting with a logistic regression, what would happen to the coefficients?
(follow-up) What is the bias-variance trade-off?⁶
What do you see as some future statistical advances that would be useful to you in your line of research?
(follow-up) What are some examples of feature selection methods that you think would be useful?

Committee Member 2:

Why do you use counts or proportions or rates in your analysis?
From an epidemiological point of view, what is the potential you see in your work in terms of prevention?
Can you assess the contribution of different causes in a tumor with your method?⁷
What are some challenges when looking at the effects of radiation?⁸

Committee Member 3:

What data did you use to build the prediction model for pancreatitis?
(follow-up) If you’re looking to use genetic mutations in predicting the risk of pancreatitis, what would be the difference there versus the project you talked about today?
What would happen if some of the labels for your data were incorrect?
If you’re not seeing an increase in the total number of mutations, what is a potential reason for that?

Committee Member 4:

So in your presentation, you talk about both dose and exposure. What is the difference? Can you give an example where they’re the same?⁹
Is there a biological basis behind some of the future work you discussed?
In some previous work we did, we found that the mutations differ between male and female rats. How would you interpret that?
(follow-up) So in breast cancer, would you expect to see differences in mutations between males and females?
(follow-up) Are there differences between how male and female breast cancer is treated?¹⁰
Do you think the samples are representative of the population? For example, if Johns Hopkins collected a bunch of tumor samples, is that data representative of the Maryland population?
Would it be important to look into the number of mutations in organ transplants?
How many samples do you think are needed to run your analysis?

Commitee Member 5 (advisor):

In your plots, what is the reason for why some mutations increase and some mutations decrease?
Can you explain what non-negative matrix factorization is?¹¹

Overall, I was pretty happy with how it went (I passed!). I felt like all the questions I received were reasonable. Nearly all of them were based on something I said in my slides, which I believe wasn’t always the case with other students’ oral exams. I felt like I probably struggled the most with explaining fundamental statistical concepts. Luckily, most of these were asked by Committee Member 1, so I got them over with at the beginning. Usually, it wasn’t because I didn’t know what the answer was, but because it was hard to explain something clearly on the spot.

If I had any advice, I would largely echo what other students told me before I took mine. First, scheduling the oral exam is already half the work, so start that process early (it took me about a month to schedule a time that worked for all my committee members). It is probably worth reviewing statistical methods briefly (I spent a day browsing my notes). It is also probably important that you have some minimal knowledge in your field of application (I wrote a short guide on cancer facts for myself and read some overview papers like this one). Lastly, it’s probably helpful to have a couple of ideas about what you think you can do research-wise in the coming years, each of which you can explain and defend regarding why you think they are interesting and feasible ideas (I discussed these things with my advisor). I think that those things should provide a good foundation for you to reason through the questions and give sufficiently intelligible answers.

I made the mistake of not bringing my phone with me, so I could only pass time by nervously pacing outside the room.↩︎
I’m not sure if my committee members would want their names associated publicly with the questions they asked, so I’m going to keep them anonymous.↩︎
I struggled with explaining the idea of multiple testing clearly, which was a little bit unfortunate because it was a topic in the class I was a TA for last term.↩︎
I couldn’t remember the details of any other method, so I described the alternative of controlling for the false discovery rate instead.↩︎
This was probably the question I struggled with the most. In the end, it seemed like they were asking about model checking for logistic regression, which I stumbled over by saying, well, I guess you can look at the residuals and outliers..?↩︎
Again, I wasn’t able to give a very clear answer on the spot, but the committee member deemed it minimally adequate.↩︎
Here, I basically said no, but then my advisor interrupted and said, actually, we have a section discussing this – don’t you remember, Albert? So that was a bit embarrassing.↩︎
This was about one of my side projects and I actually can’t remember what the real question was. I just remember being enthusiastic about my answer because I’d spent a long time last year looking into this and I never get to talk about it! So I had a lot to say.↩︎
I had no idea there were technical meanings for dose and exposure, but it seemed more like they wanted to teach me about this rather than expect me to know it.↩︎
I didn’t know the answer to this question, but it turned out that the committee member didn’t know either and was just curious. In the end, another committee member answered the question.↩︎
This was a gift of a question, because I’d been confused by this before and my advisor went over it with me, so he knew that I knew the answer.↩︎