A Word From: Sofia Robb


Since 1999, Cold Spring Harbor Laboratory has offered an annual course in Programming for Biology, which runs for two weeks every October. The course underwent a major modification in 2017: Co-instructors Simon Prochnik and Sofia Robb changed the language being taught from Perl to Python. The switch came after 18 years because “more and more people are using Python for bioinformatics analysis,” as Sofia explains below. 

Both Simon and Sofia have been with the course since 2002. Simon first participated as a Teaching Assistant (TA) before taking on the role of lead instructor in 2004. Sofia, a 2002 course alumna, experienced the course from every role prior to joining Simon as a co-instructor in 2011. We met with Sofia last year to chat about the long-running course.

Can you talk us through what a typical day looks like for a course trainee?

We have three phases in the course, and Phase 1 is general programming. For the first week, we have a morning lecture each day that teaches programming followed by exercises on the computer. Then in the afternoon, we have another lecture and more time on the computer. We usually have a review session after dinner and then time again on the computer. These Phase 1 sessions cover the basics like how to do a for-loop and an if-statement, and we give them real-world problems for practice so they can understand why they’re doing what they’re doing. And that’s the way it goes for the first week. 

The second phase is similar but we bring in outside lecturers who talk about “bigger” topics. Like the first phase, these lectures are followed by programming exercises, but the exercises have the flair and flavor of the lecturer so the trainees can see how concepts can be applied to a different slice of bioinformatics. For example, we’ll look at sequence similarity, file formats and how to convert between them, or how to work with certain files from NGS (next-generation sequencing). The trainees are using a lot of the same concepts they learned from the first week, but practicing more to help cement what they’re learning. This course is like learning a foreign language: it’s awkward at first and just so new, but practice makes it a little more familiar. We’re not looking for perfection but general understanding. 

Phase 3 takes place during the last three or four days of the course and involves group projects. Simon and I have the trainees present ideas for individual projects and, oftentimes, the ideas are very similar so we rewrite them into five or six group projects. Each TA signs up for the project they feel they can help the most with, and the trainees break into groups with a TA and work on the project task from start to finish. The projects are too big for one person to complete in three days, so they have to work as a team and decide who’s doing what and how they’re doing their individual parts. They also have to report to each other because, even though they work in parallel, the output from each part has to feed into the next steps. On the final day of the course, the groups present their projects and results to the class. It’s a good skill-building exercise. A lot of times, the trainees can bring the projects home and use them on their data; they can send their data through pipelines built in the course. It’s amazing, actually, what they can do after just two weeks.

You’ve experienced the course via all roles: student, TA, and instructor. What is your highlight from each role, and what brings you back year after year? 

Taking the course was amazing and life-changing for my career, in terms of learning how to program and using it in my work. I loved the course so much that, while I was a student, I asked Lincoln Stein, lead instructor in 2002, if I could return as a TA. He said, “Yes, email me in August and I’ll give you the details.” So I did, and I came back as a TA in 2003. It’s been amazing to help the trainees learn since then.

Most of the students come to the course with little programming background, without knowing that a terminal window exists on their computer, so we teach them from scratch. The course is challenging for trainees but it’s just as challenging for the instructors and TAs, who have to figure out what the trainees understand or don’t understand each year, and where they need help. But when they get it, they get it; you can see the light bulb go on and hear a sigh of relief.

The course is challenging even from day 1. By the third day, the trainees feel like their heads will explode and they can’t take in any more new information. When that happens, we always make a point of reminding the students to take a step back and compare what they knew on the first day to what they know now. That usually helps.  

What was the reasoning behind changing the programming language taught in the course? 

It’s a big change. The course started in 1999 and taught Perl exclusively until 2017. But more and more people are using Python for bioinformatics analyses now, so we thought we’d try out teaching it in the course. So far, it’s going well.

Besides switching from Perl to Python, have you noticed any other changes in the course over the past fifteen years?

The biggest change I’ve seen is a shift away from people building tools. When the course started, there weren’t many tools available and so a lot of people were interested in building tools for the community. Now, there are so many tools out there that this isn’t as necessary, and I’ve seen this shift reflected in the course.

And of course our topics change. We evaluate which topics are most needed and, when reviewing applications, we get a feel for what methodologies the trainees are using. Oftentimes there’s a consensus on what the students are interested in, so we try to invite a speaker to cover that topic.

What do you and Simon look for when reviewing applications? 

We like it when an applicant understands their problem and presents a solution, but knows they can’t obtain the solution unless they learn something more. It’s not enough for an applicant to only say, “Bioinformatics is important and I want to learn it.” 

We also look for enthusiasm. It’s nice to have students who are enthusiastic about what they’re working on and learning something new. Beyond that, we try to do some group building. We like when our students hail from diverse projects because it broadens the awareness of everyone else in the class. The students come in with limited bioinformatics experience, and they don’t know what tools are available or what problems there are except for their own. It’s beneficial for them to see all the different issues people have using different systems and methodologies. 

And there have been all kinds of people who take this course while working on projects that require very different perspectives and approaches. We had a trainee once who was studying biology and bioinformatics with a focus on human language. She was working with octopus, specifically octopus tentacles, because the muscles in tentacles are similar to muscles in the human tongue.

Despite these differences, however, the trainees become a little family. One of the reasons the course is successful is that the trainees feel comfortable with each other, the instructors, and the TAs. We always try to incorporate activities to ensure the trainees aren’t just sitting next to each other and typing on computers. In addition to the group projects, we dine together, have group runs or walks on campus, attend the CSHL Halloween party together, and go out one evening in Huntington (the local town).

These are chances for the trainees to meet, interact, and become a little community. And as a community, they are more comfortable asking questions in class. This course is a little stressful. Strangers are more stressful than friends, so they become friends pretty fast.

It’s really sad when the course ends and everybody leaves. Actually, it’s more emotional when you get home and don’t have twenty people to eat or take a coffee break with. A lot of the trainees find that a difficult transition because we do become like family here. 

The Programming for Biology course returns to the Laboratory this October and is accepting applications here until this Sunday, July 15th. To learn about the course from the perspective of a former student, read this Q&A with 2016 and 2017 course alumna Shasta Webb.

For more conversations with other course instructors, check out the rest of our A Word From series. 

Photo: Constance Brukin