Since 1999, Cold Spring Harbor Laboratory has run an annual course in Programming for Biology aimed at lab biologists with no prior coding experience. The course provides trainees with the bioinformatics and scripting skills necessary to design biological data analysis pipelines using computer code. In 2017, lead instructors Simon Prochnik and Sofia Robb changed the language being taught from Perl to Python, a popular scripting language with a growing community of users.
Python’s increasing popularity is one of the reasons Shasta Webb, a PhD student at the University of Calgary, returned to CSHL last October and took part in her second consecutive Programming for Biology course. Shasta is now an alum of both the 2016 and 2017 courses, where she learned to program in Perl and Python, respectively. We sat down with her to chat about how she uses scripting languages in her work, why she first attended the Programming for Biology course, why she returned, and her thoughts on the changes Simon and Sofia made to the course.
I am housed in an anthropology department but the work I do spans anthropology, biology, and metagenomics. My work focuses on how gut bacterial communities (the microbiome) change throughout pregnancy and lactation. I study this question using a wild system of white-faced capuchin monkeys in Costa Rica. One of my goals is to unite field methods that look at behavior, ecology, and diet in this primate species, and then combine those methods with metagenomics to see how, internally, the primates are responding to social cues.
A big part of my work involves raw sequencing data from the microbiome. We extract and sequence bacterial DNA, which results in huge data files that we need to parse, organize, and clean up for downstream analysis. As I was launching into this work, it became clear that I needed some way to do all of that by myself. That was the initial impetus for taking a programming course. Plus, personally, I prefer to work closely with my own data as opposed to outsourcing its analysis. I want to understand what's going on behind the scenes with my sequencing data.
In the 2016 course, you learned how to program in Perl. What was your motivation in coming back for the 2017 course?
Part of the reason I came back was that we started some new collaborations, and a lot of the people we now work with code in Python. As a result, it became increasingly clear that Python would be a really useful language to learn in order to maintain and grow the lab’s collaborations. I don’t have the authority to say whether Perl or Python is better, whether the core fundamentals of one language are better or worse than the other. It’s just out of convenience and the way my research trajectory has gone that I decided to pursue Python as well.
But there are other things that drew me back. The way that the course is designed is a really nice fit for my learning style. In the first week, it’s a mixture of lectures and putting into practice what we learned during the lectures. It’s so well-balanced: you never feel like the lecture chunks or the programming-practice chunks are too long. In the second week, we work on direct applications. I’ve attended smaller workshops for other languages and, oftentimes, you’re given coding practice but there’s a missing component in how to link it to your own research. In this course, we get to not only hear about other projects that leaders in bioinformatics have done, but we also get to do our own projects where we put into practice all the basic stuff we learned in the first week. Overall, the organization is just really good.
Also, the community facilitated by Simon and Sofia is really positive and fun. They are great about reminding us that no one is falling behind, that everyone is improving and doing well. I have never felt intimidated by Simon, Sofia, or the TAs, and I get the sense that they want us to have fun learning and coding. I had a great time at the 2016 course so when I saw they switched to Python, I jumped at the chance to come back.
Although prior coding experience isn’t a prerequisite for the course, did your participation in the 2016 course help you in the 2017 course?
Initially, it was a little more hectic because it was all so new. It’s pretty humbling to come in and learn a totally new skill, to look at a blank screen and have to write a script. I remember that from 2016. At the beginning of every problem set, before I got warmed up, I felt really intimidated, stressed out, and concerned that I wasn't becoming a programmer fast enough. But once you have a foundation and some of the core components of programming, learning another language feels easier. In 2017, I sat down and was able to quickly produce a script for one of the first Python problem sets. It was this nice moment of confidence that set the tone for the rest of the course. I suddenly felt like I would be able to handle even challenging scripts, because if you can get something down in the text editor, then you have something to build off of.
There are still challenges when learning another scripting language. The problems are new, and the syntax differs between languages. Once we started getting into the more advanced stuff – like complicated data structures – it became difficult and challenging, but in a good way. To take the time to attend this course for two weeks, I don’t want it to be easy. Feeling challenged is good because you feel so accomplished when you actually get through a difficult script.
Besides switching to Python from Perl, was there anything else introduced in 2017 that you found helpful?
What I've found really useful is the course’s inclusion of GitHub, which is a place for anyone to submit and openly share code, text files, or anything to do with programming. It was really nice to have a formal introduction to GitHub because I’d been meaning to learn it for a long time. In 2017, the entire course was run through it – our access to all of the course documents, problem sets, and example code was through GitHub. It’s something I will use daily at my home institution because I want to be able to share my code easily with other people.
The Programming for Biology course returns to the Laboratory this October. Applications for this year's course are being accepted here until July 15, 2019.
Thank you to Shasta for sharing her experience with us, and we look forward to having her back at the Laboratory again. To meet other featured scientists - and discover the wide range of science that takes place at a CSHL meeting or course - go here.