Programming for Biology Course

Visitor of the Week: Kelsey Scott

Meet Kelsey Scott of The Ohio State University! A member of Jason Slot’s lab within the Department of Plant Pathology, the PhD student is about halfway through her training at the annual course on Programming for Biology. This is her first course at CSHL and she “had no idea what to expect, besides prior attendees calling it ‘coding boot camp’.” The course, as usual, delivers and “has turned out to be a lovely opportunity to work closely and learn alongside a great group of people.” So much so that if Kelsey has the chance to return for another course, she “will definitely take it.”

What are your research interests? What are you working on?
I’m interested in identifying the genomic attributes of fungi associated with particular lifestyles, focusing on secondary metabolite-related gene clusters. For example, what’s genetically different about soil-dwelling fungi compared to fungi that eat other fungi, or non-pathogenic fungi that live inside a plant?

How did you decide to make this the focus of your research?
This project is interesting to me because of the technology I use to generate my data: both the continually developing sequencing technology and the software used to process the data. I love how there are constant technological improvements and we can achieve goals previously thought to be impossible.

How did your scientific journey begin?
Ever since I was a kid I knew I wanted to become a scientist, but I always assumed that I would only ever work with plants. Dr. Cynthia Briggs, one of my favorite professors in undergrad, encouraged me to check out the field of plant pathology so I could work with both plants and my other love, fungi. Fast forward a few years and now I’m in a mycology lab in a plant pathology department, a perfect fit! 

Was there something specific about the Programming for Biology course that drew you to apply?
My research depends on using a long and involved pipeline to obtain genome annotations from raw genome sequence data; I wanted to learn how to automate this pipeline and other routine processes so I have more time to focus on my research questions. I was especially interested in the workshop seminars on genome sequencing wet lab methods and genome assembly software, so I could learn how to improve my own techniques.

What is your key takeaway from the Course?
My key takeaway from this course is that just about any data manipulation can be made faster and more efficient with a Python script.

What and/or how will you apply what you’ve picked up from the Course to your work?
I’ve tried to learn Python coding using online modules and classes, but nothing ever stuck until I attended this course. I’ve been at CSHL just over a week and I already feel so much more confident in my skills. I’m excited to write scripts that can process my large datasets automatically.

If someone curious about attending this course asked you for feedback or advice on it, what would you tell them?
I highly recommend this workshop for anyone who has little to no experience in Python coding and wants to apply these skills to biology-related research, especially if they need to handle large amounts of data. This workshop is a fast-paced and immersive experience, and the instructors and TAs work tirelessly to provide their students with a great learning opportunity.

What do you like most about your time at CSHL?
I love how close the beach is! Whenever I have the chance, I walk to the beach to take pictures and search for horseshoe crabs.

Kelsey received financial support from the National Human Genome Research Institute (NHGRI) to cover a portion of her course tuition. On behalf of Kelsey, thank you to NHGRI for supporting and enabling our young scientists to participate in training courses where they expand their skills, knowledge, and network.

Thank you to Kelsey for being this week's featured visitor. To meet other featured scientists - and discover the wide range of science that takes place at a CSHL meeting or course - go here.

Visitor of the Week: Jannine Forst

Meet Jannine Forst of the University of California, Santa Cruz. The postdoctoral scholar is a member of the Human Paleogenomics Lab led by Prof. Lars Fehren-Schmitz. She is on campus strengthening her programming skill set via Programming for Biology, her first course at CSHL.

What are your research interests? What are you working on?
My research is in ancient DNA - using genetics to study archaeological material, from humans to animals, diseases to microbiomes. My current project is concerned with deciphering the human population history of Pre-Columbian South America, specifically at Machu Picchu, but my personal research interests lie in population genetics, human health, and disease.

How did you decide to make this the focus of your research?
I’ve always been interested in history, archaeology, and genetics - which made ancient DNA research a perfect match for me.

How did your scientific journey begin?
My mom, a geneticist, definitely inspired me. She encouraged me to explore all the topics I found interesting and supported me in my endeavors, no matter the direction they took me. From watching Bill Nye the Science Guy as a kid to summer camps at the museum, she helped cultivate my curiosity which eventually led me to ancient DNA research.

Was there something specific about the Programming for Biology course that drew you to apply?
I came to the Programming for Biology course looking to jump-start my programming skills. With the widespread use of high-throughput sequencing, researchers cannot manually look through the hundreds of thousands of sequences per sample generated. So programming in a common language like Python - which is what this course teaches - is becoming indispensable for many biologists, ancient DNA researchers included.

What and/or how will you apply what you've learned from the course to your work?
For my work, I will start by coding solutions to the many small ideas that pop up when analyzing ancient genetic data, which is very different from working with modern genetic data. I will also make all the e-books, problem sets, and resources given to me at Programming for Biology available to my lab mates.

What is your key takeaway from the course?
Learning how to program mainly takes practice! Practice, practice, and more practice.

If someone curious about attending this course asked you for feedback or advice on it, what would you tell them?
Be prepared for a lot of work but a lot of fun! You will learn a lot of things in a very short period of time. The instructors did a great job on the course material, structure, and timing.

What do you like most about your time at CSHL?
What I liked most about my time at CSHL is the community. Attendees of the various meetings and conferences eat together, so you get to interact with interesting people whom you might not normally have a chance to meet.

Thank you to Jannine for being this week's featured visitor. To meet other featured scientists - and discover the wide range of science that takes place at a CSHL meeting or course - go here.

A Word From: Sofia Robb

Since 1999, Cold Spring Harbor Laboratory has offered an annual course in Programming for Biology, which runs for two weeks every October. The course underwent a major modification in 2017: Co-instructors Simon Prochnik and Sofia Robb changed the language being taught from Perl to Python. The switch came after 18 years because “more and more people are using Python for bioinformatics analysis,” as Sofia explains below. 

Both Simon and Sofia have been with the course since 2002. Simon first participated as a Teaching Assistant (TA) before taking on the role of lead instructor in 2004. Sofia, a 2002 course alumna, experienced the course from every role prior to joining Simon as a co-instructor in 2011. We met with Sofia last year to chat about the long-running course.

Can you talk us through what a typical day looks like for a course trainee?

We have three phases in the course, and Phase 1 is general programming. For the first week, we have a morning lecture each day that teaches programming followed by exercises on the computer. Then in the afternoon, we have another lecture and more time on the computer. We usually have a review session after dinner and then time again on the computer. These Phase 1 sessions cover the basics like how to do a for-loop and an if-statement, and we give them real-world problems for practice so they can understand why they’re doing what they’re doing. And that’s the way it goes for the first week. 

The second phase is similar but we bring in outside lecturers who talk about “bigger” topics. Like the first phase, these lectures are followed by programming exercises, but the exercises have the flair and flavor of the lecturer so the trainees can see how concepts can be applied to a different slice of bioinformatics. For example, we’ll look at sequence similarity, file formats and how to convert between them, or how to work with certain files from NGS (next-generation sequencing). The trainees are using a lot of the same concepts they learned from the first week, but practicing more to help cement what they’re learning. This course is like learning a foreign language: it’s awkward at first and just so new, but practice makes it a little more familiar. We’re not looking for perfection but general understanding. 

Phase 3 takes place during the last three or four days of the course and involves group projects. Simon and I have the trainees present ideas for individual projects and, oftentimes, the ideas are very similar so we rewrite them into five or six group projects. Each TA signs up for the project they feel they can help the most with, and the trainees break into groups with a TA and work on the project task from start to finish. The projects are too big for one person to complete in three days, so they have to work as a team and decide who’s doing what and how they’re doing their individual parts. They also have to report to each other because, even though they work in parallel, the output from each part has to feed into the next steps. On the final day of the course, the groups present their projects and results to the class. It’s a good skill-building exercise. A lot of times, the trainees can bring the projects home and use them on their data; they can send their data through pipelines built in the course. It’s amazing, actually, what they can do after just two weeks.

You’ve experienced the course via all roles: student, TA, and instructor. What is your highlight from each role, and what brings you back year after year? 

Taking the course was amazing and life-changing for my career, in terms of learning how to program and using it in my work. I loved the course so much that, while I was a student, I asked Lincoln Stein, lead instructor in 2002, if I could return as a TA. He said, “Yes, email me in August and I'll give you the details.” So I did and I came back as a TA in 2003. It's been amazing to help the trainees learn since then.

Most of the students come to the course with little programming background, without knowing that a terminal window exists on their computer, so we teach them from scratch. The course is challenging for trainees but it’s just as challenging for the instructors and TAs, who have to figure out what the trainees understand or don’t understand each year, and where they need help. But when they get it, they get it; you can see the light bulb and a sigh of relief. 

The course is challenging even from day 1. By the third day, the trainees feel like their heads will explode and they can’t take in any more new information. When that happens, we always make a point of reminding the students to take a step back and compare what they knew on the first day to what they know now. That usually helps.  

What was the reasoning behind changing the programming language taught in the course? 

It’s a big change. The course started in 1999 and taught Perl exclusively until 2017. But more and more people are using Python for bioinformatics analyses now, so we thought we’d try out teaching it in the course. So far, it’s going well.

Besides switching from Perl to Python, have you noticed any other changes in the course over the past fifteen years?

The biggest change I’ve seen is a shift away from people building tools. When the course started, there weren’t many tools available and so a lot of people were interested in building tools for the community. Now, there are so many tools out there that this isn’t as necessary, and I’ve seen this shift reflected in the course.

And of course our topics change. We evaluate which topics are most needed and, when reviewing applications, we get a feel for what methodologies the trainees are using. Oftentimes there’s a consensus on what the students are interested in, so we try to invite a speaker to cover that topic.

What do you and Simon look for when reviewing applications? 

We like it when an applicant understands their problem and presents a solution, but knows they can’t obtain the solution unless they learn something more. It’s not enough for an applicant to only say, “Bioinformatics is important and I want to learn it.” 

We also look for enthusiasm. It’s nice to have students who are enthusiastic about what they’re working on and learning something new. Beyond that, we try to do some group building. We like when our students hail from diverse projects because it broadens the awareness of everyone else in the class. The students come in with limited bioinformatics experience, and they don’t know what tools are available or what problems there are except for their own. It’s beneficial for them to see all the different issues people have using different systems and methodologies. 

And there have been all kinds of people who take this course while working on projects that require very different perspectives and approaches. We had a trainee once who was studying biology and bioinformatics with a focus on human language. She was working with octopus (specifically octopus tentacles) because the muscles in tentacles are similar to muscles in the human tongue.

Despite these differences, however, the trainees become a little family. One of the reasons the course is successful is that the trainees feel comfortable with each other, the instructors, and the TAs. We always try to incorporate activities to ensure the trainees aren’t just sitting next to each other and typing on computers. In addition to the group projects, we dine together, have group runs or walks on campus, attend the CSHL Halloween party together, and go out one evening in Huntington (the local town).

These are chances for the trainees to meet, interact, and become a little community. And as a community, they are more comfortable asking questions in class. This course is a little stressful. Strangers are more stressful than friends, so they become friends pretty fast.

It’s really sad when the course ends and everybody leaves. Actually, it’s more emotional when you get home and don’t have twenty people to eat or take a coffee break with. A lot of the trainees find that a difficult transition because we do become like family here. 

The Programming for Biology course returns to the Laboratory this October and is accepting applications here until this Sunday, July 15th. To learn about the course from the perspective of a former student, read this Q&A with 2016 and 2017 course alumna Shasta Webb.

For more conversations with other course instructors, check out the rest of our A Word From series. 

Photo: Constance Brukin

Repeat Visitor: Shasta Webb

Since 1999, Cold Spring Harbor Laboratory has run an annual course in Programming for Biology aimed at lab biologists with no prior coding experience. The course provides trainees with the bioinformatics and scripting skills necessary to design biological data analysis pipelines using computer code. In 2017, lead instructors Simon Prochnik and Sofia Robb changed the language being taught from Perl to Python, a popular scripting language with a growing community of users. 

Python’s increasing popularity is one of the reasons Shasta Webb, a PhD student at the University of Calgary, returned to CSHL last October and took part in her second consecutive Programming for Biology course. Shasta is now an alum of both the 2016 and 2017 courses, where she learned to program in Perl and Python, respectively. We sat down with her to chat about how she uses scripting languages in her work, why she first attended the Programming for Biology course, why she returned, and her thoughts on the changes Simon and Sofia made to the course. 

I am housed in an anthropology department but the work I do spans anthropology, biology, and metagenomics. My work focuses on how gut bacterial communities (the microbiome) change throughout pregnancy and lactation. I study this question using a wild system of white-faced capuchin monkeys in Costa Rica. One of my goals is to unite field methods that look at behavior, ecology, and diet in this primate species, and then combine those methods with metagenomics to see how, internally, the primates are responding to social cues.

A big part of my work involves raw sequencing data from the microbiome. We extract and sequence bacterial DNA, which results in huge data files that we need to parse, organize, and clean up for downstream analysis. As I was launching into this work, it became clear that I needed some way to do all of that by myself. That was the initial impetus for taking a programming course. Plus, personally, I prefer to work closely with my own data as opposed to outsourcing its analysis. I want to understand what's going on behind the scenes with my sequencing data.

In the 2016 course, you learned how to program in Perl. What was your motivation in coming back for the 2017 course?

Part of the reason I came back was that we started some new collaborations, and a lot of the people we now work with code in Python. As a result, it became increasingly clear that Python would be a really useful language to learn in order to maintain and grow the lab’s collaborations. I don’t have the authority to say whether Perl or Python is better, whether the core fundamentals of one language are better or worse than the other. It’s just out of convenience and the way my research trajectory has gone that I decided to pursue Python as well.

L to R: Shasta Webb, Meredith Cenzer, Drew Behrens, Jared Brewer, Adam Blanchard

But there are other things that drew me back. The way that the course is designed is a really nice fit for my learning style. In the first week, it’s a mixture of lectures and putting into practice what we learned during the lectures. It’s so well-balanced: you never feel like the lecture chunks or the programming-practice chunks are too long. In the second week, we work on direct applications. I’ve attended smaller workshops for other languages and, oftentimes, you’re given coding practice but there’s a missing component in how to link it to your own research. In this course, we get to not only hear about other projects that leaders in bioinformatics have done, but we also get to do our own projects where we put into practice all the basic stuff we learned in the first week. Overall, the organization is just really good.

Also, the community facilitated by Simon and Sofia is really positive and fun. They are great about reminding us that no one is falling behind, that everyone is improving and doing well. I have never felt intimidated by Simon, Sofia, or the TAs, and I get the sense that they want us to have fun learning and coding. I had a great time at the 2016 course so when I saw they switched to Python, I jumped at the chance to come back.

Although prior coding experience isn’t a prerequisite for the course, did your participation in the 2016 course help you in the 2017 course? 

Initially, it was a little more hectic because it was all so new. It’s pretty humbling to come in and learn a totally new skill, to look at a blank screen and have to write a script. I remember that from 2016. At the beginning of every problem set, before I got warmed up, I felt really intimidated, stressed out, and concerned that I wasn't becoming a programmer fast enough. But once you have a foundation and some of the core components of programming, learning another language feels easier. In 2017, I sat down and was able to quickly produce a script for one of the first Python problem sets. It was this nice moment of confidence that set the tone for the rest of the course. I suddenly felt like I would be able to handle even challenging scripts, because if you can get something down in the text editor, then you have something to build off of.

There are still challenges when learning another scripting language. The problems are new, and the syntax differs between languages. Once we started getting into the more advanced stuff – like complicated data structures – it became difficult and challenging, but in a good way. To take the time to attend this course for two weeks, I don’t want it to be easy. Feeling challenged is good because you feel so accomplished when you actually get through a difficult script.

Besides switching to Python from Perl, was there anything else introduced in 2017 that you found helpful?

What I've found really useful is the course’s inclusion of GitHub, which is a place for anyone to submit and openly share code, text files, or anything to do with programming. It was really nice to have a formal introduction to GitHub because I’d been meaning to learn it for a long time. In 2017, the entire course was run through it – our access to all of the course documents, problem sets, and example code was through GitHub. It’s something I will use daily at my home institution because I want to be able to share my code easily with other people.

The Programming for Biology course returns to the Laboratory this October. Applications for this year's course are being accepted here until July 15, 2018.

Thank you to Shasta for sharing her experience with us; we look forward to having her back at the Laboratory again. To meet other featured scientists - and discover the wide range of science that takes place at a CSHL meeting or course - go here.

Visitor of the Week: Adam Blanchard

Meet Adam Blanchard of the University of Nottingham (United Kingdom). Adam is a postdoctoral research fellow in the School of Veterinary Medicine and Science - Cellular Microbiology with Sabine Totemeyer as his mentor. He is on campus for the Programming for Biology course. 

What are your research interests? What are you working on?
Most of my research interests are based around host/pathogen interactions, focusing on bacteria with an impact on animal welfare. I enjoy implementing novel methods, such as metatranscriptomics, to investigate the host response to complex bacterial communities and which genetic factors play a role in disease.

Was there something specific about the Programming for Biology course that drew you to apply?
Knowing a scripting language is becoming increasingly important in biology, so attending a course like this will enable me to gain a head start that will undoubtedly benefit my career. The delivery style of the Programming for Biology course material seemed very concise. I have taken similar online courses which I felt lacked the personal interaction needed when troubleshooting and gaining a deeper understanding of the topics.

What is your key takeaway from the Course?
The Programming for Biology course is very well organized. There is a good balance of "need to know" information and other bits, which has allowed for an effective learning experience, as you do not get bogged down with unnecessary detail.

How many CSHL courses have you attended? Do you have any near future plans to attend another course at CSHL?
This is my first one and, if I get the opportunity, I would love to participate in the Immersive Approaches to Biological Data Visualization course as it looks very interesting and is showcasing novel ideas. I really enjoy science outreach and an important aspect of this is data visualization. I feel being able to convey complex results in an easy-to-understand image is a powerful tool in helping people understand what you do and are trying to achieve.

If someone curious about attending a future iteration of the Programming for Biology course asked you for feedback or advice on it, what would you tell them?
The environment at Cold Spring Harbor Laboratory is very positive. The instructors and teaching assistants are there to help at every step, which is helpful because the course is tough. Also, since all of the course participants begin the course at a similar level of understanding, there is plenty of opportunity for teamwork.

What do you like most about your time at CSHL?
The course instructors, teaching assistants and members of staff have been fantastic and have provided a really enjoyable two weeks. They balanced the course schedule well with lectures, practical sessions, sports and field trips. They also make sure you do not forget to go for meals when you are engrossed in the problem sets.

Adam received a scholarship from the Helmsley Charitable Trust to cover a portion of his course tuition. On behalf of Adam, thank you to the Helmsley Charitable Trust for supporting and enabling our young scientists to attend a CSHL course where they expand their skills, knowledge, and network.

Thank you to Adam for being this week's featured visitor. To meet other featured scientists - and discover the wide range of science that takes place at a CSHL meeting or course - go here.