
Becoming a programmer



Many of us have asked: "So I've decided I need to learn a programming language. Which language should I choose?" We often get the true but somewhat frustrating response of "any, just pick one." While there's no single right answer, here are three guidelines for choosing a language and getting started.

  1. Summary
    1. Three Guidelines
    2. Try R
  2. Good Programming Practice
    1. Version management
    2. Write like a programmer
    3. Think like a programmer
  3. References

Three Guidelines

  1. What is most commonly used in your lab, your department or your field? Even the most fantastic language will be frustrating if others can't run your programs or there's no one to ask for help.

  2. Start with a scripting language like R, Python, Perl or MATLAB. These languages don't require a separate compilation step: you can run commands one at a time and see the results immediately, so they are much easier to learn and faster to write, explore, and visualize with. Compiled languages like Fortran, C, C++, and Java require you to write all your commands in a source file and then run another piece of software, the compiler, which reads the entire file and translates it to machine language (or, for Java, to bytecode). The compiler embodies the intelligence of hordes of computer scientists accumulated over decades, all aimed at making your code fast. The resulting program can run much faster, but the language takes more work to learn and write.

  3. Prefer general and extensible languages. Open-source languages like R, Python, and Perl are freely available and often develop faster than proprietary software such as MATLAB because scientists can contribute their packages directly. Many languages have wrappers that let them call code written in a different language. This way you can use someone's fast C or Fortran function from inside your own language. R is particularly good at this.
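As a rough illustration of the wrapper idea in the third guideline, here is a minimal sketch in Python (chosen only because it is one of the scripting languages listed above; R has its own mechanisms for the same thing). It uses Python's standard `ctypes` module to call the `sqrt` function from the system's C math library. The helper name `load_c_sqrt` is made up for this example, and the sketch assumes the C math library can be located on your system; where it can't, the helper simply returns `None`.

```python
import ctypes
import ctypes.util


def load_c_sqrt():
    """Return the C library's sqrt as a Python callable, or None if unavailable."""
    # Ask the operating system where the C math library ("libm") lives.
    path = ctypes.util.find_library("m")
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    c_sqrt = libm.sqrt
    # Tell ctypes the C signature: double sqrt(double)
    c_sqrt.argtypes = [ctypes.c_double]
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


c_sqrt = load_c_sqrt()
if c_sqrt is not None:
    # The computation happens in compiled C code; Python only passes the number in.
    print(c_sqrt(9.0))
else:
    print("C math library not found on this system")
```

In practice you rarely write this plumbing yourself: package authors do it once, and you call their function as if it were written in your own language.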

Try R

All that said, R makes a pretty good choice. It has a steeper learning curve and less polish than other choices, but you get access to more power and cutting-edge features like easy parallel computing, working with GIS, or using regular expressions. If you've heard about some cool direction computing is going (say, computing on graphics processors), chances are that someone is making it work in R.

Good Programming Practice

There are a few skills that will help you learn and use any programming language but are rarely taught in biology software tutorials. These tools and skills will save you time and tears in the future.

Version management

Ever try to recreate a graph or statistical analysis you performed last week but can't remember exactly what you did? Ever have a folder with tons of files in it labeled myfile_version1.txt, myfile_version2.txt, and so on? Ever modify a file and then wish you could revert to the version you had a couple of hours ago? How about working on the same file with multiple collaborators all making changes? What you need is version management. It just might change your life.

Subversion is a great place to start.
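To make the idea concrete, here is a toy sketch in Python of what a version management tool does at its core: it records a snapshot of your file each time you commit, with a message saying what changed, and lets you get any earlier snapshot back. The class `TinyHistory` and its methods are invented for this illustration and are not part of Subversion or any other real tool; a real system also handles many files, collaborators, and merging.

```python
import hashlib


class TinyHistory:
    """Toy in-memory version history for one text file (illustration only)."""

    def __init__(self):
        # Each entry is (version_id, message, content), oldest first.
        self._snapshots = []

    def commit(self, content, message):
        """Record the content with a log message and return a short version id."""
        version_id = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
        self._snapshots.append((version_id, message, content))
        return version_id

    def log(self):
        """List (version_id, message) pairs, oldest first."""
        return [(vid, msg) for vid, msg, _ in self._snapshots]

    def checkout(self, version_id):
        """Return the content that was saved under the given version id."""
        for vid, _, content in self._snapshots:
            if vid == version_id:
                return content
        raise KeyError(version_id)


history = TinyHistory()
first = history.commit("plot(x, y)\n", "first draft of figure 1")
second = history.commit("plot(x, log(y))\n", "log-scale the y axis")

# Changed your mind about the log scale? Get the earlier version back.
restored = history.checkout(first)
for version_id, message in history.log():
    print(version_id, message)
```

The point is that you never need myfile_version2.txt again: there is one file, plus a history you can search and rewind.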

Write like a programmer

More information on style.

Think like a programmer


Phylogenetics on Linux

R Textbook (free): While many R texts aim to teach biologists how to use certain packages, few teach you the basic logic of R. Understand that, and it becomes much easier to learn and use any package. - Carl


