Many of us have asked: "So I've decided I need to learn a programming language. Which language should I choose?" The usual response, true but somewhat frustrating, is "any, just pick." While there's no single right answer, here are a couple of guidelines for choosing a language and getting started.
What is most commonly used in your lab, your department or your field? Even the most fantastic language will be frustrating if others can't run your programs or there's no one to ask for help.
Start with a scripting language like R, Python, Perl or MATLAB. These languages don't require a separate compile step, so they are easier to learn and faster to write, explore, and visualize with. Compiled languages like Fortran, C, C++, and Java require you to write all your commands in a source file and then run another piece of software, called the compiler, which reads the entire file and translates it into machine language (or, in Java's case, bytecode). The compiler embodies the intelligence of hordes of computer scientists accumulated over decades, all aimed at making your code fast. The resulting program can run much faster, but the language takes more work to learn and write.
Prefer general and extensible languages. Open-source languages like R, Python, and Perl are freely available and can develop faster than proprietary software such as MATLAB because scientists can contribute their packages directly. Many languages have wrappers that let them call code written in a different language. This way you can use someone's fast C or Fortran function inside your own language. R is particularly good at this.
All that said, R makes a pretty good choice. It has a steeper learning curve and less polish than other choices, but you get access to more power and cutting-edge features like easy parallel computing, working with GIS, or using regular expressions. If you've heard about some cool direction computing is going (say, computing on graphics processors), chances are that someone is making it work in R.
Good Programming Practice
There are a couple of skills that will help you learn and use any programming language, but they are rarely taught in biology software tutorials. These tools and skills will save you time and tears in the future.
Ever try to recreate a graph or statistical analysis you performed last week, only to find you can't remember exactly what you did? Ever have a folder with tons of files in it labeled myfile_version1.txt, myfile_version2.txt? Ever modify a file and then wish you could revert to the version you had a couple of hours ago? How about working on the same file with multiple collaborators all making changes? What you need is version management. It just might change your life.
Subversion is a great place to start.
Write like a programmer
Write scripts. Rather than just entering commands line by line into the command window, write all your commands down in a script. This gives you something to come back to if you want to see what you did or repeat an analysis.
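As a rough illustration of the point above, here is a minimal sketch of what such a script might look like in Python (the same idea applies in R or any other scripting language). The file name analysis.py, the variable names, and the measurements are all made up for the example.

```python
# analysis.py (hypothetical file name); run it with: python analysis.py
# Every step of the analysis is written down, so it can be repeated later.
import statistics

# Made-up measurements (say, wing lengths in mm); replace with your own data.
wing_lengths = [4.1, 3.9, 4.4, 4.0, 4.6]

# Summary statistics, computed the same way every time the script runs.
mean_length = statistics.mean(wing_lengths)
sd_length = statistics.stdev(wing_lengths)  # sample standard deviation

print(f"n = {len(wing_lengths)}")
print(f"mean = {mean_length:.2f} mm")
print(f"sd = {sd_length:.2f} mm")
```

Because the commands live in a file rather than in the command window history, rerunning the analysis next week is a matter of running the file again.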
Write functions. A function executes a series of commands to accomplish a task. Learning to write functions is essential to good programming, as it breaks your code into smaller, more manageable and reusable chunks. If your command is going to take more than a couple of lines, consider making it a function. If your function is going to take more than a screen, consider making it two functions.
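One way to picture this, sketched in Python under the assumption that you want the same summary for several groups of measurements: a small function does one job, and a second function builds on it. The function names standard_error and summarize, and the data, are invented for the illustration.

```python
import math
import statistics


def standard_error(values):
    """Return the standard error of the mean for a sequence of numbers."""
    return statistics.stdev(values) / math.sqrt(len(values))


def summarize(values):
    """Return the sample size, mean, and standard error as a dictionary."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "se": standard_error(values),
    }


# Made-up measurements for two treatment groups.
control = [4.1, 3.9, 4.4, 4.0, 4.6]
treated = [5.0, 5.3, 4.8, 5.1, 5.3]

# The same function is reused for each group instead of copying commands.
for name, group in [("control", control), ("treated", treated)]:
    print(name, summarize(group))
```

If the calculation ever needs to change, it changes in one place, and every group picks up the fix.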
Follow style guidelines. There's lots of freedom in how you write and organize code, just as there is in organizing a paper. Following standard style guidelines will make it easier for others to understand your code. Google uses R and provides an excellent set of style guidelines.
Think like a programmer
Object Oriented Design
R Textbook (free) While many R texts aim to teach biologists how to use certain packages, few teach you the basic logic of R. Understand that and it becomes much easier to learn and use any package. —Carl