Perl or Python for bioinformatics

Uncertain · August 11, 2011, 1:09pm

Yes, language wars can get ugly and pointless. That said, I’ll give you my opinion.

I find Python to be the vastly superior language, unless you’re just looking for a better awk/sed/shell. The reasons for such preferences are often difficult to pinpoint or articulate. Perl has unnatural syntax for things like functions with arguments. Lots of dubious things happen automatically and silently in Perl, such as unset values or strings like “foo” behaving as 0 when you try to use them as integers. Classes in Perl are a hideous afterthought. Interactive use, a big plus of such languages, has been much nicer in Python in my experience, but I don’t see why that couldn’t be easily improved, and I suspect that somebody has done so.
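
As a rough illustration of the silent-coercion complaint (a sketch added for clarity, not code from the thread): Python's built-in `int()` raises `ValueError` when handed a string like "foo" instead of quietly treating it as 0. The helper name `to_int_strict` is made up for this example.

```python
def to_int_strict(text):
    """Convert text to an int, or return None if it is not a valid integer.

    Hypothetical helper for illustration: it makes the failure explicit
    rather than letting a bad string silently behave as 0.
    """
    try:
        return int(text)
    except ValueError:
        # int("foo") raises ValueError; nothing is coerced to 0 behind your back.
        return None


if __name__ == "__main__":
    print(to_int_strict("42"))   # 42
    print(to_int_strict("foo"))  # None
```

Whether returning `None` or letting the exception propagate is better depends on the script; the point is only that the bad value has to be handled somewhere.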

Unfortunately (in my opinion), Perl has for historical reasons been common for biologists who want to do some “bioinformatics”. Python seems to have largely caught up.

Uncertain · August 11, 2011, 1:18pm

That seems like, at best, an overstatement. People chose to use it for some essential tasks, which might or might not have been a good choice. The same holds for C, Intel microprocessors, Linux, capillary electrophoresis, and pipette tips. Were all of these “largely responsible for the success of the Human Genome Project”? If we have to pick one (since only one can be “largely responsible”), I would have to go with either capillary electrophoresis or pipette tips as the least dispensable.

Francis_Vaughan · August 11, 2011, 1:38pm

Probably the main enabling computer technology for the Human Genome Project, and for much other bioinformatics, was the BLAST algorithm and fast implementations thereof. Certainly back in the ’90s computers were a whole lot slower than now. OTOH, data marshalling and post-processing needs meant that a language like Perl was a natural.

BillJJ · August 19, 2011, 7:35am

Thank you for all the replies. This is exactly what I was looking for and I appreciate all of the comments. Instead of quoting each reply and responding, I’ll try to address all of them at once.

The summary of the different levels of coding and size of the task is helpful. I assume the reason Perl has become popular with many bioinformaticists is that many, maybe the majority, of the tasks are quite small and can be handled at the scripting level, which Perl is apparently well suited to. The ability to tailor a script quickly to match the specifics of what you’re working with is a big plus. The well-established foundation of BioPerl must also be a large reason that many continue to use Perl. Many biological researchers looking to bring computing tools into their research would understandably look for the quickest, most well-established route, since they themselves are not programming experts.
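
For a sense of what one of these "quite small" scripting tasks might look like, here is a minimal sketch that reads FASTA-formatted lines and reports GC content per record. It is in Python only to keep the examples in one language; the function names `read_fasta` and `gc_fraction` and the sample records are invented for illustration, and it does not use BioPerl or Biopython.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Made-up sample data; a real script would read lines from a file instead.
    sample = [">seq1 demo", "ACGT", "GGCC", "", ">seq2", "ATAT"]
    for name, seq in read_fasta(sample):
        print(name, round(gc_fraction(seq), 2))
```

A script like this is easy to adjust on the spot (different statistic, different filter), which is the kind of quick tailoring described above.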

There are, of course, also much larger computing tasks important for bioinformatics, and the group of people working on these is much smaller than the whole of the “bioinformatics” community. This is probably a result of the still-large gap between the disciplines of biology and computing. BLAST, as mentioned above, is a good example. It’s an amazing tool and extremely heavily relied upon in the world of biological research. However, its tradeoff of accuracy for speed becomes a concern if people grow complacent and settle for the current technology for too long. Almost every other sequence alignment program available for smaller data sets produces much more accurate alignments, some clearly better than others. BLAST is great (and currently the only option that I know of) for searching enormous databases, and it gives researchers a good starting point. However, a tendency to rely too heavily on BLAST results can be problematic. This is a bit off-topic from my original question though. I don’t even dream of developing a system that outperforms BLAST. That would be a task for a team of hardcore programmers who also appreciate the importance of biological research and have a real passion for both fields. Hopefully that demographic will grow in the near future. And BLAST is only one example, as bioinformatics is still in its relative infancy.
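
To make the accuracy-versus-speed tradeoff a bit more concrete, here is a minimal sketch of exhaustive local alignment scoring by dynamic programming, in the style of Smith-Waterman with a linear gap penalty. This is not how BLAST works; it is the slower, exact kind of computation that a heuristic search trades away. The function name and the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Fills the full dynamic-programming table row by row, so the running
    time grows with len(a) * len(b). Scoring parameters are illustrative.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("ACGT", "ACGT"))  # 8
    print(local_alignment_score("AAAA", "TTTT"))  # 0
```

Because every cell of the table is computed, the cost scales with the product of the two sequence lengths, which is fine for small data sets but impractical for searching an enormous database directly.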

Really, I think that which programming language has contributed the most to biology thus far is not a great indicator of the best one to use. It’s clearly good to develop a working knowledge of all of the possibilities, though, so that I can eventually work with all of you on my upcoming projects. And I think this is the best route for the typical bioinformatics person looking to contribute as much as possible within the parameters set by those with a much stronger passion for the programming aspect.

So here’s what I’ve decided to learn first (subject to change of course): Perl and Python (at the same time), since I’ll need to work with those with a passion for each, followed by Ruby, R, and either C# or Java (at this point I can’t even begin to make a distinction between these two).

Thanks for all of the input.
