14th Discussion-7 November 2013

Brief Description and Continuing Discussion:

Topic 1: How to select and evaluate new software packages (Led by TBC)

Introduction and Preliminary Information

As core facilities, one of our day-to-day tasks is to keep abreast of developments within the bioinformatics community and to always be on the lookout for new pieces of software which might be of use for the analyses we perform. Additionally, many of us are (or perhaps should be) involved in the purchase of commercial software, whether stand-alone analysis packages, data management systems or software which comes bundled with large hardware purchases.

In this session we'd like to look at how different groups handle the evaluation and introduction of new software packages. Topics we will try to address will include:

  • How do you discover new software packages to investigate?
  • If you're searching for a specific type of software, how do you quickly get a list of candidates or judge potential suitability? Whose opinions do you trust?
  • What criteria do you include when evaluating new software packages, and how do you weight them? Are there any red lines which would immediately cause you to reject a package?
  • Do you, as a matter of course, consider commercial as well as open source packages? Do you judge these differently? What would sway you to opt for a commercial package over an open source alternative?
  • When evaluating a package for a new type of data do you run example data sets? If so where do you get these and how do you judge the results?
  • Are you consulted on the software side of any large hardware purchases (microscopes, mass specs, sequencers etc.)? Has the software side of a hardware package influenced a decision on which hardware to purchase? Have you encountered problems with the software packages for large hardware purchases which could have been spotted if the software had been evaluated during the purchase period?

Notes from the Call

David introduced the topic and started by saying that their software evaluations are usually driven by requests from scientists. They commonly find that the reality of using many of the requested packages doesn't live up to the promises made in the papers which describe them, and that the software doesn't work in the way they expected.

Steven Turner commented on his search for packages to predict gene fusions. He has been disappointed with the quality of many of the packages he's tried. He pointed out that there is a difference between software suitable for a particular research project and software suitable to put into production use. Core software has to work for people other than the original authors, and scientists need to have their expectations in this area managed so that they appreciate this. It's important to have stable software which has been vetted by the community. He has found packages in the past which worked and seemed useful but were ultimately poorly supported and maintained by their authors, so he now focuses on better-maintained packages.

Simon said that they commonly see scientists turning up wanting to exactly replicate the methods in a specific paper, which would entail bringing in new software packages. Jim said that their approach was to put in 8-16 hours of effort to try to make a new package work and evaluate the results, but that scientists would be charged for this. Simon said that he would also install software people requested so that they could try it for themselves, since this is often quick, but that a proper evaluation is slower and more costly.

Fran raised the problem of commercial packages and how to handle licensing within core facilities. Site licences make sense in terms of the overall finances of the institution, but can be problematic if they're assigned to the budget of a core group, since it's difficult to reclaim the cost from the groups who actually use them.

Hemant said that his group do support a lot of commercial packages. This works better the more PIs you have, since personal licenses get increasingly unaffordable with larger groups and there is an obvious advantage to centralised licenses. His group put out surveys to gauge the interest in a new package before committing to a site license, but they leave the decision about the scientific utility of the package up to the individual PIs.

We next moved on to discuss how people evaluate the software they use. It was noted that there are two types of evaluation, an initial practical evaluation which covers whether the software is available, can be installed and made to run and whether it completes in a reasonable amount of time. The second stage is a scientific evaluation where you have to decide whether the results are scientifically useful. This second stage is much harder.
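
As a very rough illustration of the first, practical stage, a small wrapper like the Python sketch below could launch a candidate tool on an example dataset, check that it can be started at all, and record whether it finishes within a time limit. The command line and file names (sometool, example.fastq) are purely hypothetical placeholders for whatever package is actually being tested, not anything mentioned on the call.

    import subprocess
    import time

    # Minimal sketch of a first-stage "practical" check for a candidate package.
    # The command passed in is a hypothetical placeholder, not a real tool.
    def practical_check(cmd, timeout_hours=4):
        start = time.time()
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout_hours * 3600)
        except FileNotFoundError:
            # The executable isn't installed or isn't on the PATH
            return {"installable": False, "ran": False}
        except subprocess.TimeoutExpired:
            return {"installable": True, "ran": False, "note": "exceeded time limit"}
        return {"installable": True,
                "ran": proc.returncode == 0,
                "runtime_minutes": round((time.time() - start) / 60, 1),
                "stderr_tail": proc.stderr[-500:]}

    if __name__ == "__main__":
        print(practical_check(["sometool", "--input", "example.fastq", "--out", "results/"]))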

Matt talked about an evaluation study his group did when moving from arrays to RNA-Seq for expression analysis. They did an extensive evaluation comparing initial RNA-Seq data with previous array studies for the same samples to look at reproducibility. They also went further to look at the effects of read length, read depth and single vs paired end sequencing to work out the optimal configuration for their assays. This was a lot of work and isn't feasible where you don't have something to compare to, but this kind of data is really valuable for the community. Matt's group have not published this data, although they have talked about it at meetings (see our previous discussion about the difficulties of publishing from within a core group).
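
A minimal sketch of the kind of cross-platform comparison Matt described might look like the following: correlate array and RNA-Seq expression values, gene by gene, for the samples run on both platforms. The file names, table layout (genes as rows, samples as columns) and choice of Spearman correlation are illustrative assumptions, not a description of Matt's actual pipeline.

    import pandas as pd
    from scipy.stats import spearmanr

    # Hypothetical layout: both tables have genes as rows and samples as columns.
    arrays = pd.read_csv("array_expression.tsv", sep="\t", index_col=0)
    rnaseq = pd.read_csv("rnaseq_expression.tsv", sep="\t", index_col=0)

    # Restrict the comparison to genes and samples measured on both platforms
    shared_genes = arrays.index.intersection(rnaseq.index)
    shared_samples = arrays.columns.intersection(rnaseq.columns)

    for sample in shared_samples:
        rho, _ = spearmanr(arrays.loc[shared_genes, sample],
                           rnaseq.loc[shared_genes, sample])
        print(f"{sample}\tSpearman rho = {rho:.2f}")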

Fran and George mentioned that they have run bake-offs of new software packages on a range of different, well-understood datasets to see how the packages perform.

It was pointed out that in addition to the problem of evaluating new software there is also a problem when looking at upgrading existing software. Specific problems were mentioned with regard to TopHat, DESeq and Cufflinks, where upgrades have in the past had dramatic effects on the results produced for some datasets. It is often difficult to decide whether the changes are for the better or worse, since we rarely know the correct answer for our datasets. Simon said that his group's general view was that upgrades generally fixed more problems than they caused, so they didn't routinely test updated versions before upgrading, but they did keep multiple versions of software on their cluster so that users didn't have to change versions in the middle of a project.
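
For this kind of upgrade testing, one simple before/after check is to run the old and new versions on the same dataset and count how many genes change their significance call. The sketch below assumes each version writes a results table with "gene" and "padj" columns; these file and column names are hypothetical and would need adapting to the tool's real output.

    import pandas as pd

    # Results from the same dataset analysed with two versions of a tool.
    # File names and the "gene"/"padj" columns are hypothetical placeholders.
    old = pd.read_csv("results_old_version.csv").set_index("gene")
    new = pd.read_csv("results_new_version.csv").set_index("gene")

    shared = old.index.intersection(new.index)
    old_sig = old.loc[shared, "padj"] < 0.05
    new_sig = new.loc[shared, "padj"] < 0.05

    print("significant in both versions:", int((old_sig & new_sig).sum()))
    print("only significant in the old version:", int((old_sig & ~new_sig).sum()))
    print("only significant in the new version:", int((~old_sig & new_sig).sum()))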

There was a suggestion that this core facilities group might be a suitable community to try to hold common evaluations of different packages. Whilst this is an attractive idea, the practicalities of hosting and organising this sort of evaluation across a wide range of packages and data types are daunting. It was pointed out that to some extent this sort of community moderation is already done by sites such as SeqAnswers.

A related question which was raised was whether, when evaluating software, we really care that the scientific merits of the software are the absolute best of breed, or whether slightly worse scientific performance would be offset by ease or speed of use or the quality of the support available. Few people use the full extent of the capabilities of many packages, and predictions at the limits of sensitivity are often not followed up, so it often makes more sense to weight other practical factors more highly when selecting software.

Jim asked whether people talk to the developers of software packages when evaluating them. Matt said that his group often talk to package developers to clarify how the packages are supposed to run and to make suggestions for improvements. They take into account the responsiveness of the developers when deciding which packages to use. Simon asked whether people who review software articles routinely comment on the usability of the software, rather than just its algorithmic side, since this should be an important consideration when looking at the utility of a new package.

At this point the session ran out of time so we moved on to the next topic.

Topic 2: Open Forum (Led by TBC)

In previous calls we've aimed to have two discussion topics, but we're going to experiment with changing this to a single defined topic followed by an open forum where anyone can raise a question in which they're interested. This is intended to be used either for short questions (gauging interest in a topic, asking for quick opinions, addressing a specific issue etc.), or to kick off a longer discussion which can continue on the mailing list or be the subject of a future call.

Questions for this session can be asked in advance by either posting them to the mailing list or by adding them into this wiki page. You can also ask questions directly on the call if you prefer.