Thursday 12 December 2013

Seven tips for bio-statistical analysis of gene expression data

Author: Jo Vandesompele, Biogazelle

Many scientists have a love-hate relationship with statistics. Personally, I didn't like statistics (at all) during my master's degree [1]. It seemed too theoretical, and I didn't see its utility. Only when I generated my first data during my PhD research did I start to realize the necessity and power of bio-statistics. Later, I almost fell in love with statistics after reading Intuitive Biostatistics by Harvey Motulsky. This excellent book was written by an author who graduated from medical school, which probably explains why it contains only the most pertinent formulas. I particularly appreciate the book because it really is intuitive. It almost reads like a novel, and you could read it in bed, next to the fireplace with a glass of your favorite wine, or even on holiday. If you have always felt the need to sharpen your basic bio-statistics skills, this book may be just the thing for you.
In September of this year, Nature Methods launched a new column, 'Points of Significance', devoted to statistics. The corresponding post on Nature Methods' Methagora blog holds a continuously updated list of the Points of Significance articles, which I find very valuable.
Obviously, this blog post does not aim to serve as a crash course in statistics. Instead, I would like to focus on seven simple but fundamental principles for doing bio-statistics yourself, especially when you're handling gene expression data. Some of this information is also available in a book chapter that Jan Hellemans (the other Biogazelle co-founder) and I wrote for "PCR Troubleshooting and Optimization: The Essential Guide" (Caister Academic Press, 2011, ISBN: 978-1-904455-72-1). The full-text pre-print of the book chapter, "qPCR data analysis – unlocking the secret to successful results", is available in the Publications section of Biogazelle's Knowledge center (requires a free login to a MyBiogazelle account).
1. Always log transform your gene expression data
2. Consider pairing
3. Choose a proper statistical test before you start your analysis
4. Don’t underestimate the value of confidence intervals
5. Correction of the P-value is needed when testing multiple hypotheses
6. Independent biological replicates are required
7. When in doubt, consult an expert bio-statistician
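To make tips 1, 2, 4 and 5 more concrete, here is a minimal Python sketch. It is an illustration only, not Biogazelle's method and not taken from the book chapter. It assumes the inputs are already normalized relative quantities (for example, qPCR data scaled to reference genes), and all function names and data values are hypothetical. The sketch log2-transforms the data so that a 2-fold increase (+1) and a 2-fold decrease (-1) become symmetric. It then pairs each treated sample with its own control, reports the mean log2 fold change with a 95% confidence interval, and applies the Benjamini-Hochberg false discovery rate correction to a set of P-values. To stay within the standard library, the t critical value is supplied by hand from a t table rather than computed.

```python
import math
from statistics import mean, stdev


def log2_expression(relative_quantities):
    """Tip 1: move expression values to the log2 scale.

    On this scale a 2-fold increase (+1) and a 2-fold decrease (-1)
    are symmetric, and multiplicative effects become additive.
    """
    return [math.log2(q) for q in relative_quantities]


def paired_log2_fold_change_ci(treated, control, t_crit):
    """Tips 2 and 4: paired mean log2 fold change with a confidence interval.

    treated[i] and control[i] must come from the same independent biological
    replicate (e.g. the same patient or culture), so each pair gives one
    difference. t_crit is the two-sided Student's t critical value for
    n - 1 degrees of freedom, taken from a t table (e.g. 2.776 for n = 5 at 95%).
    """
    if len(treated) != len(control) or len(treated) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [t - c for t, c in zip(log2_expression(treated), log2_expression(control))]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return m, (m - t_crit * se, m + t_crit * se)


def benjamini_hochberg(p_values):
    """Tip 5: Benjamini-Hochberg adjusted P-values (false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted values
    # never increase as raw P-values decrease, and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical normalized relative quantities, five biological replicates.
    control = [1.0, 2.0, 0.5, 1.5, 1.2]
    treated = [2.1, 3.8, 1.1, 2.9, 2.6]
    fc, (lo, hi) = paired_log2_fold_change_ci(treated, control, t_crit=2.776)
    print(f"mean log2 fold change {fc:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    # Hypothetical raw P-values from testing four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In practice, a statistics package would supply the critical value and the paired t-test P-value (for example, `scipy.stats.t.ppf` and `scipy.stats.ttest_rel` in Python). Benjamini-Hochberg is only one of several multiple-testing corrections; Bonferroni is a stricter alternative.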
