The anion channelrhodopsins from the alga Guillardia theta are highly effective at inhibiting neuronal activity in flies. Our manuscript is now available as a preprint.
Why do I need to do this?
A lot of biology involves (1) doing an experiment where you make an intervention and (2) measuring what effect the intervention has.
If you are living in the Dark Age of P, you assume your intervention has had absolutely no effect, and then calculate the probability of seeing your data (or more extreme data), under that assumption of zero effect. Since you likely did the experiment because you thought there would be an effect, it seems super weird and slightly depressing to then go assuming zero effect when you start analyzing your data.
If you use estimation, a far more sensible method, you want to estimate the size of the effect of your intervention. The most straightforward effect size is the difference between the control mean and the intervention mean ('mean difference'), along with its confidence interval.
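As a rough illustration of the mean difference and its confidence interval, here is a minimal sketch using only the Python standard library. It assumes a simple percentile bootstrap, which is one of several ways to get a CI and not necessarily the one used in our analyses. The function name, the data values, and the number of resamples are all made up for the example.

```python
import random
import statistics


def mean_difference_ci(control, test, n_boot=5000, ci=95, seed=0):
    """Return (mean difference, lower bound, upper bound).

    The interval is a percentile bootstrap CI (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    observed = statistics.fmean(test) - statistics.fmean(control)

    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))

    diffs.sort()
    tail = (100 - ci) / 200  # e.g. 0.025 in each tail for a 95% CI
    lower = diffs[int(tail * n_boot)]
    upper = diffs[int((1 - tail) * n_boot) - 1]
    return observed, lower, upper


# Hypothetical measurements, for illustration only.
control = [10.1, 9.8, 10.4, 10.0, 9.6, 10.3, 9.9, 10.2]
test = [11.6, 11.2, 11.9, 11.4, 11.0, 11.8, 11.5, 11.3]

diff, lo, hi = mean_difference_ci(control, test)
print(f"{diff:+.2f} [95CI {lo:+.2f}, {hi:+.2f}]")
```

The point is that the output is a number with units and an interval, not a yes/no verdict.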
I've got confidence interval error bars, is that enough?
It's a good start to have CI error bars on your observed data plots. However, you also need to be able to say things like:
"when we made the intervention to flies their behavior X
increased by +54% [95CI +35, +78], P = 0.01."
You need to (1) know what your delta variable is and (2) know how to get these numbers.
What statistics are essential?
The three essential statistics are:
What statistics are nice to have?
What text style should I follow?
The text format we are using for mean difference with its confidence interval is:
-1.5 [95CI -1.2, -1.8]
+1.5 [95CI +1.2, +1.8]
Note the use of the +/- signs to denote that this is a measure of the change in the variable, not a measure of the variable itself. I prefer '95CI' instead of '95%CI' as I find the latter to be cluttered.
Often it is useful to write something like this:
∆weight = +1.5 µg [95CI +1.2, +1.8]
to remind the reader of the change variable (∆weight) and the units (µg). The confidence interval bounds are contained within square brackets, and the interval generally follows the mean difference closely in text. It is then followed by the supporting statistics. For example:
∆VO2 = -1.5 µl/fly/min [95CI -1.2, -1.9], N = 65, 62
∆VO2 = -1.5 µl/fly/min [95CI -1.2, -1.9], g = 0.56, P = 0.01, N = 65, 62
The g statistic was invented by Larry Hedges, so Hedges' g uses a possessive apostrophe after the 's.' The 'g' is italicized. P should be italicized and capitalized by default.
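If you report many effect sizes, it can help to generate the text programmatically so the format stays consistent. The sketch below is one possible way to do it. The helper names `hedges_g` and `format_effect` are hypothetical. The g calculation uses the pooled standard deviation with the usual approximate small-sample correction, 1 - 3/(4(n1 + n2) - 9), and its sign follows the direction of the mean difference.

```python
import math
import statistics


def hedges_g(control, test):
    """Standardized mean difference with the approximate small-sample correction."""
    n1, n2 = len(control), len(test)
    pooled_var = (
        (n1 - 1) * statistics.variance(control)
        + (n2 - 1) * statistics.variance(test)
    ) / (n1 + n2 - 2)
    d = (statistics.fmean(test) - statistics.fmean(control)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def format_effect(name, diff, lower, upper, units="", g=None, p=None, n=None):
    """Build a string in the '∆x = +1.5 units [95CI +1.2, +1.8]' style."""
    text = f"∆{name} = {diff:+.1f}"
    if units:
        text += f" {units}"
    text += f" [95CI {lower:+.1f}, {upper:+.1f}]"
    # Supporting statistics follow the interval, in the order used above.
    if g is not None:
        text += f", g = {g:.2f}"
    if p is not None:
        text += f", P = {p:.2g}"
    if n is not None:
        text += ", N = " + ", ".join(str(k) for k in n)
    return text


print(format_effect("VO2", -1.5, -1.2, -1.9, units="µl/fly/min",
                    g=0.56, p=0.01, n=(65, 62)))
```

Italics for g and P still have to be applied in the word processor, since plain strings carry no formatting.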
What graphical style should I follow in charts?
Instead of the little stars, you can put the effect size right next to your difference marker. In this case, simplicity is a virtue. You can write '∆ = -1.5' next to the first marker, then just the numbers without the '∆ =' for the rest of the markers in that Figure.
When using Google Docs, there are a few tricks.
How might a fly model of anxiety help us understand the genetic causes of anxiety disorders? Read the paper (PDF).
Back in January, Nature Methods ran our letter that defined and explained 'estimation statistics.' Since journal policy allows posting an e-print of the accepted manuscript 6 months after publication, here is the manuscript version of "Estimation should replace significance testing."
This e-print is also available at Zenodo, http://dx.doi.org/10.5281/zenodo.60156.
Some notes on gene and protein nomenclature, based on this extensive guide here.
Using Illustrator to make figures gives nice results, but collating them into a single pdf repeatedly during the drafting process is a pain. It requires using Adobe Acrobat Pro's Create PDF > Merge Files into a Single PDF, clicking "Add Files", then selecting the files you want to collate into a pdf.
A faster way is to use pdftk server, a command line tool that is made for pdfs, but seems to work fine for .ai files also. From the examples given on the pdftk page, doing this is as simple as launching Terminal and typing the relevant version of this:
$ cd yourdirectory
$ pdftk input1.ai input2.ai cat output new.pdf
The resulting pdf file still ends up pretty big (as large as doing the same operation in Acrobat), so one still has to reduce file size in Preview (File > Export... > Quartz Filter: Reduce File Size, click Save).
You can download and install pdftk server here.
For repeated collation, you can save your command lines into a text file for later use.
*Note that you can find yourdirectory by using the Finder to find one of the relevant files and pressing Command-I to get information. Then highlight and copy the path information under "Where" in the information window.
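An alternative to keeping command lines in a text file is a small script that builds the same pdftk command from a list of figure files. This is only a sketch: the function names are hypothetical, the file names are the placeholders from the example above, and it assumes pdftk is installed and on your PATH. It uses the same `cat output` form shown earlier and nothing else from pdftk.

```python
import shlex
import shutil
import subprocess


def build_pdftk_command(inputs, output):
    """Return the argument list for: pdftk <inputs...> cat output <output>."""
    return ["pdftk", *inputs, "cat", "output", output]


def collate(inputs, output):
    """Run pdftk to merge the input files, in order, into one pdf."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on the PATH")
    subprocess.run(build_pdftk_command(inputs, output), check=True)


# Placeholder file names; edit this list as figures are added or reordered.
figures = ["input1.ai", "input2.ai"]
command = build_pdftk_command(figures, "new.pdf")
print(shlex.join(command))  # shows the command without running it
```

Calling `collate(figures, "new.pdf")` from the figure directory would then run the merge.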
Here is a guide to installing Python for data analysis on a Mac, along with a few extra tips to get going.
The Anaconda distribution is a very convenient version of scientific Python that installs a lot of modules as well as a Launcher that offers three GUI apps:
One of the great things about Anaconda is the ease of updating everything. On one hand, most of what we do doesn't need updating, but on the other the whole SciPy ecosystem is evolving so quickly that it seems silly to update less than twice a year. The GUI apps can be updated by pressing buttons in the Launcher, while Anaconda as a whole can be updated with two lines.
As shown here, you just need to open the command line (e.g. Terminal on Mac) and type
$ conda update conda
$ conda update anaconda
The first line will update the conda package manager, the second line will use conda to update all the modules in Anaconda.
Update: Some packages that are not in Anaconda will require installation with pip. To install pip, follow the instructions here. If you already have pip installed, you can update it:
$ pip install --upgrade pip
and then use it to install a bootstrap package:
$ pip install scikits.bootstrap
This bootstrap package can then be used as per this tutorial.