An effective and efficient experimental design is essential for drawing statistically sound conclusions from miRNA array profiling data at minimum cost. An experimental design determines how many biologically categorized groups of samples a project will use, how many biological repeats (samples) each group will include, and how many technical repeats each sample will have. Design rules follow from an understanding of how differential expression data are analyzed and of how the various sources of experimental variation rank. In a differential expression study involving two sample groups, statistical significance is determined by comparing the difference in average expression level between the two groups with the variation in expression level within each group. The degree of significance of a specific miRNA sequence can be assessed mathematically in a t-test using the T function (Equation 1).
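For reference, one standard form of the two-sample t statistic that matches this description is the unequal-variance (Welch) form below. The symbols are chosen here for illustration and are an assumption, not necessarily the notation of Equation 1: the sample means of groups A and B are written as X-bar, the within-group sample standard deviations as s, and the numbers of samples per group as n.

```latex
T = \frac{\left|\bar{X}_A - \bar{X}_B\right|}
         {\sqrt{\dfrac{s_A^{2}}{n_A} + \dfrac{s_B^{2}}{n_B}}}
```

The numerator is the between-group difference and the denominator is built entirely from the within-group standard deviations, which are undefined when a group has fewer than two samples.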
The higher the value of the T function, the stronger the statistical evidence that groups A and B differ. A corresponding p-value can be derived from the t-distribution (available in computer programs such as R and Excel). Equation 1 reveals two mathematical facts. First, a statistical test can be performed only when each group contains two or more samples. Second, accurate assessment of the standard deviations among samples within each group is as important as assessment of the difference between the groups. While the first fact is obvious, the second is not fully recognized by many biological experimentalists.
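The two facts can be illustrated with a minimal Python sketch. It assumes the unequal-variance (Welch) form of the t statistic and uses made-up log2 signal values; the function name `t_statistic` and all numbers are hypothetical and are not taken from any real data set.

```python
import math
import statistics


def t_statistic(group_a, group_b):
    """Two-sample t statistic (Welch form, an assumption) for one miRNA."""
    # Fact 1: a standard deviation needs at least two samples per group.
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two samples")
    mean_diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
    # statistics.variance returns the sample variance (n - 1 denominator).
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / std_err


# Hypothetical log2 signals for one miRNA in two groups of three samples.
group_a = [8.1, 8.4, 7.9]
group_b = [9.6, 9.9, 9.3]

# Same group means as above, but with larger within-group spread.
noisy_a = [7.1, 9.4, 7.9]
noisy_b = [8.6, 10.9, 9.3]

t_tight = t_statistic(group_a, group_b)
t_noisy = t_statistic(noisy_a, noisy_b)
print(f"tight groups: T = {t_tight:.2f}")
print(f"noisy groups: T = {t_noisy:.2f}")
```

Both comparisons have the same difference between group means, yet the noisier groups give a much smaller T. This is fact 2 in practice: the within-group spread matters as much as the between-group difference.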
The standard deviations in Equation 1 reflect the cumulative result of various experimental variations, which are generally ranked as follows:
- biological variations among biological repeats >
- run-to-run variations due to RNA extraction >
- labeling dye induced variations or biases ≥
- run-to-run variations due to sample labeling >
- chip-to-chip variations due to chip fabrication ≈
- chip-to-chip variations due to hybridization >
- spot-to-spot variations within a chip
This ranking is valid for assays using in situ synthesized array chips (e.g. the ones from LC Sciences). Biological variations among biological repeats of various types of samples are generally ranked as follows:
human samples > lab animal samples > cell lines
Based on the above rankings of experimental variation and on the analysis of a large body of experimental data (not presented here), we suggest the following rules as general guidance for experimental design:
Biological repeats versus technical repeats
For most applications, biological repeats are used and technical repeats are not needed. Biological repeats are defined as samples derived from specimens of individual human or animal subjects, or from individual cell line growth batches, of the same test group. Technical repeats are defined as samples that originate from a single biological specimen but are processed separately at some step of the assay. With in situ synthesized array chips, assay-induced variations are generally smaller than variations among biological repeats. Using technical repeats alone fails to capture the biological variation within a group, underestimates the standard deviations in Equation 1, and therefore leads to false-positive calls.
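The false-positive effect can be sketched with a small simulation. This is an illustration under stated assumptions, not an analysis of real data: both groups have the same true expression level, biological variation (SD 1.0) is taken to be five times the assay variation (SD 0.2), each group has three samples, and significance is called with the equal-variance Student's t statistic at the two-sided 5% critical value for 4 degrees of freedom (2.776). All names and parameter values are hypothetical.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (two groups of three samples, equal-variance test).
T_CRIT = 2.776


def pooled_t(a, b):
    """Equal-variance (Student's) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def false_positive_rate(use_technical_repeats, trials=2000, n=3,
                        bio_sd=1.0, tech_sd=0.2, seed=0):
    """Fraction of trials called significant when no true difference exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        groups = []
        for _group in range(2):
            if use_technical_repeats:
                # One specimen per group, measured n times.
                specimen = rng.gauss(0.0, bio_sd)
                values = [specimen + rng.gauss(0.0, tech_sd) for _ in range(n)]
            else:
                # n different specimens per group, each measured once.
                values = [rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
                          for _ in range(n)]
            groups.append(values)
        if pooled_t(groups[0], groups[1]) > T_CRIT:
            hits += 1
    return hits / trials


rate_technical = false_positive_rate(use_technical_repeats=True)
rate_biological = false_positive_rate(use_technical_repeats=False)
print(f"false-positive rate, technical repeats only: {rate_technical:.2f}")
print(f"false-positive rate, biological repeats:     {rate_biological:.2f}")
```

With biological repeats the rate should stay near the nominal 5%, while with technical repeats alone the chance difference between two individual specimens is compared against the much smaller assay variation and is called significant most of the time.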
Number of samples per group
Human: 10 or more
Lab animal: 3 or more
Cell line: 3
Some scientists pool samples before performing array assays to reduce assay cost, and others do so to make up for limited sample amounts. When samples are pooled, critical information on sample-to-sample variation within each group (the standard deviations in Equation 1) is lost, and identification of biologically significant miRNA differentials may therefore no longer be possible. In general, the use of pooled samples is not recommended. However, if pooling has to be used because individual specimens provide too little material, the number of samples in each pool should be minimized, the pooled specimens should be selected for close biological resemblance, and multiple pools should be used as substitutes for biological repeats.
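A short simulation can show how pooling hides sample-to-sample variation. It rests on a simplifying assumption that is not from the text above: the signal of a pool equals the arithmetic mean of the signals of its member specimens. The helper name `pool_means` and all numbers are hypothetical.

```python
import random
import statistics


def pool_means(values, pool_size):
    """Average consecutive specimens to mimic equal-amount pooling."""
    last_start = len(values) - pool_size + 1
    return [statistics.mean(values[i:i + pool_size])
            for i in range(0, last_start, pool_size)]


rng = random.Random(1)
# Hypothetical expression values for 3000 individual specimens (SD 1.0).
individuals = [rng.gauss(10.0, 1.0) for _ in range(3000)]
pools = pool_means(individuals, pool_size=3)

sd_individual = statistics.stdev(individuals)
sd_pools = statistics.stdev(pools)
print(f"SD among individual specimens: {sd_individual:.2f}")
print(f"SD among pools of three:       {sd_pools:.2f}")
```

Under this assumption, the spread among pools of k specimens shrinks by roughly a factor of the square root of k, so a standard deviation estimated from pools understates the variation among individuals. Keeping pools small limits this loss.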
Single-sample assay versus dual-sample assay
In single-sample assays, all samples are labeled with the same dye and each labeled sample is hybridized to one chip (or to one sub-array of a chip). In a dual-sample assay, one sample is labeled with Cy3 (or Alexa Fluor 546, Oyster-550, or any other equivalent dye); the other sample is labeled with Cy5 (or Alexa Fluor 647, Oyster-650, or any other equivalent dye); and the two samples are combined and co-hybridized to one chip (or to one sub-array of a chip). The main advantage of single-sample assays is that they are free of dye-related bias, and the method is suitable for all applications.
More information about LC Sciences miRNA profiling, discovery and analysis services is available at: http://www.lcsciences.com/mirna_discovery.html.