ArrayStar
Data Analysis
ArrayStar provides several statistical methods for data analysis. Which method to apply depends on the experiment and the type of information sought.
A. Probabilistic Statistical Analysis Methods
To use these statistical tools, replicate samples are required. Variability is measured within the replicates, and the confidence scores generated from that variability can be used to assess differential gene expression. The methods available to users are:
Student t-test
Moderated t-test
F-test (ANOVA)
After selection, ArrayStar calculates a P value and a T/F value for each gene. In general, if the T/F value is large (for the T value, large in absolute terms), the gene is likely to be differentially expressed.
The P value represents the probability that a T/F value at least as extreme as the calculated one would occur by chance alone, that is, if the gene were not differentially expressed. In general, the lower the P value, the more confident you can be that the gene is differentially expressed.
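As an illustration of how replicate variability feeds into the T value, the following is a minimal sketch of a two-sample Student t statistic with pooled variance. It is not ArrayStar's implementation; the function name and the expression values are hypothetical, and the conversion from T value to P value (which needs the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, variance

def student_t(control, treatment):
    """Two-sample Student t statistic assuming equal variances (hypothetical helper)."""
    n1, n2 = len(control), len(treatment)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
    # Difference in means, scaled by the standard error of that difference.
    return (mean(treatment) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical log2 expression values for one gene, three replicates per condition.
control = [7.9, 8.1, 8.0]
treatment = [9.0, 9.2, 9.1]
t_value = student_t(control, treatment)
print(round(t_value, 2))
```

Tight replicates give a small pooled variance and therefore a large T value for the same difference in means; noisier replicates shrink it.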
In addition to the probabilistic statistical analyses listed above, the following general statistics are also available in ArrayStar:
Coefficient of Variation
Standard Deviation
Variance
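These three general statistics can be sketched with the Python standard library as follows. The replicate values are hypothetical, and the use of the sample (n - 1) forms is an assumption; the text does not say which form ArrayStar uses.

```python
from statistics import mean, stdev, variance

# Hypothetical expression values for one gene across three replicates.
replicates = [8.0, 10.0, 12.0]

var = variance(replicates)   # sample variance (divides by n - 1)
sd = stdev(replicates)       # sample standard deviation, the square root of the variance
cv = sd / mean(replicates)   # coefficient of variation: spread relative to the mean

print(var, sd, cv)
```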
B. Multiple Testing Corrections
Statistical tests such as the Student t-test, Moderated t-test and F-test (ANOVA) are used to identify differentially expressed genes. With a large dataset, however, these tests can produce a substantial number of false positives.
For example, a t-test can be applied to a group of genes, and those with a p-value below a chosen threshold (0.05, for example) can be selected as differentially expressed. However, when the test is performed on a large number of genes (on the order of 10,000), a substantial number of genes that are not actually differentially expressed (up to about 500, or 5% of 10,000) will have a p-value below the threshold and will therefore be selected as differentially expressed. These genes are false positives, and this issue is referred to as the Multiple Testing problem.
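The figure of roughly 500 false positives follows from simple arithmetic, which the sketch below checks with a small simulation. It assumes the worst case in which none of the genes is truly differentially expressed, so every p-value is uniformly distributed between 0 and 1; the variable names and the random seed are illustrative only.

```python
import random

n_genes = 10_000
threshold = 0.05

# Expected number of genes passing the threshold purely by chance.
expected_false_positives = n_genes * threshold

# Simulate p-values for genes with no real expression difference.
rng = random.Random(0)
null_p_values = [rng.random() for _ in range(n_genes)]
observed_false_positives = sum(p < threshold for p in null_p_values)

print(expected_false_positives, observed_false_positives)
```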
Various adjustments can be made to the p-values with the objective of reducing the number of false positives. The adjustments available in ArrayStar are listed below, and can be applied to the p-values for any of the probabilistic statistical tests in ArrayStar.
Bonferroni - In the Bonferroni method, the p-value for each gene is multiplied by N, where N is the total number of genes being tested. This raises the p-values so much that very few genes fall below the threshold.
The Bonferroni method is highly conservative: while it greatly reduces the number of false positives, it also excludes a number of truly differentially expressed genes. It may be best utilized when looking for a small number of genes that are highly differentially expressed.
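A minimal sketch of the Bonferroni adjustment described above is shown here. The function name and p-values are hypothetical, and capping the result at 1.0 is an assumption (a common convention, since a probability cannot exceed 1) that the text does not mention.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests N (hypothetical helper)."""
    n = len(p_values)
    # Assumption: cap at 1.0 so the adjusted value remains a valid probability.
    return [min(1.0, p * n) for p in p_values]

# Hypothetical raw p-values for four genes.
raw = [0.001, 0.01, 0.03, 0.04]
adjusted = bonferroni(raw)   # roughly [0.004, 0.04, 0.12, 0.16]

# Genes still below a 0.05 threshold after adjustment.
selected = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, selected)
```

With these values, all four genes pass an unadjusted 0.05 threshold, but only the first two survive the adjustment.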
Holm-Bonferroni - Using the Holm-Bonferroni method, the p-values are first sorted in ascending order, and the smallest value is multiplied by N, where N is the total number of genes being tested. The next value is then multiplied by N-1, and so on, so that the last (largest) p-value is multiplied by 1.
This method is not as conservative as the Bonferroni method, but may still exclude many potentially interesting genes (false negatives). As with the Bonferroni method, the Holm-Bonferroni method may be best utilized when looking for a small number of highly differentially expressed genes for further experiments. In other words, this method can be effective when the goal is simply to eliminate false positives, even at the cost of a number of false negatives.
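The Holm-Bonferroni procedure can be sketched as follows. The function name and p-values are hypothetical. Two details go beyond the description above and are assumptions based on the usual step-down formulation: adjusted values are capped at 1.0, and a running maximum keeps them in the same order as the raw p-values.

```python
def holm_bonferroni(p_values):
    """Step-down Holm adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for position, i in enumerate(order):
        # Smallest p-value is multiplied by N, the next by N - 1, ..., the last by 1.
        value = min(1.0, p_values[i] * (n - position))
        # Assumption: enforce monotonicity so a larger raw p-value never
        # ends up with a smaller adjusted value.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted

raw = [0.001, 0.01, 0.03, 0.04]
adjusted = holm_bonferroni(raw)   # roughly [0.004, 0.03, 0.06, 0.06]
print(adjusted)
```

Without the running maximum, the last value here would be 0.04 (0.04 multiplied by 1), which is lower than the adjusted value of the gene ranked just before it.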
FDR (Benjamini Hochberg) - The FDR (Benjamini Hochberg) method is the default p-value adjustment method in ArrayStar. In this method, the p-values are first sorted in ascending order and ranked. The smallest value gets rank 1, the second rank 2, and the largest gets rank N. Then, each p-value is multiplied by N and divided by its assigned rank to give the adjusted p-value.
To restrict the false discovery rate to (say) 0.05, all the genes with adjusted p-values less than 0.05 are selected. This method aims to control what is called the False Discovery Rate (FDR), the expected proportion of false positives among the selected genes. It is used when the objective is to reduce the number of false positives while still increasing the chances of identifying all the differentially expressed genes.
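The rank-based adjustment can be sketched as below. The function name and p-values are hypothetical. The running minimum applied from the largest p-value downward is an assumption taken from the usual Benjamini-Hochberg formulation; it is not mentioned in the description above and may differ from what ArrayStar does.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest (rank 1) to largest (rank N).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        value = p_values[i] * n / rank
        # Assumption: a smaller raw p-value never gets a larger adjusted value.
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.01, 0.03, 0.04]
adjusted = benjamini_hochberg(raw)   # roughly [0.004, 0.02, 0.04, 0.04]

# Genes selected at a false discovery rate of 0.05.
selected = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, selected)
```

For the same four raw p-values, this adjustment keeps all four genes below 0.05, whereas a Bonferroni adjustment (multiplying each by N = 4) would keep only the first two.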
C. Filtering
The Filtering capability of ArrayStar permits users to modify gene searches in a number of different ways. Criteria that can be used include:
Fold Change
Gene Annotations
Expression Levels
Statistics
Fold-change analysis is a simple method used to identify genes whose expression ratios or differences between a treatment and a control fall outside a given cutoff or threshold. In its Filtering mode, ArrayStar permits searches on Fold Change levels determined by the user. In the Scatter Plot image, a range of pre-set Fold Change levels is provided.
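A fold-change filter of this kind can be sketched as follows. The gene names, expression values and cutoff are hypothetical, and the sketch assumes linear-scale (not log-transformed) expression values and a symmetric cutoff that catches both up- and down-regulated genes; it does not reflect ArrayStar's internal calculation.

```python
def passes_fold_change(control, treatment, cutoff):
    """True if the treatment/control ratio is outside the range [1/cutoff, cutoff]."""
    ratio = treatment / control
    return ratio >= cutoff or ratio <= 1 / cutoff

# Hypothetical linear-scale expression values: gene -> (control, treatment).
genes = {
    "geneA": (100.0, 450.0),   # 4.5-fold up
    "geneB": (200.0, 210.0),   # essentially unchanged
    "geneC": (300.0, 60.0),    # 5-fold down
}

cutoff = 2.0
selected = [name for name, (c, t) in genes.items() if passes_fold_change(c, t, cutoff)]
print(selected)   # ['geneA', 'geneC']
```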
Gene Annotations permits filtering on genes that have annotations entered. Expression Levels and Statistics each permit users to define their own filter criteria for the search.