Scatter Plot

To access the Scatter Plot view, choose Graphs > Scatter Plot from the menu.

The Scatter Plot view is composed of several elements. The graphical display on the left is discussed in this topic. The toolbar tools above the graphical display and the Info Pane on the right are described in separate topics.


In the Variants workflow, the Scatter Plot gives a visual comparison of gene-level variation between any two samples. Each data point on the Scatter Plot represents an individual gene, and the “signal” for each gene is the sum of the weighted values for each class of variation: each synonymous SNP adds 1 to the signal, each non-synonymous SNP adds 100, and each nonsense or frameshift-causing SNP adds 10,000. Values are halved where the change is heterozygous.
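The weighting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ArrayStar's own code; the function name `gene_signal`, the class labels, and the input layout are hypothetical.

```python
# Weights per class of variation, as stated in the text above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "synonymous": 1,
    "non_synonymous": 100,
    "nonsense_or_frameshift": 10_000,
}


def gene_signal(snps):
    """Return the weighted signal for one gene.

    snps: iterable of (variant_class, is_heterozygous) pairs.
    """
    total = 0.0
    for variant_class, is_heterozygous in snps:
        value = WEIGHTS[variant_class]
        if is_heterozygous:
            value /= 2  # values are halved for heterozygous changes
        total += value
    return total


# One homozygous synonymous SNP, one heterozygous non-synonymous SNP,
# and one homozygous nonsense SNP: 1 + 50 + 10,000
example = [
    ("synonymous", False),
    ("non_synonymous", True),
    ("nonsense_or_frameshift", False),
]
print(gene_signal(example))  # 10051.0
```

Because the weights differ by factors of 100, the order of magnitude of a signal gives a rough indication of the most severe class of variation present in the gene.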

For non-Variants workflows, ArrayStar’s Scatter Plot view gives a visual comparison of gene signals between any two datasets, whether they are individual arrays or replicated sets. Each data point on the Scatter Plot represents an individual gene and is plotted based on its signal in both of the selected experiments.

If DESeq2 or edgeR normalization was used, data are visualized in the scatter plot using the rlog (log₂) value for each gene. For all other workflows, data are visualized on a simple log₂ scale. However, data can also be scaled and visualized as linear by changing the scatter plot style settings. In general, a log₂ scale can facilitate visualizing data that covers a large span of values by reducing it to a more manageable range.
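As a small illustration of why a log₂ scale helps, the sketch below converts a few made-up linear signal values to log₂. The values are invented for the example, and how ArrayStar treats zero or negative signals is not covered here.

```python
import math

# Hypothetical linear signals spanning several orders of magnitude.
signals = [8, 1024, 65536]

# On a log2 scale the same values fall within a much narrower range.
log2_signals = [math.log2(s) for s in signals]
print(log2_signals)  # [3.0, 10.0, 16.0]
```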

Three solid green lines are drawn diagonally across the scatter plot. The middle green line is the identity line, or the x=y line.

• For non-Variants workflows, data points on the identity line represent genes that are expressed at the same level in both data sets. The other two lines delineate genes with at least a two-fold change in intensity value in one of the data sets.

• For the Variants workflow, points lying on the identity line indicate those genes in both samples that have the same total number of SNPs, with the same class distribution. The other two lines delineate genes where the weighted sums vary more than two-fold.
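The relationship between a data point and the three green lines can be expressed as a simple fold-change check. The following is a minimal sketch, assuming positive linear signal values and treating exactly two-fold as lying on a fold line; the function name `fold_region` and the returned labels are hypothetical and are not part of ArrayStar.

```python
import math


def fold_region(signal_x, signal_y, fold=2.0):
    """Classify a gene relative to the identity line and the two fold lines.

    signal_x, signal_y: positive linear signals for the gene in the
    x-axis and y-axis experiments, respectively.
    """
    # On a log2 scale, the two-fold lines sit at y = x + 1 and y = x - 1.
    diff = math.log2(signal_y / signal_x)
    threshold = math.log2(fold)
    if diff >= threshold:
        return "on or above upper fold line"
    if diff <= -threshold:
        return "on or below lower fold line"
    return "between fold lines"


print(fold_region(100, 400))  # four-fold higher in the y-axis experiment
print(fold_region(100, 100))  # on the identity line
print(fold_region(400, 100))  # four-fold higher in the x-axis experiment
```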

The dashed purple line on the scatter plot is the linear regression (R²), or “best-fit,” line: a line that passes as near to as many data points as possible. The Scatter Plot displays the R² value only for the currently selected pair of experiments. By contrast, the Cross-R² test creates all of the "crosses," or pairs, and produces a table. The R² results for both the Scatter Plot and the Cross-R² test use the same statistical calculations.
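For readers who want to see what a best-fit line and its R² value involve, here is a minimal sketch using ordinary least squares. This topic does not specify the exact calculation ArrayStar performs, so treat the formula choice as an assumption; the function name `linear_fit_r2` is hypothetical.

```python
def linear_fit_r2(xs, ys):
    """Fit y = slope * x + intercept by least squares and return
    (slope, intercept, r_squared) for paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R² equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Perfectly correlated made-up values give R² = 1.
slope, intercept, r2 = linear_fit_r2([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept, r2)
```

An R² close to 1 means the points lie close to the best-fit line; lower values mean the signals in the two experiments are less linearly related.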

Each data point is colored to reflect its position relative to the x=y line. The colors for data points, as well as the fold lines and regression line, can be changed to match your preferences.

Right-click on the Scatter Plot to access the following context menu commands:

When this many genes are selected	… these commands are available	Command description
≥ 0 genes	Show Gene Table	Shortcut to Data > Show Gene Table.
	Select All	Shortcut to Edit > Select All.
	Copy Image	Shortcut to Edit > Copy Image of Graph or to the Copy Image tool.
	Fit to View	Same as using the Fit to View tool.
≥ 1 gene	Remember Gene Set	Shortcut to Data > Remember Selected [Genes, SNPs, Table Rows, etc.].
	Clear Selection	Shortcut to Edit > Clear Selection.
	Export	Shortcut to File > Export Selected Genes.
Exactly 1 gene	Edit Gene ‘x’ Notes	Launches the Edit Gene Notes dialog, already populated with the gene name. Equivalent to clicking on the “Notes (Edit)” link in the Details panel.
Exactly 1 gene	Remove Gene ‘x’ from Selection	Deselects the gene.