In Part A of this tutorial, you ran an assembly and launched the results in ArrayStar. In this part, you will use the ArrayStar Gene Table to locate potential duplications in the reference sequence.

Suppose you want to find a region that is repeated in the reference but present only once in the MG1655 sample. In the Gene Table, genes in such regions will have a linear weighted RPKM-CN of approximately 0.5.
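The value of 0.5 can be motivated with a little arithmetic. The sketch below is only an illustration, and it assumes a simplified model in which the copy-number signal is roughly the number of copies in the sample divided by the number of copies in the reference. The function name `expected_relative_copy_number` is hypothetical and is not part of ArrayStar.

```python
def expected_relative_copy_number(sample_copies: int, reference_copies: int) -> float:
    """Rough expected copy-number ratio under a simplified model (an assumption).

    Reads from each sample copy are assumed to be spread evenly across all
    copies of the region in the reference, so the signal per reference copy
    scales as sample_copies / reference_copies.
    """
    if reference_copies < 1:
        raise ValueError("reference_copies must be at least 1")
    return sample_copies / reference_copies


# One copy in the sample, two copies in the reference.
ratio = expected_relative_copy_number(sample_copies=1, reference_copies=2)
print(ratio)
```

Under this model, a region present once in both the sample and the reference gives a ratio of 1.0, which is why values near 0.5 stand out as candidates for a duplication in the reference.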

  1. Click on the Gene Table tab near the top of the ArrayStar window. The footer shows that the Gene Table contains 4,283 genes.

  1. Click on the header CNV (2) – linear weighted RPKM-CN to sort the column from small to large values.
  1. Drag the mouse to select all table rows with a linear weighted RPKM-CN from 0.4 to 0.6, inclusive.
  1. Right-click anywhere on the highlighted area and choose Select and Remember as a Gene Set.
  1. Type in the name Duplicated in reference, and then press OK.
  1. Return to the Gene Table by clicking its tab near the top of the ArrayStar window.
  1. Without disturbing the selection, click on the Add/Manage Columns tool. Select Target Range and press the >Add Column> button, then click OK.
  1. Order the genes by ascending location by clicking on the Target Range column header.
  1. Use the vertical scrollbar to scroll down to the large block of selected rows beginning around Target Range 515000.

This area marks a possible duplication in the DH10B reference sequence.
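The logic behind the steps above (select rows in the 0.4 to 0.6 range, order by location, look for a large block of adjacent selected rows) can be sketched outside the GUI as follows. This is a minimal illustration, not an ArrayStar feature: the `GeneRow` type, its field names, and all gene names and values are made up for the example.

```python
from typing import List, NamedTuple


class GeneRow(NamedTuple):
    name: str
    rpkm_cn: float      # linear weighted RPKM-CN (hypothetical field name)
    target_start: int   # start of the Target Range (hypothetical field name)


def in_range(row: GeneRow, low: float = 0.4, high: float = 0.6) -> bool:
    """True if the row's RPKM-CN lies in [low, high], inclusive."""
    return low <= row.rpkm_cn <= high


def select_candidates(rows: List[GeneRow]) -> List[GeneRow]:
    """Mimic the drag-selection: keep every row in the 0.4 to 0.6 range."""
    return [r for r in rows if in_range(r)]


def longest_selected_block(rows: List[GeneRow]) -> List[GeneRow]:
    """Order rows by location and return the longest run of adjacent selected rows."""
    ordered = sorted(rows, key=lambda r: r.target_start)
    best: List[GeneRow] = []
    current: List[GeneRow] = []
    for row in ordered:
        if in_range(row):
            current.append(row)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []  # an unselected row breaks the block
    return best


# Made-up values for illustration only.
rows = [
    GeneRow("geneF", 0.41, 900_000),
    GeneRow("geneA", 1.02, 100_000),
    GeneRow("geneC", 0.52, 516_200),
    GeneRow("geneB", 0.48, 515_000),
    GeneRow("geneE", 0.97, 640_000),
    GeneRow("geneD", 0.60, 517_900),
]

candidates = select_candidates(rows)
block = longest_selected_block(rows)
print([r.name for r in block])
```

An isolated selected row such as `geneF` in this toy data is less suggestive than a long run of neighbouring selected rows, which is why the tutorial directs you to the large block rather than to individual hits.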

  1. From the Choose Quick Filter menu, choose Show Only Gene Set. Then select Duplicated in reference and press OK.

The Gene Table now contains only the ~230 putative duplicated genes.

  1. Use File > Save Project to save updates to the project.

Proceed to Part C: Confirming the duplication using GenVision Pro.


