Set Up Preprocessing for ChIP-Seq and miRNA Data

If you are following a ChIP-Seq or miRNA workflow, the Set Up Preprocessing step of the Project Setup Wizard allows you to select a normalization method, define the genes for your genome, and change the settings for peak discovery.

 

 

Choose from the following preprocessing options for ChIP-Seq and miRNA data:

 

      Processing Method – This is set to QSeq and cannot be changed.

 

      Normalization method – Choose the normalization method to apply to your data. A brief description of the selected method appears beneath the selection. The options are RPM and None; choosing None processes your data without normalization.

 

      Click Add File to load the template sequences that your reads will be mapped to. A template may be a single sequence or a group of sequences, such as a set of contigs. Use Download to launch the Download Genome Reference and download your template sequences directly from NCBI. If desired, use the Remove button to clear the selected sequence, or the Remove All button to clear all of the template files and begin again.

 

      Use Features of Type – Check this box if you wish to define annotated features in the template sequence(s) as genes. This allows ArrayStar to associate located peaks with upstream and downstream genes for analysis after preprocessing. You may select a feature type from the dropdown list provided, or you may enter multiple feature types in this field by typing them in and separating each with a comma. The total number and length of features, based upon the type you have specified, are listed to the right of this option and will update as the selection changes. The available feature types will depend on what annotations are included in the template sequence(s). If your template sequences do not contain features, this option will be disabled.
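
For readers unfamiliar with the RPM option above, the following Python sketch illustrates the general idea of reads-per-million scaling: each raw count is divided by the total number of reads in the sample and multiplied by one million. It is an illustration only, not ArrayStar or QSeq code. The function name rpm_normalize and the dictionary input are hypothetical, and the exact calculation the software performs (for example, which reads are counted in the total) is not described in this topic.

```python
def rpm_normalize(counts):
    """Scale raw read counts to reads per million (RPM).

    counts: dict mapping a feature or peak name to its raw read count.
    Hypothetical helper for illustration; not part of ArrayStar.
    """
    total = sum(counts.values())
    if total == 0:
        # No reads at all: avoid dividing by zero and return zeros.
        return {name: 0.0 for name in counts}
    # RPM = raw count * 1,000,000 / total reads in the sample
    return {name: count * 1_000_000 / total for name, count in counts.items()}
```

Because the scaling factor depends only on the sample total, RPM values from samples of different sequencing depth can be compared directly, which is not true of raw counts (the None option).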

 

To define where in the genome QSeq will search for peaks, choose from the following Genome Filtering options:

 

      Discover peaks in the entire genome – QSeq will search for peaks along the entire template sequence(s). The total number of template sequences you have loaded, as well as their total length, is shown to the right of this option.

 

      Discover peaks only near known binding sites – QSeq will search for peaks only in the regions of the template sequence(s) that surround the binding sites defined in the Create Binding Proteins Dialog. You can modify the size of these regions by changing the Extend Regions by value, which defines the number of bases added to each end of every region that matches a binding site. If you did not define binding sites for your binding proteins, this option will be inactive.

 

      Configure Advanced Options – Click this button if you would like to further adjust the processing parameters or to export a graph, alignment, or file of unassigned reads.
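
To make the effect of the Extend Regions by value more concrete, the Python sketch below widens each binding-site match by a fixed number of bases on both ends and clamps the result to the template boundaries. It is a conceptual illustration, not QSeq's implementation. The function name extend_regions, the 0-based half-open coordinates, and the merging of overlapping regions are all assumptions made for the example; this topic does not say how the software handles overlaps.

```python
def extend_regions(sites, extend_by, template_length):
    """Widen binding-site matches and return the regions to search.

    sites: list of (start, end) tuples, 0-based and half-open (assumed).
    extend_by: number of bases to add on each end of every match.
    template_length: length of the template sequence, used for clamping.
    Hypothetical helper for illustration; not part of ArrayStar or QSeq.
    """
    # Extend each match on both ends, staying inside the template.
    extended = sorted(
        (max(0, start - extend_by), min(template_length, end + extend_by))
        for start, end in sites
    )
    # Merge regions that overlap after extension (an assumption of this sketch).
    merged = []
    for start, end in extended:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With a larger extension value, neighboring regions grow toward each other and cover more of the template, so the search approaches the behavior of discovering peaks in the entire genome.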

 

Click Back to return to the Add Experiments to Import step of the Project Setup Wizard; Next to process your data and proceed to the next step of the wizard; or Cancel to close the Project Setup Wizard without adding any data to the project.