ChIP-Seq analysis

QSeq’s ChIP-Seq workflow enables you to locate the binding sites of DNA-associated proteins and determine how these proteins interact with the DNA to affect expression in nearby genes. To use QSeq for ChIP-Seq analysis, choose File > Import Experiments > ChIP-Seq, and then select your ChIP-Seq files during the Add Experiments to Import step of the Project Setup Wizard. (For a list of supported file formats, see Supported File Types.) The wizard prompts you to define binding proteins and binding sites, if you know them. During the Set Up Preprocessing step, the wizard also guides you through loading or downloading your reference sequences, also called templates, and lets you define processing parameters.

 

By default, each imported ChIP-Seq file is treated as its own experiment and is processed individually. Alternatively, files can be merged so that the sequence data in multiple files is treated as a single experiment. During processing, reads are mapped to a target, such as a genomic template. QSeq then looks for peaks, which are regions where more reads map to the target than expected.
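
The idea of a peak as a region with more mapped reads than expected can be illustrated with a minimal Python sketch. This is not one of QSeq's peak detection algorithms. The function names, the use of mean depth as the "expected" value, and the fold threshold are all assumptions made for illustration.

```python
def coverage(read_starts, read_length, genome_length):
    """Count how many reads cover each base of the target."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(max(start, 0), min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


def find_peaks(depth, fold=2.0):
    """Return half-open (start, end) intervals whose depth exceeds
    fold * mean depth. Mean depth stands in for the 'expected' read
    count (an assumption of this sketch)."""
    expected = sum(depth) / len(depth) if depth else 0.0
    threshold = fold * expected
    peaks, start = [], None
    for pos, d in enumerate(depth):
        if d > threshold and start is None:
            start = pos                      # a peak begins here
        elif d <= threshold and start is not None:
            peaks.append((start, pos))       # the peak ended at the previous base
            start = None
    if start is not None:                    # a peak runs to the end of the target
        peaks.append((start, len(depth)))
    return peaks
```

For example, find_peaks(coverage([10, 11, 12, 12, 13, 40], 5, 60)) reports a single region, (11, 17), because the lone read at position 40 does not rise above the threshold.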

 

QSeq uses mers, defined by a sliding window of a specified number of bases (the default is 15), to determine where reads map to the template. Several options set the minimum requirements a read must meet to be considered a match to a template. Based on the read mapping, QSeq then uses one of three peak detection algorithms to determine where the peaks are located. You may adjust these parameters, and others, in the QSeq Advanced Options dialog. For peak discovery, you may have QSeq look for peaks along the entire genome. Alternatively, you can limit peak discovery to regions near binding sites, which are defined in the Create Binding Proteins dialog of the Project Setup Wizard.
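
The sliding-window mer approach can be sketched in a few lines of Python. This is a simplified illustration, not QSeq's implementation. The function names and the min_matching_mers parameter are hypothetical stand-ins for the matching requirements mentioned above, and only exact mer matches on the forward strand are considered.

```python
from collections import defaultdict


def build_mer_index(template, k=15):
    """Map every k-base mer in the template to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(template) - k + 1):
        index[template[pos:pos + k]].append(pos)
    return index


def map_read(read, index, k=15, min_matching_mers=2):
    """Slide a k-base window along the read and let each matching mer vote
    for the template position where the read would start. Return the
    best-supported start position, or None if too few mers agree."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    if not votes:
        return None
    start, count = max(votes.items(), key=lambda item: item[1])
    return start if count >= min_matching_mers else None
```

A smaller mer size finds more candidate positions but produces more chance matches, while a larger one is more specific but less tolerant of mismatches within the window.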

 

Once the reads have been processed, signal values are created based on the number of reads within each peak. By default, ArrayStar uses the log2 of these values for all visualizations and calculations. QSeq also calculates the start and end positions of each peak along the genome, as well as the maximum height (in reads) of each peak. Depending on which peak detection algorithm you use, QSeq may also calculate a P-value for each peak.
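
The per-peak values described above can be sketched as follows. This is a hypothetical illustration: the function name, the rule that a read counts toward a peak if it overlaps the peak at all, and the half-open coordinates are assumptions, and QSeq's exact counting rules may differ. P-values are not computed here.

```python
import math


def summarize_peak(read_starts, read_length, peak_start, peak_end):
    """Summarize one peak given as a half-open interval [peak_start, peak_end)."""
    # Assumption: a read belongs to the peak if it overlaps it by at least one base.
    overlapping = [s for s in read_starts
                   if s < peak_end and s + read_length > peak_start]
    # Per-base depth inside the peak, used to find the maximum height.
    depth = [0] * (peak_end - peak_start)
    for s in overlapping:
        for pos in range(max(s, peak_start), min(s + read_length, peak_end)):
            depth[pos - peak_start] += 1
    signal = len(overlapping)
    return {
        "start": peak_start,
        "end": peak_end,
        "signal": signal,
        # log2 is undefined for zero reads, so report None in that case.
        "log2_signal": math.log2(signal) if signal > 0 else None,
        "max_height": max(depth, default=0),
    }
```

With four reads of length 4 starting at positions 0, 0, 2, and 3, a peak spanning [0, 6) has a signal of 4, a log2 signal of 2.0, and a maximum height of 4 reads.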

 

Note: QSeq uses temporary files while generating mers from individual reads and matching them to the template sequences. These temporary files are stored in the directory specified under Edit > Preferences. Because processing can require a significant amount of disk space, it may be helpful to use an external hard drive for very large projects and to change the temporary directory under Edit > Preferences to point to a location on that drive.
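
Before pointing the temporary directory at a new location, it can help to confirm how much free space that location has. The short Python sketch below is independent of QSeq and uses only the standard library. The function names and the required size are placeholders, since the space a project needs depends on its data.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the volume that contains path."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, required_gib):
    """Return True if the volume containing path has at least required_gib free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Example: inspect the system's default temporary directory.
    candidate = tempfile.gettempdir()
    print(f"{candidate}: {free_gib(candidate):.1f} GiB free")
```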