In the Define Binding Proteins screen, selecting JASPAR (PWM) from the Binding site type menu lets you use the JASPAR position weight matrix to locate binding sites for eukaryotic organisms. (For prokaryotic organisms, instead use the Transcription Factor Database.)
If you choose this option, SeqMan NGen calculates a log-odds score for each sequence against the selected matrix. The score for a single character at a particular position in the matrix is the log2 of a ratio: the likelihood of seeing that character at that position in the data used to generate the matrix, divided by the background likelihood of seeing that character at that position.
For example, if the matrix is derived from 80 sequences and 70 of those sequences have an “A” in position 1, the log-odds score for “A” in position 1 is log2((70/80)/(20/80)) = 1.81. Here 20/80 = 0.25 is the background likelihood, which corresponds to each of the four bases being equally likely. If a “C” occurs once in position 1 of the training sequences, the log-odds score for “C” in position 1 is log2((1/80)/(20/80)) = -4.32. To get the log-odds score for the whole sequence, SeqMan NGen sums the log-odds scores of each character in the sequence.
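The arithmetic above can be reproduced with a short Python sketch. This is only an illustration of the formula as described, not SeqMan NGen's actual code: the function names, the uniform background of 0.25, and the two-position count matrix are all assumptions made for the example.

```python
import math

BACKGROUND = 0.25  # assumed uniform background, matching 20/80 in the text


def char_score(count, total, background=BACKGROUND):
    """Log-odds score (base 2) for one character at one matrix position."""
    return math.log2((count / total) / background)


def sequence_score(sequence, counts, total):
    """Sum the per-position scores; counts[i] maps base -> count at position i."""
    return sum(
        char_score(counts[i][base], total) for i, base in enumerate(sequence)
    )


# Hypothetical 2-position count matrix built from 80 training sequences.
# Position 1 uses the counts for "A" and "C" from the example in the text.
counts = [
    {"A": 70, "C": 1, "G": 5, "T": 4},
    {"A": 10, "C": 10, "G": 40, "T": 20},
]

score_a = char_score(70, 80)                # "A" in position 1
score_c = char_score(1, 80)                 # "C" in position 1
score_ag = sequence_score("AG", counts, 80) # whole-sequence score for "AG"

print(round(score_a, 2), round(score_c, 2), round(score_ag, 2))
```

Note that a count of zero would make the logarithm undefined. The sketch avoids that case by using only nonzero counts; how SeqMan NGen handles zero counts is not stated here.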
A sequence is considered to "match" the matrix if its score is greater than or equal to the specified Threshold. By default, the threshold is half of the average log-odds score of the sequences that were used to train the pattern. You can increase the threshold for more stringency or decrease it to allow more matches.
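As a rough sketch of the threshold rule, assuming the log-odds scores of the training sequences are already available as a list, the default and the match test could be expressed as follows. The function names and the score values are hypothetical and chosen only to illustrate the rule.

```python
def default_threshold(training_scores):
    """Half of the average log-odds score of the training sequences."""
    return 0.5 * sum(training_scores) / len(training_scores)


def is_match(score, threshold):
    """A sequence matches if its score is at least the threshold."""
    return score >= threshold


# Hypothetical log-odds scores for three training sequences.
training_scores = [8.0, 10.0, 12.0]
threshold = default_threshold(training_scores)  # average is 10.0, so 5.0

print(threshold, is_match(5.0, threshold), is_match(4.9, threshold))
```

Raising the threshold above this default makes fewer candidate sequences pass the test, and lowering it admits more.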
When you choose JASPAR (PWM), the dialog updates immediately to show the following options:
- Select the Organism from the drop-down menu.
- Type a name into the Binding Protein Label text box.
- Click the Select button to choose the site/factor name from a list. Make a selection and choose OK. The remaining fields will be filled in automatically, and the sequence logo for the position weight matrix will be displayed below.
The colors used for the sequence logo bases differ from those used on the JASPAR website. The sequence logo colors used in both SeqMan NGen and MegAlign Pro have been designed for maximum legibility, including by those with difficulty differentiating between red and green.
If you wish to view the PubMed or UniProt entries for the selected site/factor, click on the corresponding link.
- Click Next to proceed to the next wizard screen.
Once you initiate sequence assembly, all detected peaks will be scanned for the presence of sites that pass the JASPAR scoring threshold.
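One plausible way to picture this scan is a sliding window that scores every stretch of a peak sequence as long as the matrix and keeps those at or above the threshold. The sketch below is an assumption about the general technique, not a description of SeqMan NGen's implementation: the `scan_peak` function, the count matrix, the peak sequence, and the threshold are all hypothetical, and strand handling is ignored.

```python
import math


def scan_peak(sequence, counts, total, threshold, background=0.25):
    """Return (start, window, score) for each window scoring >= threshold.

    counts[i] maps base -> count at matrix position i (all counts nonzero).
    """
    width = len(counts)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position log-odds scores for this window.
        score = sum(
            math.log2((counts[i][base] / total) / background)
            for i, base in enumerate(window)
        )
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 2-position count matrix from 80 training sequences.
counts = [
    {"A": 70, "C": 1, "G": 5, "T": 4},
    {"A": 10, "C": 10, "G": 40, "T": 20},
]

# Hypothetical peak sequence and threshold.
hits = scan_peak("CCAGTT", counts, total=80, threshold=1.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

In this toy input only the window "AG" at offset 2 scores above the threshold; every other window scores below zero.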