Create a reference-guided assembly to use in the “SNP to Structure” workflow - User Guide to SeqMan NGen

If you are working with reference-guided human assemblies, Lasergene’s “SNP to Structure” workflow lets you combine genomic sequencing and variant-level data with structure files from the RCSB Protein Data Bank (PDB) to model point mutations on the protein structure and assess their effect on protein stability. By combining structural bioinformatics with sequencing technologies, this integrated workflow can help genomic and molecular biology researchers form structure-based hypotheses and investigate possibilities that are not evident from sequence alone.

This workflow requires that you be licensed to use several Lasergene applications: SeqMan NGen, SeqMan Pro / SeqMan Ultra and Protean 3D (all required), and ArrayStar (optional).

Only Part A of the workflow involves SeqMan NGen. However, all parts of the workflow are described below.

Part A: Create a reference-guided assembly in SeqMan NGen:

In SeqMan NGen’s Workflow screen, choose a reference-guided workflow.

In the Reference Sequence screen, add the DNASTAR genome template Homo_sapiens-GRCh38-Ensembl-dbSNP150.zip. This template contains the mapping of the sequences of PDB structures to the human genomic coordinates, and will later allow SeqMan Pro to communicate with Protean 3D. SeqMan NGen outputs an .astar and an .assembly package to use in later steps.

Follow the rest of the wizard steps and create the assembly.

Part B (optional): Filter for variants of interest in ArrayStar:

If your assembly has a large number of variants, you can use ArrayStar to filter them down to a smaller group of interest before sending them to SeqMan Pro for viewing. This step is highly recommended for all but very small assemblies.

Launch ArrayStar and choose Open a project.

Navigate to and select the .astar file output by SeqMan NGen.

When the file has loaded, click on the SNP Table tab.

Use Filter > Filter All to perform any desired filtering.

At the conclusion of filtering, click the Remember Results as a Variant Set tool above the Search Results table. Type in any desired name and press OK.

In the Action section in the center-right of the window, click the link Select and show the table of this set’s Variants.

Use the Choose Quick-Filter drop-down to select Show Only Variant Set. In the ensuing dialog box, select the named set and click OK.

Click on Add/Manage Columns. Select SeqMan NGen Assembly Variants, then pdbID. Click Add Column and OK.

In the SNP Table, click on the pdbID column header to sort the column and locate rows with PDB entries.

Within the subset of rows with PDB entries, select any row of interest. Then right-click on it and choose Send Selection to SeqMan Pro.

If prompted, select an individual sample of interest and press OK.

Part C: View variants in SeqMan Pro or SeqMan Ultra:

Instructions below pertain to SeqMan Pro, but are similar in SeqMan Ultra.

If you are coming from Part B, above, the sample is automatically selected and its Alignment view is opened. Continue directly to the next step. If you are coming from Part A, launch SeqMan Pro and use File > Open to open the .assembly file.

Choose Variant > Variant Report. (If desired variants are being filtered out, you may need to click on the Show All button.)

Click on the PDB ID column header to sort items with PDB IDs to the top. Select one or more rows, then right-click within the selection and choose Show Variant in Protean 3D (or use Variant > Show Variant in Protean 3D).

Part D: View the protein structure in Protean 3D:

After finishing Part C, the protein structure with the variant of interest opens in Protean 3D.

In the Molecules area, two near-identical copies of the structure appear. The upper structure is the original structure from the Protein Data Bank. The lower structure is the variant version calculated by Protean 3D.

Use the Structure view to observe the mutated side chain along the backbone. To show/hide the different versions of each chain, check/uncheck boxes in the Molecules area.

To see notes about the structure chosen as the best match, look in the Experimental notes box at the bottom of the Variant view. If the Variant view is not visible, click on the Variant tab at the bottom of the Protean 3D window.

Protean 3D uses several metrics to determine the “best” PDB file to display when a variant is located in a CDS that is associated with more than one PDB file. Quality is the first consideration, in the order: high-resolution crystal structures > NMR structures > low-resolution crystal structures > other techniques. This ordering is then refined by alignment to the corresponding UniProt sequences, taking into account the percent match and the number of gaps before the variant position. If two structures are still tied as the “best,” the largest structure is chosen.
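The selection order described above can be pictured as a multi-level sort. The following Python sketch is an illustration only, not Protean 3D's actual implementation. The Candidate class, its field names, the tier labels, and the candidate identifiers are all hypothetical. It also assumes that a higher percent match and fewer gaps before the variant position are preferred, that these two metrics are compared in that order, and that "largest" can be expressed as a single size number. The guide does not state how high and low resolution are distinguished, so the tier is simply supplied as input.

```python
from dataclasses import dataclass

# Lower number = preferred tier, following the ordering given in the text.
# The tier labels themselves are hypothetical names for this sketch.
QUALITY_RANK = {
    "xray_high_res": 0,
    "nmr": 1,
    "xray_low_res": 2,
    "other": 3,
}


@dataclass
class Candidate:
    """One candidate structure for a variant (hypothetical record layout)."""
    pdb_id: str
    quality: str              # one of the QUALITY_RANK keys
    percent_match: float      # alignment to the UniProt sequence, 0-100
    gaps_before_variant: int  # gaps in the alignment before the variant
    size: int                 # assumed measure of structure size


def sort_key(c: Candidate):
    # Tuples compare element by element, so quality tier decides first,
    # then percent match (higher is better, hence the minus sign),
    # then gap count (fewer is better), then size (larger is better).
    return (
        QUALITY_RANK[c.quality],
        -c.percent_match,
        c.gaps_before_variant,
        -c.size,
    )


def pick_best(candidates):
    """Return the candidate that ranks first under sort_key."""
    return min(candidates, key=sort_key)


if __name__ == "__main__":
    # Made-up candidates, used only to exercise the ordering.
    candidates = [
        Candidate("cand_a", "nmr", 100.0, 0, 300),
        Candidate("cand_b", "xray_high_res", 95.0, 1, 200),
        Candidate("cand_c", "xray_high_res", 95.0, 1, 250),
    ]
    print(pick_best(candidates).pdb_id)
```

In this made-up example, cand_a loses on quality tier despite its perfect match, and cand_b and cand_c tie on every metric except size, so the larger cand_c is selected.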

To predict whether the mutation is stabilizing or destabilizing to the protein structure, use the table on the right of the Variants view. The delta-E (DFIRE-A) column displays the change in energy value based on the DFIRE calculation. A positive number is considered destabilizing to the structure when compared to the original amino acid; a negative number is considered non-destabilizing.

To further explore the potential impact of the mutation on protein stability and function, apply a solvent-accessible surface.

Analyze secondary structure characteristics to interrogate the mutation’s effect on protein flexibility, amphiphilicity, charge density, hydropathy, and more.
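If you record delta-E (DFIRE-A) values from the table for several variants, the sign convention described above can be applied as a simple rule. The Python sketch below is a minimal illustration: the function name and the returned labels are hypothetical, and the handling of a value of exactly zero is an assumption, since the guide describes only positive and negative values.

```python
def classify_delta_e(delta_e: float) -> str:
    """Label a delta-E (DFIRE-A) value using the sign convention in the text."""
    if delta_e > 0:
        # Positive change in energy: considered destabilizing.
        return "destabilizing"
    if delta_e < 0:
        # Negative change in energy: considered non-destabilizing.
        return "non-destabilizing"
    # Exactly zero is not covered by the guide; flagged here as an assumption.
    return "undetermined"


if __name__ == "__main__":
    # Made-up values, used only to show the three branches.
    for value in (1.5, -0.8, 0.0):
        print(value, classify_delta_e(value))
```

The sketch only restates the sign rule; it does not compute DFIRE energies, which are calculated by Protean 3D.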
