Did you arrive here by selecting the DNASTAR Navigator workflow Transcriptomics > De novo transcriptome assembly and annotation? If so, you’re in the right place!
The RNA-Seq de novo transcriptome workflow is also called the “transcript annotation” (or “StarBlast”) workflow. To specify this path, choose Transcriptome/RNA-Seq in the Choose Assembly Workflow screen, and De novo assembly in the Choose Assembly Type screen.
In the past, de novo assembly of RNA-Seq data could result in thousands of contigs representing the expressed transcripts, without any context or labels. For Lasergene 13.0 and later, SeqMan NGen automatically attempts to group contigs from the same gene, and then name and annotate them based on the best match to a collection of annotated reference sequences. Two different SeqMan NGen assembly engines are used to optimize your results. Note that results from this workflow are non-quantitative.
Result files for this workflow are described in detail in RNA-Seq de novo transcriptome workflow output.
The following brief video shows this workflow in action:
The de novo assembly and annotation of the RNA-Seq data occurs in a series of steps performed automatically by SeqMan NGen.
- Perform read clustering with the reference-guided assembler. Any RNA-Seq data can be used, though the expected data for this workflow is Illumina data, preferably with reads ≥ 100 bases in length.
- Perform de novo assembly of each cluster with the de novo assembler.
- Compare contig consensus sequences to the specified set of reference sequences (the “database”). Licensed users can use the SeqMan NGen wizard to access DNASTAR’s database of transcript annotations extracted from data on NCBI’s RefSeq website. In addition to the complete collection, subsets of the data are available:
Alternatively, you may create a custom database for use in this step, as described in Creating a custom transcript annotation database.
- Identify and merge contigs belonging to the same gene.
- Perform a second de novo assembly for the grouped contig sequences using the de novo assembler. The goal is to produce the most complete assembly possible for each transcript in the data set.
- Compare the updated contig sequences to the same database used in the third step above. The best matching database entry for each contig is used to label that contig at the gene level and provide summary statistics on the match.
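The labeling and grouping ideas in the steps above can be illustrated with a small toy sketch. This is not SeqMan NGen's actual algorithm or output format: the `hits` rows, the gene names, the scores, and both helper functions are hypothetical, chosen only to show how a "best match per contig" rule leads naturally to grouping contigs by gene.

```python
from collections import defaultdict

# Hypothetical (contig_id, gene_name, match_score) rows.
# These are illustrative values, not real SeqMan NGen output.
hits = [
    ("contig_1", "GAPDH", 98.5),
    ("contig_1", "ACTB", 71.0),
    ("contig_2", "GAPDH", 95.2),
    ("contig_3", "ACTB", 99.1),
]


def best_hit_per_contig(hits):
    """Keep the highest-scoring database match for each contig."""
    best = {}
    for contig, gene, score in hits:
        if contig not in best or score > best[contig][1]:
            best[contig] = (gene, score)
    return best


def group_contigs_by_gene(best):
    """Collect contigs whose best match is the same gene."""
    groups = defaultdict(list)
    for contig, (gene, _score) in best.items():
        groups[gene].append(contig)
    return dict(groups)


best = best_hit_per_contig(hits)
groups = group_contigs_by_gene(best)
print(groups)
```

In this toy data, `contig_1` and `contig_2` both match the same gene best, so they end up in one group. In the workflow described above, contigs grouped this way are the ones reassembled together in the second de novo assembly.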
See Use RNA-Seq de novo transcriptome output as a reference to learn how to use the output of an RNA-Seq de novo transcriptome assembly as input for the RNA-Seq reference-guided workflow.