De novo Transcriptome Assembly: SeqMan NGen vs. CLC Genomics Workbench
De novo transcriptome assembly is a technique frequently used to study non-model organisms.
Do you need to assemble RNA-Seq de novo transcriptome data as part of your research? If so, you might want to consider DNASTAR’s SeqMan NGen, part of the Lasergene Genomics package. SeqMan NGen provides an integrated solution that’s easy to use: from setup to downstream analysis.
What is involved in setting up the assembly in SeqMan NGen?
SeqMan NGen provides a wizard for setting up the assembly. The first step is to launch SeqMan NGen and choose the “De novo transcriptome” workflow.
Next, specify any contaminant, vector and/or adapter sequences you wish to remove.
You can then add de novo transcriptome reads from any major NGS technology, including Illumina, Ion Torrent, Roche 454, and Pacific Biosciences. This SeqMan NGen workflow supports very large data sets and can assemble over 450M reads on a desktop computer.
On the wizard’s “Transcript Annotation Database” screen, specify whether you’d like to annotate transcripts automatically. Licensed users can add a DNASTAR database of transcript annotations extracted from data on NCBI’s RefSeq website. Over a dozen organism groups are represented, which simplifies downstream analysis of assembled, annotated transcripts.
Once the project is set up, you can choose to assemble on your local computer or on the cloud.
How does SeqMan NGen compare to CLC Genomics Workbench in setup and downstream analysis?
We think SeqMan NGen’s de novo transcriptome workflow is a leader in its class. But don’t take our word for it. This 2018 study found that SeqMan NGen performed this workflow better than CLC Genomics Workbench in a variety of areas.
Below, we expand on some of the findings identified in the paper.
SeqMan NGen’s wizard provides flexibility in setting up the assembly
The study authors report that SeqMan NGen “…allows users to specify rRNA or other input contaminant sequences prior to assembly. This option is not currently available in the CLC GW de novo transcriptome workflow.”
In addition to letting you specify rRNA and other contaminant sequences, SeqMan NGen’s wizard also lets you remove specific vector or adapter sequences. Alternatively, you can elect to perform fully automated adapter removal by checking the “Remove universal adapter” option.
SeqMan NGen assembly output contains fewer and longer contigs
With other applications, de novo assembly of RNA-Seq data can potentially result in thousands of unlabeled contigs representing the expressed transcripts. Performing meaningful downstream analysis on this many unannotated contigs is nearly impossible.
Using its proprietary assembly algorithm, however, SeqMan NGen creates fewer and longer contigs than CLC Genomics Workbench. The study authors noted that “…the Lasergene SMN Trace Evidence consensus-calling algorithm generated longer contigs on average… Meanwhile, CLC GW had assembled over nine times the amount of contigs…”
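If you want to check "fewer and longer contigs" on your own data, one approach is to export each assembler's contigs as FASTA and compare simple length statistics. The sketch below is only an illustration, not part of SeqMan NGen or CLC Genomics Workbench. The function names are hypothetical, and N50 is a commonly used assembly metric that we are assuming here; the study quoted above reports contig counts and average lengths.

```python
from statistics import mean


def read_fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, or None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and starts a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Contig count, mean contig length, and N50 for one assembly."""
    return {
        "contigs": len(lengths),
        "mean_length": mean(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```

To use it, open each exported FASTA file (for example, `with open(path) as handle: summarize(read_fasta_lengths(handle))`) and compare the two summaries side by side.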
How does SeqMan NGen do it? SeqMan NGen automatically attempts to group contigs from the same gene, and then name and annotate them based on the best match to a collection of annotated reference sequences (the “Transcript Annotation Database”) extracted from data on NCBI’s RefSeq website. The total count of transcript fragments that aligned and matched RefSeq sequences provides the sequencing coverage. Many data sets assembled with SeqMan NGen produce a large number of long transcripts that are likely full-length transcripts.
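To make the idea of best-match annotation concrete, here is a minimal sketch of the general technique: each contig takes the name of its highest-scoring reference match, contigs sharing a name are grouped together, and contigs with no match are set aside as candidates for novel transcripts. This is not DNASTAR's algorithm, which is proprietary. The function name, the `(contig_id, gene, score)` hit format, and the example gene names are all assumptions made for illustration.

```python
from collections import defaultdict


def annotate_contigs(contig_ids, hits):
    """Group contigs by best-matching reference gene.

    contig_ids: iterable of contig names.
    hits: iterable of (contig_id, gene, score) tuples (hypothetical format),
          where a higher score means a better match.
    Returns (contigs grouped by gene, list of contigs with no match).
    """
    # Keep only the highest-scoring hit for each contig.
    best = {}
    for contig_id, gene, score in hits:
        if contig_id not in best or score > best[contig_id][1]:
            best[contig_id] = (gene, score)

    by_gene = defaultdict(list)
    unmatched = []
    for contig_id in contig_ids:
        if contig_id in best:
            by_gene[best[contig_id][0]].append(contig_id)
        else:
            unmatched.append(contig_id)
    return dict(by_gene), unmatched


# Example with made-up contigs and scores.
grouped, unmatched = annotate_contigs(
    ["c1", "c2", "c3"],
    [("c1", "GAPDH", 90.0), ("c1", "ACTB", 50.0), ("c2", "GAPDH", 80.0)],
)
```

In this example, `c1` and `c2` are grouped under the same gene name, while `c3` has no reference match and lands in the unmatched list.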
SeqMan NGen reports whether contaminant sequences were present
Software that lacks the ability to report excluded reads may be oversampling the reads, reducing the precision of the transcriptome assembly. By contrast, SeqMan NGen reports which reads were excluded. The comparison study found that SeqMan NGen “…clearly defines excluded reads in its project report…”
Lasergene Genomics supports many options for downstream analysis
After de novo transcriptome assembly, other applications in the Lasergene Genomics package allow different types of downstream analysis.
View reports in SeqMan Ultra
Want to know if you’re seeing something new? Open the finished assembly in SeqMan Ultra to view known and novel transcripts separately in two highly customizable and sortable reports.
According to the study authors, SeqMan NGen “produced both annotated and novel transcripts lists. The NCBI RefSeq database was used to obtain a number of known or homologous genes from the assembled transcript sequences.” By contrast, “The CLC GW assembly output contained a list of assembled transcripts and unassembled sequence reads.”
For DNASTAR’s benchmarks comparing identified and novel transcripts assembled for different data sets, see this blog post.
By the way, if you’re curious why the average transcript length found by software is often shorter than the length of the organism’s mRNA, the blog post above also explains this phenomenon. The short answer is that read length makes a huge difference in de novo transcriptome assemblies. Illumina reads over 150 bp in length typically produce much longer assembled transcripts, up to full length, while reads shorter than 150 bp may produce transcripts as little as half the length of the mRNA.
View heat maps and gene ontology in ArrayStar
You can use ArrayStar to view transcriptome results as a heat map and to perform gene expression analysis on the transcripts.
You can also use ArrayStar to explore gene ontology. According to the paper’s authors, “Gene ontology (GO) analysis provides functional description of the genes and existing relationship or functional nodes among genes.” SeqMan NGen “has an integrated tool to perform GO analysis, but not CLC Genomics Workbench.”
Use de novo transcriptome results as a template for RNA-Seq assemblies
SeqMan NGen lets you use your identified or novel transcripts in FASTA format as templates for aligning other RNA-Seq data. Simply follow the templated RNA-Seq workflow and upload the FASTA sequence in the Reference Sequence screen.
Try SeqMan NGen yourself!
If you are currently weighing commercial against open-source software, we suggest you download a free trial of any paid program on your shortlist. As the authors of the study point out (and many users will attest), it’s always a good idea to try commercial software before you buy to ensure you will obtain good results with your own data.
Click the button below to download and install a fully-functional free trial of Lasergene, including SeqMan NGen.