Create and Open a Transcriptome Assembly

In SeqMan NGen, the de novo transcriptome/RNA-Seq workflow is known as the “transcript annotation workflow” (see the online SeqMan NGen help for details). During the assembly stage of this workflow, transcript consensus sequences are annotated and named by searching a transcript annotation database, e.g., NCBI’s RefSeq database.

De novo transcriptome assembly output is saved to a package called [project name].Transcriptome. The package contains three subfolders, as well as a text file with a high-level summary of the results. The three subfolders are:

      Assemblies – Composed of a series of subfolders named sub_0, sub_1, sub_2, …, each of which contains editable SQD documents for the Identified Transcript assemblies. Each of these SQD documents contains one or more contigs that match the same database entry. The Assemblies folder also contains a separate SQD document named [project name]_NovelTranscripts.sqd, composed of the assembled contigs that did not match the database, and a file named [project name]_AllUnassembled.fastq, which holds the project's unassembled reads in FASTQ format.

      Reports – Contains [project name].AllTranscripts.SearchResults, a text file with summary information on both the Identified and Novel Transcripts.

      Transcripts – Contains multi-sequence FASTA files for the Identified and Novel Transcript consensus sequences.
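
For scripted post-processing outside SeqMan Pro, the package layout described above can be walked with a short script. The following is a minimal Python sketch, assuming the package is an ordinary folder on disk and that the folder and file names match those listed above. The function name `describe_transcriptome_package` and the returned dictionary keys are hypothetical, and because the report's exact file extension is not stated here, the sketch matches it with a wildcard.

```python
from pathlib import Path
from typing import Dict, List, Optional


def describe_transcriptome_package(package: Path) -> Dict[str, object]:
    """Locate the main files inside a [project name].Transcriptome package.

    Hypothetical helper: assumes the package is a plain directory and that
    folder/file names follow the layout described in this section.
    """
    suffix = ".Transcriptome"
    if not package.name.endswith(suffix):
        raise ValueError(f"expected a folder ending in {suffix}: {package}")
    project = package.name[: -len(suffix)]

    assemblies = package / "Assemblies"
    # Collect sub_0, sub_1, ... and sort numerically so sub_10 follows sub_9.
    sub_dirs = sorted(
        (d for d in assemblies.glob("sub_*")
         if d.is_dir() and d.name[4:].isdigit()),
        key=lambda d: int(d.name[4:]),
    )
    # SQD documents for the Identified Transcript assemblies.
    identified_sqd = [f for d in sub_dirs for f in sorted(d.glob("*.sqd"))]

    # Assumption: the search-results report may or may not carry an extension.
    report: Optional[Path] = next(
        (package / "Reports").glob(f"{project}.AllTranscripts.SearchResults*"),
        None,
    )
    # Multi-sequence FASTA files for Identified and Novel consensus sequences.
    fasta_files: List[Path] = sorted(
        p for p in (package / "Transcripts").glob("*") if p.is_file()
    )
    return {
        "project": project,
        "identified_sqd": identified_sqd,
        # Expected locations only; these paths are not checked for existence.
        "novel_sqd": assemblies / f"{project}_NovelTranscripts.sqd",
        "unassembled_fastq": assemblies / f"{project}_AllUnassembled.fastq",
        "search_results": report,
        "transcript_fasta": fasta_files,
    }
```

For example, calling the function on a hypothetical `Path("MyRun.Transcriptome")` would list the SQD documents from sub_0, sub_1, … in numeric order. The novel-transcript and unassembled-read entries are expected paths only, so check them with `Path.exists()` before use.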

Assembled transcripts with a database match exceeding the specified thresholds are referred to as “Identified Transcripts” and are labeled with information from the best-matching database entry, following the default convention [gene name]_[accession]_co_[assembly ID]_[contig ID]. If a database entry does not provide a gene name, the label is shortened to [accession]_co_[assembly ID]_[contig ID].

Assembled transcripts that either had no database match or had a preliminary match that fell below the thresholds upon further processing are referred to as “Novel Transcripts.” Transcripts with no match are labeled cl_[assembly ID]_[contig ID]; those whose preliminary match fell below the thresholds are labeled [gene name]_[accession]_co_[assembly ID]_[contig ID] to hint at the possible identity of the sequence.
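
Because these labels follow a fixed pattern, they can be split back into their parts when working with exported results. The Python sketch below is illustrative only: it assumes RefSeq-style accessions (two capital letters, an underscore, digits, and an optional version), numeric assembly and contig IDs, and gene names that may themselves contain underscores. The names `parse_transcript_name` and `TranscriptName` are hypothetical. Note that a `_co_` label alone cannot distinguish an Identified Transcript from a Novel Transcript with a below-threshold match; that distinction comes from the table or file in which the sequence appears.

```python
import re
from typing import NamedTuple, Optional

# Assumption: RefSeq-style accessions (e.g. NM_000546.6) and integer IDs.
# The lazy gene group lets gene names contain underscores of their own.
_CO_NAME = re.compile(
    r"(?:(?P<gene>.+?)_)?"                    # optional [gene name]
    r"(?P<accession>[A-Z]{2}_\d+(?:\.\d+)?)"  # [accession]
    r"_co_(?P<assembly>\d+)_(?P<contig>\d+)"  # _co_[assembly ID]_[contig ID]
)
_CL_NAME = re.compile(r"cl_(?P<assembly>\d+)_(?P<contig>\d+)")


class TranscriptName(NamedTuple):
    kind: str                 # "co" (database hit in label) or "cl" (no hit)
    gene: Optional[str]
    accession: Optional[str]
    assembly_id: int
    contig_id: int


def parse_transcript_name(name: str) -> Optional[TranscriptName]:
    """Split a transcript label into its parts; return None if unrecognized."""
    m = _CO_NAME.fullmatch(name)
    if m:
        return TranscriptName("co", m["gene"], m["accession"],
                              int(m["assembly"]), int(m["contig"]))
    m = _CL_NAME.fullmatch(name)
    if m:
        return TranscriptName("cl", None, None,
                              int(m["assembly"]), int(m["contig"]))
    return None
```

For instance, the hypothetical label `TP53_NM_000546.6_co_12_3` parses to gene `TP53`, accession `NM_000546.6`, assembly 12, and contig 3, while `NM_000546.6_co_12_3` yields the same fields with no gene name. For other databases, only the accession pattern would need to change.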

Once you have created a transcriptome assembly, the next step is to open its .Transcriptome package in SeqMan Pro. To do this:

      On Windows – Either drag and drop the file/package onto a SeqMan Pro window or use SeqMan Pro's File > Open command.

      On Macintosh – Use the same methods as on Windows, or double-click the package itself. The package is recognizable by its SeqMan Pro icon.

The All Transcripts window opens with two tabs, displaying the identified and novel transcripts in tabular format. See The Transcript Tables for information about the contents of these tables and the tasks you can perform in each.