Using Lasergene to De Novo Assemble PacBio HiFi Data
Long-read sequencing offers many advantages over earlier sequencing technologies, though it has some known disadvantages as well. PacBio HiFi sequencing produces long and accurate (>99.9%) sequence reads that are especially useful for de novo genome assembly.
Lasergene 17.3 supports two workflows for de novo assembly of PacBio HiFi, PacBio CLR or Oxford Nanopore reads: “De novo assembly” and “De novo assembly and polishing.” The former requires data from only a single sequencing platform and is the one discussed in this post. In this workflow, reads are assembled in SeqMan NGen, while SeqMan Ultra is used to view, analyze and edit the finished assemblies.
How could I benefit from this workflow?
If you work with novel genomes, the de novo PacBio long read sequencing workflow could be the perfect solution for your sequence assembly needs. SeqMan NGen assembles PacBio HiFi reads into large and accurate contigs and creates genomic scaffolds, often representing the entire chromosome.
Even with the best data for de novo genome assembly, however, some genomic regions can be difficult to assemble, and long homopolymeric regions may not be completely resolved. That’s why Lasergene provides additional finishing and editing tools that allow you to improve scaffolds in your draft genomes and create a high-quality finished genome.
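To get a feel for where long homopolymeric regions sit in a draft contig, a short script can list them for closer inspection. The following is a minimal sketch and is not part of Lasergene: the function name `find_homopolymer_runs` and the `min_length` threshold of 10 are illustrative assumptions, and the input is assumed to be a plain nucleotide string.

```python
import re


def find_homopolymer_runs(sequence, min_length=10):
    """Return (start, end, base) for every run of a single base that is
    at least min_length long. Coordinates are 0-based and end-exclusive.

    min_length is an illustrative threshold, not a Lasergene setting.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # Upper-case first so that lower-case (soft-masked) bases are found too.
    sequence = sequence.upper()
    # One base followed by at least (min_length - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical contig fragment containing a run of twelve A's.
    contig = "ACGT" + "A" * 12 + "CGT"
    for start, end, base in find_homopolymer_runs(contig):
        print(f"{base} x {end - start} at {start}-{end}")
```

Runs reported this way are only candidates for review. Whether a given run was assembled correctly still has to be judged from the read alignments.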
What are PacBio HiFi reads, and where do they come from?
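PacBio HiFi reads are produced via PacBio’s Sequel HiFi sequencing platform using a sequencing mode called “circular consensus sequencing” (CCS), which reads the same circularized molecule multiple times and combines the passes into a single consensus read. The result pairs long read lengths with accuracy comparable to short reads. PacBio HiFi reads are shorter but more accurate (see this PacBio post for images) than long reads produced using other sequencing technologies.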
Small sequencing facilities do not usually have access to the PacBio Sequel HiFi sequencing platform or its predecessor, the PacBio RSII sequencer. To obtain reads in PacBio HiFi format, you will likely need to send your samples to a larger facility that supports this new technology.
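When reads come back from a sequencing facility, it can help to relate the “>99.9%” accuracy figure to the quality scores stored in a FASTQ file. On the standard Phred scale, a quality score Q corresponds to an error probability of 10^(-Q/10), so 99.9% accuracy is Q30. The sketch below illustrates that conversion. It is not part of Lasergene, the function names are illustrative, and it assumes the common Phred+33 quality encoding.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy (0 to 1)."""
    return 1.0 - 10 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy below 1.0 to a Phred quality score."""
    return -10.0 * math.log10(1.0 - accuracy)


def mean_read_accuracy(quality_string, offset=33):
    """Estimate the mean per-base accuracy of one read from its FASTQ
    quality string. Assumes Phred+33 encoding unless offset is changed.

    Error probabilities are averaged (not the Q values themselves),
    because Phred scores are on a logarithmic scale.
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    error_probs = [10 ** (-(ord(ch) - offset) / 10.0) for ch in quality_string]
    return 1.0 - sum(error_probs) / len(error_probs)


if __name__ == "__main__":
    print(f"Q30 accuracy: {phred_to_accuracy(30):.4f}")
    # '?' encodes Q30 in Phred+33, so this made-up read averages 99.9%.
    print(f"Mean accuracy: {mean_read_accuracy('????????'):.4f}")
```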
What are the steps for de novo assembly in Lasergene?
The steps for SeqMan NGen’s basic de novo long-read workflow are:
1. Launch SeqMan NGen. In the Welcome screen’s default tab (De Novo Assembly and Finishing), choose the PacBio/Nanopore workflow De novo assembly. This is the most commonly used of SeqMan NGen’s four de novo long-read workflows. (To learn about the other workflows and when to choose them, see our blog post How to Assemble Genomes like a Bioinformatics Pro.)
2. In the Input Sequences screen, select PacBio HiFi from the dropdown menu and use an Add button to upload the long read sequences.
3. Click Next, then follow the remaining wizard prompts to finish setting up the assembly. For more detailed information, refer to the SeqMan NGen User Guide.
4. In the Run Assembly Project screen, SeqMan NGen will recommend whether to run the assembly locally or on the cloud, though you can choose either option. Press the desired Run button to start the assembly, then wait for the assembly to finish.
5. Press SeqMan NGen’s Open Assembly button to launch SeqMan Ultra with the assembly open and ready for analysis.
6. Analyze the assembly using SeqMan Ultra’s customizable graphical views, reports and tables, which can be exported in a variety of formats.
7. SeqMan Ultra also lets you edit the assembly by splitting contigs, scaffolding them, correcting misjoined contigs and realigning the contigs within SeqMan Ultra itself.
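Once an assembly is finished, a common way to summarize its contiguity outside of any particular tool is the N50: the contig length at which contigs of that size or larger hold at least half of the assembled bases. The sketch below computes it from FASTA-formatted contigs. It is an independent illustration and not a Lasergene feature, and it assumes the contigs have been saved as FASTA, which this post does not confirm as an export option. The function names are hypothetical.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # running length of the record being read, if any
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings us to half of all bases.
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up contigs for illustration; a real file could be passed as
    # read_fasta_lengths(open(path)) with a path of your choosing.
    example = [">contig1", "ACGTACGT", "ACGT", ">contig2", "ACGTAC", ">contig3", "AC"]
    contig_lengths = read_fasta_lengths(example)
    print("Contigs:", len(contig_lengths))
    print("Total bases:", sum(contig_lengths))
    print("N50:", n50(contig_lengths))
```

A bacterial genome that assembles into a single contig has an N50 equal to the full assembly length, so the metric is most informative for larger or more fragmented assemblies.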
Why use Lasergene versus other available software packages for this workflow?
Lasergene provides the easiest, fastest and most accurate way to de novo assemble your PacBio HiFi or other long-read data.
- Easy project setup with intuitive graphic interfaces. Yellow highlighting in SeqMan NGen shows you exactly where your input is needed.
- Fast assembly times. We are in the process of running benchmark tests using SeqMan NGen’s basic long read de novo assembly workflow. Preliminary results show that bacteria and yeast, for instance, assemble 6-30 times faster when using PacBio HiFi data compared to Illumina data.
- Accurate assemblies. Bacterial genomes often assemble into a single contig.
- An integrated pipeline with editing capability. After assembly, simply press a button to open the results in SeqMan Ultra or ArrayStar for downstream analysis and/or editing.
Would you like to try this workflow with your own data?
Click the link below to get a 14-day fully functional trial of Lasergene.
Want to master this workflow in record time? Consider a free, no-obligation online demo with one of our Wisconsin-based staff scientists. Unlike a written or video tutorial, a live demo can be customized to your specific research and data type. Don’t be shy—join hundreds of your colleagues who schedule demos with us each year! Our only goal is to make sure you are successful using the software and getting great results. If you’re happy, we’re happy!