Improving Genome Assemblies with PacBio HiFi Sequencing

By Matt Keyser, DNASTAR Senior Product Manager
April 14, 2025 | Lasergene Genomics

As the Senior Product Manager for Lasergene, Matt Keyser works with scientists, software developers and support staff at DNASTAR to create sequence analysis software that meets the current needs of researchers and that is ready to support future challenges and changing technology. In his 20 years (and counting) at DNASTAR, Matt has advised numerous customers on a wide array of sequencing and analysis projects, giving him a unique understanding of the challenges faced by scientists today.

We at DNASTAR sometimes receive questions about using PacBio HiFi sequencing for DNA assembly. I decided a blog post on this topic is in order. Part A is in Q&A format to make it easy to find the info you’re curious about. Part B is a step-by-step demo of how to set up and analyze a PacBio HiFi assembly in Lasergene.

Part A: PacBio HiFi Q&A

When sending your DNA for sequencing, why might you want to choose PacBio HiFi over other long-read or short-read sequencing technologies?

PacBio HiFi sequencing is a cutting-edge DNA sequencing technology that provides highly accurate long-read sequencing.

Traditional “short-read” sequencing methods produce relatively short DNA fragments. PacBio HiFi, on the other hand, generates much longer reads, often tens of thousands of base pairs in length. This is crucial for resolving complex genomic regions, such as repetitive sequences and structural variations.

HiFi reads can reach lengths of up to 30 kb with accuracy greater than 99.9%, comparable to Sanger sequencing. That accuracy is crucial for reliable variant calling and genome assembly, even in highly homologous or repetitive regions where short reads cannot be aligned accurately. And while short-read technologies can accurately detect SNPs and small indels, HiFi sequencing can detect a wider range of variants, including structural variants, phased variants and methylation patterns.

Similarly, short-read sequencing is widely used for RNA-seq gene expression analysis, but it has difficulty resolving full-length mRNA transcripts. HiFi sequencing can resolve full-length transcripts and precisely categorize alternative splicing events.
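To put the ">99.9%" accuracy figure in context, sequencing accuracy is often expressed on the Phred scale, where Q = -10 × log10(error probability). The short Python sketch below converts between the two; the function names are illustrative only and are not part of any Lasergene or PacBio API.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (e.g. 0.999) to a Phred quality score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.9% accuracy corresponds to roughly Q30 (one error per 1,000 bases).
    print(round(accuracy_to_phred(0.999), 2))
    # Q20 corresponds to 99% accuracy (one error per 100 bases).
    print(round(phred_to_accuracy(20), 4))
```

In other words, a read that is 99.9% accurate is expected to carry about one error per 1,000 bases.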

What are the common pitfalls or challenges encountered when working with HiFi data, and how can they be avoided?

I’m not aware of any issues specific to HiFi data that are not also issues with short-read data. Both data types can generate large amounts of data, which require robust storage and computing resources.

What are the best practices for library preparation and sequencing to maximize HiFi read quality and yield?

To maximize HiFi yield per SMRT Cell, PacBio recommends fragmenting the gDNA to a size distribution mode between 15 kb and 18 kb for human whole genome sequencing. Libraries with a size distribution mode larger than 20 kb are not recommended for HiFi sequencing.

What are the future directions and potential applications of HiFi technology?

One application that will expand in the future is clinical metagenomics. Currently, clinical diagnosis depends on culturing methods that may not detect the presence of low-abundance bacteria or those that are difficult to grow on medium. By contrast, HiFi metagenomic sequencing does not depend on the ability to grow bacteria on medium and can provide a cost-effective and comprehensive microbiological profile of a clinical sample.

How does HiFi data compare to other long-read technologies in terms of accuracy, cost, and throughput? When would you choose one over the other?

In general, HiFi sequencing is more accurate (>99.9%) than Nanopore sequencing, but has shorter read lengths (up to 25 kb) and a higher cost. Nanopore read lengths can exceed 1 Mb, and Nanopore can be more cost-effective, with high throughput potential. HiFi is the best choice when accuracy is required, such as for resolution of complex genomic regions or haplotype phasing. Nanopore may be a better and more cost-effective option for large-scale projects or for de novo assembly, where the longest reads can span difficult-to-resolve repetitive regions.

Which SeqMan NGen workflows commonly use PacBio HiFi? For example, can I use it for transcriptome analysis or metagenomics, or is it only for whole genome/exome assembly?

SeqMan NGen supports both de novo and reference guided assembly and alignment of PacBio HiFi data. In Lasergene 18.0, de novo genome (microbial) assembly is supported as well as reference-guided genome and exome alignment.

When using HiFi reads in SeqMan NGen, can I phase heterozygous variants? If so, how? Where do I analyze the results?

Yes, SeqMan NGen provides a novel haplotype phasing algorithm that can detect phased variants. Analysis is done in GenVision Pro, where phased regions (blocks) can be visualized along with the variants they contain. Individual phased sequence reads can also be visualized, with different colors used to identify the heterozygous alleles within a phased block region.
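To give a feel for what a phase block is, here is a deliberately simplified Python sketch. It is not SeqMan NGen's algorithm, which is not described in this post. It assumes only that two neighboring heterozygous sites can be phased together when at least one read spans both of them, and it groups sites into blocks on that basis. The function name and the example coordinates are made up for illustration.

```python
from typing import List, Sequence, Tuple


def phase_blocks(
    het_positions: Sequence[int],
    reads: Sequence[Tuple[int, int]],
) -> List[List[int]]:
    """Group heterozygous sites into blocks linked by spanning reads.

    het_positions: coordinates of heterozygous variants.
    reads: (start, end) alignment intervals, inclusive.
    Two neighboring sites join the same block if any single read covers both.
    """
    positions = sorted(het_positions)
    if not positions:
        return []

    blocks = [[positions[0]]]
    for prev, cur in zip(positions, positions[1:]):
        linked = any(start <= prev and cur <= end for start, end in reads)
        if linked:
            blocks[-1].append(cur)   # same block: a read connects the two sites
        else:
            blocks.append([cur])     # no connecting read: start a new block
    return blocks


if __name__ == "__main__":
    sites = [100, 5000, 12000, 40000, 41000]
    long_reads = [(0, 6000), (4000, 13000), (39000, 42000)]
    print(phase_blocks(sites, long_reads))
    # Two blocks: [100, 5000, 12000] and [40000, 41000]
```

Under this toy model, longer reads connect more distant sites, which is one reason long-read data tends to produce longer phase blocks than short-read data.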

What are the computational requirements for assembling HiFi data in SeqMan NGen (e.g., memory, CPU, GPU, disk space)? Is there a cloud-based option?

The current version (v18.0) of SeqMan NGen uses a combination of CPU, free disk space and memory (RAM) to align PacBio HiFi data, and the amount needed varies with the size of the data set. For human genome-sized data, the best performance is attained with an 8+ core CPU, 32 GB RAM, and a dedicated 4 TB hard drive to handle temporary files. The next update to SeqMan NGen (v18.1) will be able to utilize GPU processing, which greatly improves assembly speed for HiFi data and eliminates the 4 TB free disk space requirement. There are also cloud-based options that allow users to set up HiFi assemblies locally and then have the data automatically compressed, uploaded to the Cloud, assembled on Amazon cloud hardware, and automatically downloaded. This is a great option for users who want parallel processing of multiple data sets, or who do not have local computing resources powerful enough for large assembly projects.
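If you want a quick sanity check of a workstation against the v18.0 figures quoted above (8 cores, 32 GB RAM, 4 TB free disk for human genome-sized data), a small script like the following can help. This is only a sketch using the Python standard library; it is not a DNASTAR tool, the function names are invented for this example, and installed RAM is passed in by hand because the standard library has no portable way to query it.

```python
import os
import shutil
from typing import List, Tuple

# Figures quoted in this post for human genome-sized data in SeqMan NGen 18.0.
MIN_CORES = 8
MIN_RAM_GB = 32
MIN_FREE_DISK_TB = 4


def check_resources(cores: int, ram_gb: float, free_disk_tb: float) -> List[str]:
    """Return a list of shortfalls; an empty list means all figures are met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB < {MIN_RAM_GB} GB")
    if free_disk_tb < MIN_FREE_DISK_TB:
        problems.append(f"Free disk: {free_disk_tb:.2f} TB < {MIN_FREE_DISK_TB} TB")
    return problems


def local_cores_and_free_disk(path: str = ".") -> Tuple[int, float]:
    """Report logical CPU count and free space (in TB) on the drive holding path."""
    cores = os.cpu_count() or 1
    free_tb = shutil.disk_usage(path).free / 1e12
    return cores, free_tb


if __name__ == "__main__":
    cores, free_tb = local_cores_and_free_disk(".")
    # Replace 32 with the RAM actually installed in your machine.
    for line in check_resources(cores, ram_gb=32, free_disk_tb=free_tb) or ["OK"]:
        print(line)
```

Point `local_cores_and_free_disk` at the drive you plan to use for temporary files, since that is where the free-space figure matters.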

How do you manage and store the large datasets generated by HiFi sequencing?

You need a lot of space. My personal computer is a powerful i7 HP laptop with two hard drives (2TB and 4TB). I also use external 4TB hard drives to store additional large data sets.

Part B: Assembly setup and downstream analysis in Lasergene

PacBio HiFi data (and all other long-read types) can be used in both de novo assemblies and in variant analysis/resequencing assemblies. The following example shows how to set up and analyze a reference-guided assembly for a Drosophila melanogaster (fruit fly) data set.

Setting up and running the assembly in SeqMan NGen

1) Launch SeqMan NGen and click New Assembly.

SeqMan NGen wizard for PacBio HiFi sequencing workflow

2) In the Variant Analysis / Resequencing tab of the SeqMan NGen Workflow screen, choose the PacBio / Nanopore Whole genome workflow.

3) In the Reference Sequence screen, click Download Genome Package to select a curated template package from DNASTAR.

Select Drosophila melanogaster and press Select.

Then press Next to proceed to the next screen.

4) In the Input Sequences screen, choose PacBio HiFi as the Read technology. Load the PacBio HiFi sequence (here, a 50 MB .fastq file) using the Add button, or by dragging it from the File Explorer and dropping it onto the SeqMan NGen wizard screen as shown.

5) Press Next twice to reach the Analysis Options screen. The Detect SNPs and other small variants option is checked by default. Since the organism is diploid, we will take the optional step of selecting the Diploid – Phased option. This causes SeqMan NGen to separate variants by allele during assembly, so we can view phased variants during analysis. We don’t know the fruit fly’s gender, so we select Unknown.

(As an aside, if we had been working with human samples and wanted to automatically add enhanced annotations to variants discovered during assembly, we would have chosen Human build 37 or 38 in Step 3. Then, in this step, we would check the box next to Annotate with the Variant Annotation Database, shown highlighted in yellow above.)

6) Click Next to proceed to the Assembly Output screen. There, choose a project name and assign the folder where you want to save the project.

7) Click Next to move to the Run Assembly Project screen. In this case, SeqMan NGen has reviewed our computer’s available memory and recommends running the assembly locally. So we press the link “Run assembly on this computer” to initiate the assembly.

8) Once the assembly finishes, the phrase “XNG done” will appear at the bottom of the Assembly Log, and a Finish button will become active in the bottom right corner.

9) Press Next to go to the Assembly Summary screen.

10) To open the assembly in GenVision Pro, click the button Analyze and compare variants.
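Before loading a .fastq file in step 4, it can be useful to check how many reads it contains and how long they are. The sketch below is an optional, standalone Python script, not part of Lasergene. It assumes a standard four-line-per-record FASTQ file (optionally gzip-compressed), and the function names are chosen for this example.

```python
import gzip
from typing import Dict, Iterator, List, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:          # end of file (or trailing blank line)
                break
            sequence = handle.readline().rstrip()
            handle.readline()       # '+' separator line, ignored
            quality = handle.readline().rstrip()
            yield header[1:], sequence, quality


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize_fastq(path: str) -> Dict[str, float]:
    """Collect basic read-length statistics for a FASTQ file."""
    lengths = [len(seq) for _, seq, _ in read_fastq(path)]
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

For example, calling `summarize_fastq("reads.fastq")` (where `reads.fastq` stands in for your own file) returns a dictionary with the read count, total bases, mean and longest read length, and read-length N50.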

Analyzing variants in GenVision Pro

In GenVision Pro, the Genome view on the left displays each chromosome as a row of phase blocks in alternating colors of blue and green, while the Experiments panel on the right lists the chromosomes and their lengths.

PacBio HiFi sequencing workflow results displayed in GenVision Pro

1) To see a chromosome in more detail within the Analysis view, double-click on its row in either location. When zoomed out, a top-level view shows features and phase blocks. In the image below, you can see where the green phase block ends and the blue one begins. Vertical lines in contrasting colors indicate variants at those positions.

2) Press the CT or T tools in the top right to zoom in and view individual bases and variants. In the image below, there is a variant at position 18558809, which is within the Pde11 gene. The reference row shows a “T” at that position. In the Alignment track, the allele shown in green also has a “T,” but the allele shown in blue has a variant “A.” Below that, the variants track shows the same thing graphically. Note that the variants track is bifurcated and shows a variant (short vertical bar) in one allele and not the other.

3) To create a table of variants, click the Show Variants Table tool in the Experiments panel.

Scores of customizable data columns provide statistics and other valuable information about each variant. Just a subset of these columns is shown in the variant table image below.

Conclusion

PacBio HiFi’s accuracy and long-read capabilities are transforming genomic research. Lasergene’s SeqMan NGen simplifies the assembly and analysis of this data, with intuitive setup screens, built-in variant calling, and features like haplotype phasing. For downstream analysis, GenVision Pro offers a variety of views and numerous customization options, including robust variant filtering.

Want to try this workflow with your own data?

Request a free trial today!
