How Accurate Is Your Variant Caller?

If you are using Next Generation Sequencing (NGS) for clinical research, cancer genomics, genome-wide association studies, or other genomic research, the ability to identify variants with confidence is of utmost importance. For these projects, you need software that can correctly identify the true variants, while minimizing false positives that could lead to wasted research effort.

Most variant callers provide probability scores for each detected variant. However, these values alone do not tell you how reliable the overall results are. Just because a difference from the reference sequence is statistically significant doesn’t necessarily mean that it is an accurate call.

How can you know if your results are accurate?

In most studies, especially when looking for rare mutations, having a reliable reference set with known variations isn’t feasible, so accuracy has to be measured on a sample whose variants are already well characterized. To test the accuracy of NGS alignment and variant calling in Lasergene Genomics Suite, we used SeqMan NGen to align whole human exome data from the Genome in a Bottle Consortium (GIAB) to the human reference genome. Because this is a well-curated data set, we were able to compare the variant calls to the “answer” provided by GIAB. We also performed alignment and variant calling in several other software packages using the same data and comparable settings. We then looked at three metrics:

  1. Sensitivity – Also known as the true positive rate, this is the ratio of correctly identified variants to the total number of known variants in the reference set. The higher the sensitivity, the greater the likelihood that a variant in the sample will be identified by the software.
  2. Specificity – Also known as the true negative rate, this is the ratio of positions correctly called as non-variant to the total number of positions in the reference set that are known to be homozygous for the reference allele. The more false positives a caller produces, the lower its specificity.
  3. False Discovery Rate (FDR) – This is the ratio of false positives to all variant calls made by the software. The FDR value for a variant caller tells you what fraction of the variants called in your project are likely to be false positives.
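
To make the three definitions concrete, here is a minimal Python sketch of how they could be computed once a call set has been compared to a truth set. It is an illustration only, not the procedure used for the comparisons in this post. The names (`tally`, `accuracy_metrics`, `Counts`) are hypothetical, and sites are reduced to (chromosome, position) pairs. A real benchmark would also compare alleles and genotypes.

```python
from typing import NamedTuple, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position); a simplifying assumption


class Counts(NamedTuple):
    tp: int  # variants in the truth set that the caller found
    fp: int  # calls at sites known to be homozygous reference
    tn: int  # homozygous-reference sites with no call
    fn: int  # truth-set variants the caller missed


def tally(truth: Set[Site], calls: Set[Site], confident: Set[Site]) -> Counts:
    """Count outcomes, looking only at sites where the truth is known."""
    truth = truth & confident
    calls = calls & confident
    return Counts(
        tp=len(truth & calls),
        fp=len(calls - truth),
        tn=len(confident - truth - calls),
        fn=len(truth - calls),
    )


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a category is empty.
    return numerator / denominator if denominator else float("nan")


def accuracy_metrics(c: Counts) -> dict:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "fdr": _ratio(c.fp, c.tp + c.fp),          # false discovery rate
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = {("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)}
    calls = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)}
    confident = truth | {("chr2", 75)} | {("chr1", p) for p in range(1000, 1010)}
    print(accuracy_metrics(tally(truth, calls, confident)))
```

Because the vast majority of positions in an exome match the reference, the true negative count dwarfs the others and specificity tends to sit very close to 1. That is one reason FDR is often the more informative number for judging false positives.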

Because an accurate alignment is a necessary precursor to accurate variant detection, these metrics also help you understand the alignment accuracy from various software pipelines.

Here’s a peek at the results we obtained:

Check out our new Accuracy page to see the full results of these comparisons across numerous exomes and other sample data sets and learn how Lasergene Genomics Suite stacks up against the competition!