Working with Variant Call Format Files in Lasergene Genomics

March 11, 2020 DNASTAR News, Next-Gen Sequencing

How to use VCF files for variant analysis and genotyping

Annotated data is essential for assessing the importance of variants to a trait or disease under investigation. This is true whether you are looking for known variants in a single sample or comparing multiple samples for shared variants or affected genes. Variant call format (VCF) files provide a compact, human-readable way to store variant information from one or more samples that share the same reference sequence. Unfortunately, they do not include important functional information about individual variants, such as their impact on a gene. For example, a VCF file does not indicate whether a SNP changes the amino acid sequence of a protein-encoding gene. This information is critical in assessing the potential impact of a sequence variant.

To provide basic functional annotation and enriched information for human samples, DNASTAR has developed a new workflow: Variant Annotation in Lasergene Genomics. This workflow lets users annotate VCF files from human samples using DNASTAR’s Variant Annotation Database (VAD).

What are Variant Call Format files, and where do they come from?

VCF files are produced by an assembly pipeline. Each new sample is sequenced, its reads are aligned to a reference genome by an aligner such as BWA, and variants are then called from that alignment by a tool such as GATK.

VCF files often originate from the alignment of Illumina data, but some sources (e.g., Genome in a Bottle) combine data from multiple sequencing technologies to make higher-confidence calls.

VCF files provide a convenient way to archive and share basic SNP and small indel information, such as the variant base(s) and the chromosome and position where the variant is located. These files are especially useful for large-scale data involving a whole genome.
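As a rough illustration of the basic information a VCF record carries, the sketch below reads the leading tab-separated columns of one data line (chromosome, position, reference allele, alternate alleles) and labels each alternate allele by length. The names `VariantRecord`, `parse_vcf_line`, and `variant_kind` are hypothetical, the example line is made up, and symbolic alleles are not handled. It is not part of Lasergene.

```python
from typing import NamedTuple


class VariantRecord(NamedTuple):
    chrom: str
    pos: int          # 1-based position on the reference
    ref: str          # reference base(s)
    alts: list        # one or more alternate alleles


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse the first five fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return VariantRecord(chrom, int(pos), ref, alt.split(","))


def variant_kind(ref: str, alt: str) -> str:
    """Label an allele as SNP, insertion, deletion, or other by length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"


# Made-up record: columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
example = "chr1\t12345\t.\tA\tG,AT\t50\tPASS\t."
record = parse_vcf_line(example)
kinds = [variant_kind(record.ref, alt) for alt in record.alts]
print(record.chrom, record.pos, kinds)
```

Real files also carry `##` metadata lines and a `#CHROM` header line, which a reader would skip or parse before the data lines.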

Multi-sample VCFs, such as those available from the 1000 Genomes Project, contain variant calls from multiple individuals. These samples were originally sequenced independently, then later combined into a single file. Because the file merges calls across samples, it also records positions where one or more samples do not have a variant. In other words, a position is included in the file if at least one sample has a variant there. ArrayStar uses the same idea to provide information on reference calls across a population of samples.
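The point about samples without a variant can be seen in the per-sample genotype (GT) fields of a multi-sample record. The following is a minimal sketch, assuming a `#CHROM` header line with three hypothetical samples (S1 to S3) and a made-up data line. The helper names are illustrative only.

```python
def sample_genotypes(header_line: str, data_line: str) -> dict:
    """Map each sample name to its GT string for one VCF record."""
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = data_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")   # column 9 is FORMAT
    return {
        name: column.split(":")[gt_index]
        for name, column in zip(names, fields[9:])
    }


def carries_variant(gt: str) -> bool:
    """True if any called allele in the genotype is non-reference."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
data = "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t./."

genotypes = sample_genotypes(header, data)
carriers = [name for name, gt in genotypes.items() if carries_variant(gt)]
print(genotypes)   # S1 is a reference call, S2 is heterozygous, S3 is missing
print(carriers)
```

Only S2 carries the variant here, yet the position appears in the file for all three samples, which is what makes reference calls available for cross-sample comparison.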

How does the variant call format annotation process work?

Users load their VCF file(s) into SeqMan NGen using the Variant Call Format (VCF) analysis workflow. The software then annotates variants in two steps. The first step classifies variants by their effect on coding regions, relative to the imported reference genome. The second step imports the DNASTAR Variant Annotation Database (VAD), which combines data from a variety of SNP-level annotation databases.
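To make the first annotation step more concrete, here is a simplified sketch of how a SNP inside a coding sequence can be classified by its effect on the encoded amino acid. It is not DNASTAR's implementation. It assumes a forward-strand, intron-free coding sequence, the standard genetic code, and a 0-based offset into that sequence. The function name and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_coding_snp(cds: str, offset: int, alt_base: str) -> str:
    """Classify a SNP at a 0-based offset in a forward-strand CDS."""
    pos_in_codon = offset % 3
    codon_start = offset - pos_in_codon
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = (
        ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    )
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


cds = "ATGCGATGGTAA"  # toy CDS: Met-Arg-Trp-stop
results = [
    classify_coding_snp(cds, 5, "G"),  # CGA -> CGG, Arg -> Arg
    classify_coding_snp(cds, 6, "C"),  # TGG -> CGG, Trp -> Arg
    classify_coding_snp(cds, 8, "A"),  # TGG -> TGA, Trp -> stop
]
print(results)
```

A production annotator also has to handle reverse-strand genes, splicing, indels that shift the reading frame, and multiple transcripts per gene, none of which this sketch attempts.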

Figure: SeqMan NGen's Workflow screen provides single- and multiple-sample options for the VCF Annotation workflow.

Figure: Steps for annotating and analyzing multi-sample variant data in Lasergene. DNASTAR software tools and supplemental data are shown in orange.

After following the VCF Annotation workflow in SeqMan NGen, where is downstream analysis performed?

The workflow produces a faux assembly with annotated variants. These variants can be viewed in SeqMan Pro or SeqMan Ultra. However, they are more commonly viewed in ArrayStar, where all the standard variant comparison tools and views are available.

In ArrayStar, users can filter for genes and/or variants of interest across multiple samples and use the many cross-comparison tools. The “Add/Manage Columns” tool lets users add useful data columns from the Variant Annotation Database to any of the ArrayStar tables.

Figure: ArrayStar's "Add/Manage Columns" dialog lets you choose which data columns to display in a given table. Options include data and statistics from the Variant Annotation Database.

How does the Variant Annotation workflow differ from simply adding an uploaded VCF file to any reference-guided workflow?

In SeqMan NGen’s reference-guided workflows, uploaded VCF files are used to tag positions of interest in a new NGS assembly. These positions are assigned a user ID that can be used in filtering in SeqMan Pro, SeqMan Ultra, or ArrayStar.

By contrast, the VCF Annotation workflow annotates specified VCF files using the corresponding annotated reference sequence without the use of NGS read data.

It is important to note that the reference sequence used MUST be in the same coordinate system as was used to generate the VCF; otherwise the results will be erroneous. For example, a VCF generated with build 37 of the human genome must be compared to the annotated build 37 human reference, not the newer build 38 sequence.
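One inexpensive way to catch a coordinate-system mismatch before annotating is to confirm that each record's REF allele matches the reference sequence at the stated position. The sketch below assumes the reference is already loaded as a dictionary of chromosome name to sequence string. The function names and the tiny reference are hypothetical, and this check is not described as a feature of SeqMan NGen.

```python
def ref_allele_matches(reference: dict, chrom: str, pos: int, ref: str) -> bool:
    """Check that REF equals the reference bases at a 1-based position."""
    seq = reference.get(chrom)
    if seq is None:
        return False  # chromosome name not present in this reference
    start = pos - 1   # VCF positions are 1-based
    return seq[start:start + len(ref)].upper() == ref.upper()


def mismatch_fraction(reference: dict, records) -> float:
    """Fraction of (chrom, pos, ref) records that disagree with the reference."""
    records = list(records)
    if not records:
        return 0.0
    bad = sum(not ref_allele_matches(reference, c, p, r) for c, p, r in records)
    return bad / len(records)


toy_reference = {"chr1": "ACGTACGTAC"}
toy_records = [
    ("chr1", 1, "A"),    # matches
    ("chr1", 4, "TA"),   # matches
    ("chr1", 6, "G"),    # reference has C here
    ("1", 2, "C"),       # chromosome naming differs from the reference
]
fraction = mismatch_fraction(toy_reference, toy_records)
print(fraction)
```

A high mismatch fraction suggests the VCF and the reference use different builds or different chromosome naming, so it is worth resolving before trusting any annotation.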

What type of researcher could benefit from the VCF Annotation workflow?

This workflow is designed to help scientists researching variants that may be involved in a particular human disease or trait. For example, in a recent poster, we used this workflow to analyze 96 targeted resequencing samples from a Chinese cohort with lung squamous cell carcinomas (LSCC; Li et al., Sci Rep. 2015. 5:14237). We were able to easily identify unique mutations in numerous samples across the cohort, all of which lead to nonsense or frameshift changes in the TP53 tumor suppressor gene.

Some researchers may use the workflow to update their own sample annotations with a newer version of the annotated reference genome. Others, including those who do not normally use SeqMan NGen, can also benefit from this workflow. A common scenario is to collect VCF files from colleagues or public resources and annotate them for the purposes described above.
