During de novo assembly, contamination of Illumina data with PhiX control sequence may result in the generation of spurious contigs. For background information, please see Mukherjee et al. Standards in Genomic Sciences 2015, 10:18.
- Not all Illumina data are contaminated with PhiX.
- PhiX contamination is not a major concern for reference-guided assemblies.
- In informal tests at DNASTAR, the amount of contamination in most data sets was so low that the spurious contigs were automatically discarded for being “under the minimum coverage” set by the SeqMan NGen defaults. When coverage was higher, the contaminated contigs contained only PhiX174 reads and could be readily recognized after assembly.
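As a rough illustration of how a PhiX-only contig might be recognized after assembly, the sketch below scores a contig by the fraction of its k-mers that also occur in the PhiX174 genome on either strand. This is a generic k-mer screen, not the method SeqMan NGen uses. The function names, the default `k=21`, and the choice to treat the reference as circular are all assumptions made for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def phix_fraction(contig, phix_genome, k=21):
    """Fraction of the contig's distinct k-mers found in the reference.

    Assumption: the reference is circular, so the first k-1 bases are
    appended to its end to capture k-mers that span the origin.
    Both strands of the reference are indexed.
    """
    extended = phix_genome + phix_genome[:k - 1]
    reference = kmers(extended, k) | kmers(reverse_complement(extended), k)
    contig_kmers = kmers(contig, k)
    if not contig_kmers:
        # Contig shorter than k: nothing to compare.
        return 0.0
    return len(contig_kmers & reference) / len(contig_kmers)
```

A contig built only from PhiX174 reads would be expected to score close to 1.0, while a contig from the sample genome would be expected to score close to 0.0. Any cutoff between the two is a judgment call and is not specified by the source.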
If you are following a de novo workflow, you can easily remove PhiX contamination prior to assembly by following these steps:
- Download the sequence NC_001422.1 (Enterobacteria phage phiX174 sensu lato, complete genome) from the NCBI website.
- Launch SeqMan NGen and proceed through the wizard screens.
- In the Choose Assembly Type screen, choose a de novo assembly option.
- In the Read Options screen, check the Contaminant Scan box. Use the associated Add button to add the PhiX174 sequence downloaded in the previous step.
- Proceed through the rest of the SeqMan NGen wizard screens and assemble as usual.
Any PhiX174 sequence will be removed prior to assembly.
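The download in the first step can also be scripted instead of done through the NCBI website. The sketch below uses the NCBI E-utilities `efetch` endpoint to retrieve NC_001422.1 in FASTA format. The helper names and the output file name `phix174.fasta` are hypothetical. The sketch also assumes that a FASTA file is an acceptable input for the Contaminant Scan Add button, which this topic does not state.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching sequence records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession="NC_001422.1"):
    """Build an efetch URL requesting the accession as plain-text FASTA."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def looks_like_fasta(text):
    """Minimal sanity check: a '>' header line followed by sequence."""
    lines = text.strip().splitlines()
    return len(lines) >= 2 and lines[0].startswith(">")


def download_phix(out_path="phix174.fasta", accession="NC_001422.1"):
    """Download the record and save it; out_path is a hypothetical name."""
    with urlopen(build_efetch_url(accession), timeout=60) as response:
        text = response.read().decode("ascii")
    if not looks_like_fasta(text):
        raise ValueError("Response from NCBI does not look like FASTA")
    with open(out_path, "w") as handle:
        handle.write(text)
    return out_path
```

Calling `download_phix()` requires network access and writes the file to the current directory; that file can then be selected with the Add button in the Read Options screen.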