Whole Genome Sequencing


Advances in massively parallel sequencing technologies have dramatically reduced the cost of whole genome sequencing of bacteria.

PulseNet participants are currently running a number of pilot projects to implement whole genome sequencing as a routine tool for foodborne outbreak investigations and surveillance. We are looking to bring together the best tools and approaches from around the world and implement them within the PulseNet network.

WGS Protocols

The following protocols have been developed by PulseNet USA and are made available to the international community.

Illumina MiSeq

PNL32_MiSeq Nextera XT Protocol [PDF, 636 KB]

Updated January 2016

FastQC

PNQ07_Illumina MiSeq Data QC [PDF, 260 KB]

Updated May 2015


PND18_NCBI Biosample Submission [PDF, 37 KB]

Updated May 2015


 Key tools & collaborators

We are looking to bring together the best tools and approaches from around the world, and implement them within the PulseNet network. Key collaborators or tools we are exploring include:

Frequently Asked Questions

What is Whole Genome Sequencing (WGS)?

WGS refers both to the process of determining the complete DNA sequence of a microorganism's genome and to the resulting sequence. For foodborne bacteria, the genome includes the chromosome and any extrachromosomal genetic material, such as plasmids. The sequencing itself is also called next-generation sequencing (NGS) or massively parallel sequencing: the DNA is broken into many small random fragments that are sequenced in parallel, producing short sequences ('reads') that typically range in length from fewer than 100 to several thousand base pairs (bp). Because the fragments overlap at random, each position in the genome is typically read many times, from about 10 to more than 100 times. This average number of times each position of the genome is sequenced is called the coverage.

Before the data can be analyzed, they must be cleaned and assessed for quality, and they are often assembled into as few contiguous pieces (contigs) as possible. A completely assembled genome consists of one contig for the chromosome and one contig for each extrachromosomal element, but most often a genome is assembled into 5 to 200 contigs. If a genome is not fully assembled, we do not know the sequence of the whole genome, only about 97 to 99% of it. Assembly is computationally intensive and can be done either by aligning the reads against a well-assembled sequence of a closely related strain (reference-based assembly) or by joining overlapping reads to one another without a reference genome (de novo assembly).

However, some genome comparisons can be performed with little or no assembly ('assembly-free') and minimal processing. For example, to check for a specific gene whose sequence is known, such as rpoB for species identification, or for a specific set of genes, such as those used for multilocus sequence typing (MLST), the raw reads of the strain in question can be queried directly for the presence of that gene or those genes, without assembling them first.
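Coverage, as defined in the answer above, reduces to simple arithmetic: the total number of sequenced bases divided by the genome size. The Python sketch below estimates average coverage and the number of reads needed to reach a target coverage. The function names are hypothetical, each read of a paired-end run is counted separately, and the read count, read length and 5 Mb genome size in the example are illustrative values, not PulseNet requirements.

```python
import math


def average_coverage(num_reads: int, read_length_bp: float, genome_size_bp: int) -> float:
    """Average coverage = total sequenced bases / genome size (hypothetical helper)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp


def reads_for_coverage(target_coverage: float, read_length_bp: float, genome_size_bp: int) -> int:
    """Smallest number of reads whose combined length reaches the target average coverage."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 reads of 250 bp on a hypothetical 5 Mb genome.
    print(average_coverage(1_000_000, 250, 5_000_000))  # 50.0
    print(reads_for_coverage(50, 250, 5_000_000))       # 1000000
```

In practice, coverage is not uniform along the genome and quality trimming removes some bases, so a figure computed from raw read counts is an estimate before trimming rather than a guarantee for every position.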

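How close a draft assembly is to complete can be gauged roughly from its contig count and its total length relative to the expected genome size. The sketch below assumes the assembly is stored as FASTA text (a header line starting with `>` before each contig); `read_fasta`, `assembly_summary` and the expected genome size you pass in are hypothetical, not part of any PulseNet protocol. Assembled length is only a proxy for the "97 to 99%" figure above, because collapsed repeats can lower it and contamination can raise it.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {contig name: sequence}; the name is the first word after '>'."""
    contigs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)  # store the previous contig
            header = line[1:].split()
            name = header[0] if header else f"unnamed_{len(contigs) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


def assembly_summary(contigs: Dict[str, str], expected_genome_size_bp: int) -> Dict[str, float]:
    """Contig count, assembled length, largest contig and fraction of the expected genome size."""
    if expected_genome_size_bp <= 0:
        raise ValueError("expected genome size must be positive")
    lengths = sorted((len(seq) for seq in contigs.values()), reverse=True)
    assembled_bp = sum(lengths)
    return {
        "contigs": len(lengths),
        "assembled_bp": assembled_bp,
        "largest_contig_bp": lengths[0] if lengths else 0,
        "fraction_of_expected": assembled_bp / expected_genome_size_bp,
    }


if __name__ == "__main__":
    # Toy FASTA: a 'chromosome' contig and a 'plasmid' contig (made-up sequences).
    fasta_text = """>contig_1 chromosome
ACGTACGTAC
GTACGT
>contig_2 plasmid
GGCCGGCC
"""
    print(assembly_summary(read_fasta(fasta_text.splitlines()), expected_genome_size_bp=30))
```

For a real assembly, an open file handle can be passed to `read_fasta` in place of the list of lines, since it iterates over lines the same way.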

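The assembly-free query described in the FAQ answer can be illustrated with a simple k-mer screen: split the known gene sequence into all substrings of length k and check which of them occur in the raw reads, on either DNA strand. This is a minimal sketch of that general idea, not the method any particular PulseNet tool uses; the function names, the default k = 21 and the toy sequences in the example are assumptions, and real screening tools also have to cope with sequencing errors and allelic variation.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G; N stays N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gene_kmer_fraction(gene: str, reads: Iterable[str], k: int = 21) -> float:
    """Fraction of the gene's distinct k-mers found in at least one read, on either strand."""
    gene_kmers = kmers(gene, k)
    if not gene_kmers:
        raise ValueError("gene sequence is shorter than k")
    found: Set[str] = set()
    for read in reads:
        # A read may come from either strand, so check it and its reverse complement.
        for strand in (read, reverse_complement(read)):
            found |= gene_kmers & kmers(strand, k)
        if len(found) == len(gene_kmers):
            break  # every gene k-mer already seen; no need to scan further reads
    return len(found) / len(gene_kmers)


if __name__ == "__main__":
    # Toy 20 bp "gene" and two overlapping reads (made-up sequences, not real rpoB);
    # the second read is supplied as if sequenced from the opposite strand.
    gene = "ATGGCGTACGTTAGCCTAGG"
    reads = ["ATGGCGTACGTT", reverse_complement("CGTTAGCCTAGG")]
    print(gene_kmer_fraction(gene, reads, k=5))  # 1.0
```

A fraction close to 1 suggests the gene is present in the sequenced strain; the cutoff to use is a judgment call that depends on coverage and read quality.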