
Sanger Sequence Assembly

 

The AGRF routinely processes assembly projects of varying sizes from the primer walking of single through BAC and cosmid sequencing, to whole genome shotgun assembly and finishing. The AGRF will also process "skimming" assemblies of projects, as well as more directed primer walking strategies targeting regions of specific interest. 

The AGRF currently uses the UWash suite of software for project sequence assembly and finishing. Sequence data is quality scored using phred and assembled using phrap. Viewing and finishing the resultant assembly are performed using consed. Initial finishing reactions are selected automatically using autofinish. As the project nears completion, appropriate reactions are selected by experienced finishers.
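
For readers unfamiliar with phred quality scores, the sketch below illustrates the standard relationship between a quality value Q and the estimated probability P that a base call is wrong, Q = -10 log10(P). The function names are illustrative only; they are not part of phred, phrap or consed.

```python
import math

def phred_to_error_prob(q):
    """Estimated probability that a base call with phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score corresponding to error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Print the approximate error rate for a few commonly quoted scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {round(1 / phred_to_error_prob(q))} bases")
```

A score of 20 therefore corresponds to roughly one expected error per 100 bases, and a score of 30 to one per 1000.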

Pregap4 and Gap4 are the preferred processing and assembly tools for primer walking based projects.

8 x coverage is considered optimal for the efficient processing of whole genome shotgun projects. Details concerning the amount of sequence coverage required can be found under Lander-Waterman Calculations below.
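
As a rough planning aid, fold coverage is the total number of bases sequenced divided by the size of the target, so the number of reads needed for a given coverage can be estimated as in the sketch below. The function name and the example values (a 100 kb clone and a 700 bp usable read length) are assumptions chosen for illustration, not AGRF figures.

```python
import math

def reads_for_coverage(target_size_bp, fold_coverage, mean_read_length_bp):
    """Reads needed so that total sequenced bases = fold coverage x target size."""
    return math.ceil(target_size_bp * fold_coverage / mean_read_length_bp)

# Hypothetical example: 100 kb clone, 8 x coverage, 700 bp usable read length.
print(reads_for_coverage(100_000, 8, 700))  # prints 1143
```

The estimate ignores failed reactions and vector or low-quality sequence that is trimmed away, so a real project would normally need more reads than this.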

Projects assembled by AGRF are considered "finished" when:

 

Semi-automated sequence annotation may also be performed on the finished assembly. Please contact us with your enquiry.

 

Contig Assembly

The contig assembly process at AGRF is shown below. Individual sequences are assembled into contiguous sequences (contigs) using a combination of the Staden and UWash programs. In brief, this process involves the following:


There is a choice of two assembly programs: Gap4 and Phrap. The latter is especially useful when Phred quality scores are available. The Gap4 editor is used to view the contigs, bring up the traces for editing, design primers to fill gaps, and export the consensus sequences.

 

 

Lander-Waterman Calculations

A simple formula can be used to estimate what percentage of a clone will be sequenced for a given level of random sequencing (Lander & Waterman 1988). The table below shows several examples. For instance, if you obtain 100 kb of random sequence from a 100 kb clone (1 x coverage), you would expect to have sequenced 63% of that clone. Some regions of the clone will have been sequenced several times and other regions will not have been sequenced at all.

 

Fold coverage    Percent of clone sequenced
0.25 x           22%
0.50 x           39%
0.75 x           53%
1 x              63%
2 x              86%
3 x              95%
4 x              98%
5 x              99.3%
6 x              99.75%
7 x              99.91%
8 x              99.97%
9 x              99.99%
10 x             99.995%

 

These figures should be used as a guide only. There are many reasons why actual results may deviate from them, perhaps the most significant being non-random shotgun cloning.
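
The formula itself is not reproduced on this page. A minimal sketch, assuming it is the commonly quoted Poisson approximation in which the expected fraction of a clone covered at c-fold random coverage is 1 - e^(-c), is given below. The function name is illustrative, and the model assumes reads are placed uniformly at random and ignores read length and end effects.

```python
import math

def fraction_sequenced(fold_coverage):
    """Expected fraction of the clone covered by at least one read,
    under the assumed Poisson model: 1 - exp(-c)."""
    return 1.0 - math.exp(-fold_coverage)

# Recompute the expected percentage for each coverage level in the table.
for c in (0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(f"{c:>5} x  {100 * fraction_sequenced(c):.3f}%")
```

Under this assumption 1 x coverage gives about 63.2% and 8 x coverage about 99.97%, in line with the guide figures above.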

Lander ES, Waterman MS (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics, 2, 231-239.

 
