Following are some commonly seen problems associated with DNA sequencing. The two most common causes of poor or missing sequence data are the concentration and purity of your template DNA. If you are having trouble getting good sequencing results for your samples, you may first want to look through our How to Prepare Samples section for some recommendations on template preparation and quantitation. If it appears that you have done everything correctly and followed our suggestions, then look below for some additional reasons why you might obtain less than optimal DNA sequence data quality. We've presented both chromatograms and raw data for each problem. Raw data are the machine data that have not been analyzed by the computer and are not normally viewable to you. To view what high quality data looks like click here.
Problems and Solutions
Pic 1-1: Raw data of a failed reaction. Note the big spike which is also shown in Pic 5. It's caused by unincorporated fluorescent dye.
Pic 1-2: Chromatogram of the beginning section of Pic 3. Note that no bases are called.
Pic 1-3: Chromatogram of a failed reaction shown in Pic 2. Note the big dye blob, which is also shown in the raw data of Pic 2. The peaks seen in this picture and Pic 4 are background noise pulled up artificially by the analysis software.
Possible Cause 1: Priming site not present
Solution: This most frequently happens with vector primers. If you've chosen one of our vector primers, make sure its priming site is present in your vector. Double-check your plasmid maps/sequences.
If you've designed your own custom primer from previous sequence data, make sure you were using a reliable area of sequence - look for sharp, well-defined peaks with no ambiguity. Avoid areas where the peaks are broader and not well separated - this will occur towards the end of the sequence where the fragments are larger and the polymer cannot adequately resolve single nucleotides, causing inaccurate basecalling.
Possible Cause 2: Not enough or no DNA/Primer added
Solution: Double-check your quantitations, stock concentrations, and dilutions. Check our FAQs for the amount of DNA we need. While our sequencers are very sensitive and can detect a range of DNA concentrations, there is still a "threshold" amount that must be reached to obtain any sequence data.
Occasionally, either the primer or template is accidentally left out of a reaction. We try to immediately identify this type of mistake and repeat such reactions.
Possible Cause 3: Inhibitory contaminant
Solution: The cycle sequencing reaction used to amplify samples for automated sequencing is very sensitive to the presence of certain contaminants, some of which will completely inhibit our sequencing enzyme. Salts, EDTA, alcohol, protein, RNA, detergents, cesium and phenol are some of the most commonly seen contaminants that have a negative effect on the sequencing enzyme. You may need to reprep your sample to sufficiently remove one or more inhibitory components to obtain any sequence data.
The presence of multiple peaks within your sequence can be caused by numerous factors. To help determine the cause, look at two aspects: where the multiple peaks begin, and the overall signal strength of your sample. Samples with low signal strength can have artificially high background noise that can give the appearance of multiple peaks. However, if your average signal intensity is high, then you can rule out the possibility that background noise is the cause. We've broken down this section into two parts, based on where your multiple peaks begin.
Observation: Multiple Overlapping Peaks from Beginning
Pic 2-1: Overlapping peaks from beginning.
Possible Cause 1: Multiple priming sites involving vectors. Your primer may have a secondary priming site on the plasmid that may be identical or closely related, with different nucleotide sequences following each site, giving superimposed bands within your sequence. If the priming sites are identical (e.g. when more than one T7 promoter site is present), the double peaks will be strong from the outset. The fragments may also show shifted migration, so that the double peaks are not directly on top of one another but are offset to one side or the other due to the differing mobility patterns of strands with dissimilar nucleotide composition. In other instances, a secondary priming site may not be exactly the same, but may differ by a few internal bases. In this case, the mismatched primer may not hybridize as efficiently but can still anneal and extend, giving rise to less intense fragments that can be seen underneath your peaks of interest.
Solution: In both cases, it's necessary to screen both your vector and insert carefully to look for sequences that may match or be similar to your proposed primer. You may need to choose another vector primer on the same end of the multiple cloning site or redesign your custom primer. When choosing another primer is difficult, such as when primer walking through a repetitive area, try to find a primer that has a 3'-base match specific to your area of interest, which can help as an "anchor".
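One way to screen a vector or insert for secondary priming sites is a simple mismatch-tolerant search on both strands. The following is a minimal sketch, not a tool we provide: the function names, the two-mismatch cutoff, the exact 3' match requirement and the toy template are all assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_priming_sites(primer: str, template: str,
                       max_mismatches: int = 2, anchor: int = 3):
    """List candidate priming sites as (strand, position, mismatches).

    A window counts as a site only if its last `anchor` bases (the primer's
    3' end) match exactly and the total mismatches stay within the cutoff.
    Positions on the "-" strand are indexes into the reverse complement.
    Both cutoffs are illustrative assumptions, not measured thresholds.
    """
    primer = primer.upper()
    size = len(primer)
    hits = []
    strands = (("+", template.upper()), ("-", reverse_complement(template)))
    for strand, seq in strands:
        for pos in range(len(seq) - size + 1):
            window = seq[pos:pos + size]
            if window[-anchor:] != primer[-anchor:]:
                continue
            mismatches = sum(1 for a, b in zip(primer, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((strand, pos, mismatches))
    return hits


# Toy template: one exact T7 promoter primer site and one site that
# differs by a single internal base (hypothetical sequence).
t7 = "TAATACGACTCACTATAGGG"
template = "GGG" + t7 + "ACGTACGT" + "TAATACGACTGACTATAGGG" + "CC"
hits = find_priming_sites(t7, template)
for strand, pos, mismatches in hits:
    print(strand, pos, mismatches)
```

More than one hit suggests the primer could give superimposed sequence. A real screen would also need to consider primer secondary structure and hybridization conditions, which this sketch ignores.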
Possible Cause 2: Multiple priming sites in generating PCR products.
Solution: This may occur when one or both of the PCR primers hybridizes to more than one position on the template DNA, giving rise to multiple PCR products. Often this will be obvious when visualizing the PCR products on an agarose gel as there will be more than one band present. In this case, gel purification of the desired product will be necessary. One can run into difficulty, however, when the products are very similar in size, which may arise when amplifying related or repetitive DNA, and do not separate well on the gel. In this case, optimization of the PCR reaction may be necessary or redesign of the PCR primers in order to choose a more specific priming site.
Possible Cause 3: PCR primers acting as both forward and reverse primers.
Solution: Sometimes, a PCR product may be generated when one primer functions as both the forward and reverse primer in the PCR reaction, giving rise to an artificial product. This is fairly easy to detect when sequencing the PCR product as one primer will give double peaks from the start, while the other fails to give any sequence data. Redesign your set of PCR primers.
Possible Cause 4: Residual PCR primers and/or dNTPs
Solution: As two primers are present in the PCR reaction, incomplete removal of these primers can lead to double peaks within the sequencing data. Both primers will act as sequencing primers and lead to superimposed bands which correspond to the complementary strands from opposite orientations. It is critical to remove excess primers and dNTPs from the PCR reaction by purification. If you attempt direct sequencing of PCR products without purification, by diluting an aliquot of your PCR product with water to lower the concentration of residual primers and dNTPs (a method which we do not recommend), then it is imperative to optimize your PCR reaction so that primers and dNTPs are used in limiting amounts and most are used up by the end of the PCR.
Possible Cause 5: Primers with high Tm
Solution: Primers that have a Tm much higher (>65°C) than our suggested 57°C-60°C often do not function well as sequencing primers. When primers have a Tm that high, it is often a result of increased G-C content or because the primer is quite long, both factors that can increase the potential for primer secondary structure formation. In this case, redesign your primer with a lower Tm. The Tm of a primer is defined as the temperature at which 50% of the oligonucleotide and its perfect complement are in duplex. The Tm of an oligo can be roughly calculated by using the following formula:
Tm = 2°C( A+T ) + 4°C( G+C )
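As a quick illustration of this rule of thumb, here is a minimal sketch in Python. The function names and the classification labels are made up for illustration; the thresholds are the ones quoted above (much higher than 65°C is too high, 57°C-60°C is the suggested range). The rule only gives a rough estimate, so treat the output as a first check rather than a measured Tm.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C: 2 per A or T plus 4 per G or C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def classify_tm(tm: float) -> str:
    """Label a Tm against the ranges quoted in the text (labels are illustrative)."""
    if tm > 65:
        return "too high"
    if 57 <= tm <= 60:
        return "suggested range"
    return "outside suggested range"


examples = [
    ("T7 promoter primer", "TAATACGACTCACTATAGGG"),
    ("hypothetical G-C rich 24-mer", "GCGGCCGCAGGCCTGCAGGCCGGC"),
]
for name, seq in examples:
    tm = wallace_tm(seq)
    print(name, tm, classify_tm(tm))
```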
Possible Cause: Mixed plasmid prep
Solution: A plasmid prep that is contaminated by more than one product, such as two vectors with different inserts or vector with insert and vector without, will generally show an early section of clean sequence data (common vector multiple cloning site sequence) followed by double peaks. Occasionally, a plasmid may contain more than one vector molecule or may encounter spontaneous deletions or insertions during growth. The point at which the double peaks begin corresponds to the start of the insert cloning site. To avoid this problem, it's important to carefully pick a single colony from your growth plate, restreaking if necessary, to be sure that your colony is completely clonal. You should follow this up with a restriction digest of your plasmid run out on an agarose gel to ensure vector and insert are present as expected.
Observation: Sequence shows 'overlapped peaks' throughout, with the second (generally smaller) peak being the same base as that of the true base immediately to the right of it.
Possible Cause: Poor purification of primers in the synthesis process which leaves a certain percentage of n-1mers in the final product. When the DNA template is sequenced, this percentage of n-1mers will prime the DNA template, causing some of the sequence to be 1 bp shorter than it should be. Primers that have begun to degrade will also do so from the 3' end, causing a proportion of the original sequencing primer to become n-1.
Solution: Whatever the cause of the n-1s, it will be necessary to resynthesize the primer to obtain an oligo of suitable quality for sequencing. When high-quality reagents and proper protocols are utilized during oligo synthesis, cartridge or HPLC purification of the primers is usually not necessary for typical oligos (<30 bases), but sometimes additional purification can be beneficial.
Observation: "Noisy" data can be identified by the presence of multiple peaks and numerous "N"s within your sequence. The Sequencing Analysis program assigns an "N" as a base identification when there are two or more peaks present at one position. This "N" may signify the legitimate occurrence of two nucleotides, as in the case of a heterozygote, but may also be seen when background noise is high or when multiple products are present. When your sample exhibits weak signal, the software attempts to compensate by boosting the signal of sample bands to detectable levels. However, the background noise will also be artificially amplified, giving a poor signal-to-noise ratio. Background noise appears as many smaller, undefined peaks under your sequence peaks of interest. This noise is always present, but with well-prepared samples of good signal strength, it will be undetectable.
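If you want a quick numerical check on a text sequence file, counting the fraction of "N" calls is a simple starting point. This is only a sketch: the function names and the 5% cutoff are arbitrary assumptions for illustration, not thresholds used by the Sequencing Analysis program, and a high "N" count cannot by itself distinguish noise from a true heterozygote.

```python
def n_fraction(sequence: str) -> float:
    """Fraction of positions called as N (0.0 for an empty sequence)."""
    seq = sequence.upper()
    return seq.count("N") / len(seq) if seq else 0.0


def looks_noisy(sequence: str, cutoff: float = 0.05) -> bool:
    """Flag a read whose N fraction exceeds the (assumed) cutoff."""
    return n_fraction(sequence) > cutoff


read = "ACGTNNACGT"  # hypothetical basecalled read
print(n_fraction(read), looks_noisy(read))
```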
Possible Cause 1: Not enough DNA
Solution: Double-check your quantitations, stock concentrations, calculations and dilutions. Please note that we normally need 250 ng of DNA for each reaction and the DNA concentration has to be above 30 ng/µl.
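The arithmetic behind these numbers is simple enough to script when you are preparing many samples. Below is a minimal sketch; the function name and its error message are hypothetical, and the defaults simply restate the 250 ng per reaction and 30 ng/µl figures given above.

```python
def template_volume_ul(conc_ng_per_ul: float,
                       target_ng: float = 250.0,
                       min_conc_ng_per_ul: float = 30.0) -> float:
    """Volume (in µl) of stock needed to deliver target_ng of DNA.

    Raises ValueError when the stock is not above the minimum concentration,
    since the text asks for a concentration above 30 ng/µl.
    """
    if conc_ng_per_ul <= min_conc_ng_per_ul:
        raise ValueError("stock is too dilute; concentrate or re-prep the DNA")
    return target_ng / conc_ng_per_ul


# A 100 ng/µl stock needs 2.5 µl to deliver 250 ng.
print(template_volume_ul(100.0))
```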
Possible Cause 2: Inhibitory contaminant e.g. salts, phenol
Solution: The cycle sequencing reaction used to amplify samples for automated sequencing is very sensitive to the presence of certain contaminants, some of which can partially or completely inhibit our sequencing enzyme. You may need to re-purify your sample to sufficiently remove one or more inhibitory components to obtain better sequence data.
Possible Cause 3: Degraded DNA from nucleases, repeated freeze-thaw, excessive UV light exposure, bisulfite treatment.
Solution: Nuclease contamination in a template preparation as well as repeated freeze-thaw cycles can degrade DNA over time. Even low amounts of nucleases can extensively degrade DNA depending on storage conditions and temperatures, as well as the length of time the DNA is stored. Generally, re-isolation and purification of the template DNA will be necessary to obtain good DNA sequence. When extracting PCR products from a gel, prolonged exposure to UV light will degrade and nick the DNA. Limit the time and UV intensity as much as possible to prevent degradation. When treating DNA with bisulfite for methylation experiments, it is important to avoid long incubations at higher temperatures as substantial amounts of DNA will be degraded in this process.
Possible Cause 4: Inefficient primer binding (low Tm, degenerate primers, mismatch)
Solution: The Tm of a primer is defined as the temperature at which 50% of the oligonucleotide and its perfect complement are in duplex. The Tm of an oligo can be roughly calculated by using the formula:
Tm = 2°C( A+T ) + 4°C( G+C )
In our cycle sequencing reaction, the primer/template annealing step occurs at 55°C. Thus, if your primer Tm is much lower than 55°C, hybridization to its complementary template will be much less efficient and fewer extending fragments will be generated. Increase your primer Tm by adding bases to the 5' or 3' end to raise the Tm to within the range of 57°C-60°C. Degenerate primers and those with mismatched bases will also show decreased hybridization efficiency due to reduced stability of primer binding, and if degeneracy or mismatches occur at or near the 3' end of your primer, it is highly likely that your sequencing attempt will fail.
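To illustrate lengthening a primer to raise its Tm, the sketch below grows a primer at its 5' end along the known sequence until the rough Tm from the formula above reaches 57°C. The function names and the toy sequence are hypothetical, and extending only at the 5' end is a choice made here to keep the 3' end fixed; it is not the only option.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C: 2 per A or T plus 4 per G or C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def extend_5prime(known_seq: str, start: int, end: int,
                  target_tm: int = 57) -> str:
    """Grow the primer known_seq[start:end] at its 5' end.

    Bases are added one at a time until the rough Tm reaches target_tm.
    The 3' end stays fixed, so the priming position does not move. The
    loop stops at the start of known_seq if the target cannot be reached.
    """
    while start > 0 and wallace_tm(known_seq[start:end]) < target_tm:
        start -= 1
    return known_seq[start:end]


known_seq = "ATGC" * 7             # hypothetical known sequence, 28 bases
short_primer = known_seq[12:28]    # 16-mer with a rough Tm of 48
longer_primer = extend_5prime(known_seq, 12, 28)
print(short_primer, wallace_tm(short_primer))
print(longer_primer, wallace_tm(longer_primer))
```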
Pic 6-1: This is the raw data for the sequence with secondary structure. As illustrated, though the signal intensity dropped dramatically, the sequence managed to get through. Therefore, no such drop in intensity is observed in the chromatogram shown in the next picture.
Pic 6-2: This is the chromatogram of the above picture. No drop in signal intensity is observed since the analysis program artificially pulled up the peaks to make them look evenly distributed.
Pic 6-3: This is the raw data for the sequence with severe secondary structure. The sequence stopped almost completely at the point where the secondary structure starts.
Pic 6-4: Chromatogram of Pic 6-3. As indicated by the arrow, the data quality starts to degrade immediately where the secondary structure starts. The peaks you see are residual signal mixed with background noise pulled up artificially by the data analysis program.
Possible Cause 1: Secondary structure
Solution: G-C rich, and to a lesser degree, A-T rich, DNA is predisposed to secondary structure formation, as strong hydrogen bonding between G and C nucleotides can cause the template DNA to loop or bend and anneal to complementary sequences, forming hairpins that can restrict the passage of the sequencing polymerase and thus be very difficult to sequence through reliably. These hairpins may not melt at our cycle sequencing temperatures and can cause premature termination of sequence data. Secondary structure may appear as a sharp termination of signal with no sequence data after, or if the loop has been relaxed slightly, you may see strong signal that drops abruptly but may have some weaker peaks following that are still quite accurate. With the newest formulation of BigDye Terminator chemistries (v3.1), some G-C rich difficulties have improved dramatically, but unfortunately the new chemistry hasn't solved everything. There is not one solution that resolves every secondary structure problem. The first thing we usually try is to add a DNA denaturant such as DMSO to our sequencing reaction to help melt the duplex formation and allow the polymerase to pass through. Changing our cycle sequencing parameters to include a higher denaturation temperature (98°C vs. 96°C) is sometimes useful. Placing a primer as close to the hairpin loop as possible to help force its unwinding has also worked in the past. Sequencing the opposite strand can sometimes lead to a huge improvement. If these solutions don't work, we may suggest you try linearizing your DNA with restriction enzymes to help relax the hairpin. And if you are trying to PCR up a very G-C rich region, addition of betaine or DMSO to your PCR reaction can help, as can substitution of 7-deaza dGTP for 75% of the dGTP in your PCR reaction. And if all else fails, you can try manual radioactive sequencing as a last resort.
We developed a special protocol to overcome the secondary structure:
(Please feel free to contact us if you would like to get more information about our special protocol)
Pic 6-5: Above is an example of the results obtained without our special protocol for secondary structure. As you can see, the usable data stops at the red line.
Pic 6-6: Above is an example of the results obtained with our special protocol. You can see that the data continues to be clear and usable following the red line. (Note: The same DNA sample and primer were used in both figures above.)
Possible Cause 2: Linearized DNA
Solution: If your DNA has been cut with one or more restriction enzymes, the sequence data will sharply end at the recognition site of the enzyme that cut at the 3' end of your insert. Did you accidentally send us digested DNA? Run it out on a gel to see.
Observation: Repetitive stretches of DNA sequence (usually mono- or dinucleotide repeats) that show strong sequencing signals at the beginning of the repeat and then gradually taper off to an unreadable signal.
Cause: There are several possible reasons for this problem: (1) slippage of the DNA polymerase on the template strand during elongation, (2) formation of secondary structure due to the repeat, and (3) depletion of dNTPs. The nucleotide composition, as well as the size, of a repetitive region can play a large role in the success of sequencing through such an area. In general, G-C and G-T repeats tend to be the most troublesome, though the newest version of Applied Biosystems BigDye Terminator (v3.1) contains some modifications that have allowed for some striking improvements in certain previously difficult templates. However, there are still some that remain a pain. In general, one can sequence partially through the repetitive region before the signal begins to fade and eventually becomes unreadable.
Solution: Various methods can be tried to sequence the repeat entirely, and many are similar to those we would use for G-C rich templates that form secondary structures, including the addition of DMSO. If the repeat region is not excessively large, sequencing from the opposite strand to complete the region can be successful, especially if the complementary strand has a nucleotide composition that is more efficiently extended. However, if the region is large, it may be difficult to complete its entire sequence and determine the exact number of repeats present. Alternative methods, such as directed deletions or the use of an in vitro transposon system may need to be utilized.
Observation: In general, dye blobs appear as broad, undefined peaks of a single color with the true DNA peaks underneath and tend to occur relatively early in the data - generally before 50-60 bp - so for many, they aren't much of a problem as that is still vector sequence.
Possible Cause: Dye blobs are unincorporated dye terminator molecules that have passed through the cleanup columns and remain in solution with the purified DNA loaded onto the sequencers.
Solution: They are most often seen with samples that have low signal strength. Samples with weak signal usually either 1) did not have enough DNA, so there was less starting template to amplify and label, thus leaving behind a greater proportion of unincorporated dye molecules, or 2) contained contaminants that inhibited the sequencing reaction. It's theorized that certain contaminants may have a predisposition to bind to these dye clumps. We have also noticed a pattern where certain customer samples, as a whole, are more likely to contain dye blobs regardless of signal strength. Repetition of samples with dye blobs is generally not too successful, as they don't often go away but sometimes do become less intense. With very weak samples, oftentimes there's not much we can do to fix the data. With samples of average signal strength, however, they are usually easily correctable as the true peaks are often visible beneath.
Pic 9-1: Raw data of a sequence with a spike.
Pic 9-2: Chromatogram of Pic 9-1.
Observation: Spikes are seen as multicolored, condensed peaks within the sequence that usually obscure just one or two nucleotides worth of data.
Possible Cause: They are caused by tiny air bubbles within the liquid polymer or by small pieces of dried polymer that have flaked off and entered a capillary. Again, there seems to be a slight predisposition for some customer samples to experience these artifacts and, when they do occur, they are much more pronounced in samples with weak signal. When a sample has strong signal, they are often not detectable, but there are times when they can be very visible.
Solution: The good thing about the spikes is that they are almost always correctable upon rerunning. So, please let us know if you want a repeat because of a spike. For those of you only interested in a small separate region that is not affected by something like this, there would be no need for rerunning, but for those who are looking at an entire reading frame, for example, we realize that this would be a problem. As we can't know everybody's experiments and regions of interest, we ask that you help us and let us know when this problem affects your analyses and we will quickly repeat it for you.
Observation: The sequence is characterized by bands that gradually spread out and therefore become irresolvable. The first peak that is too broad can be from base 1 to as late as base 400.
Possible Cause: The problem can be caused by 1) a bubble or blockage in the capillary, 2) excess salts, but for this facility it is most often caused by 3) some small, anionic contaminant from a "kit purified" (e.g. Qiaprep or Wizard) plasmid.
Solution: The solution for the first cause is to rerun the reaction, whereas for the second and third cause the reaction can be 1) diluted and rerun, 2) repeated with less template, or 3) repeated after dialyzing the template.
Pic 11-1: Data quality degradation after a polyT region.
Pic 11-2: Data quality degradation after a polyG region.
Observation: The above figures show a drop-off in sequencing quality after a polyT region (Pic 11-1) and a polyG region (Pic 11-2). Sequence data up to and including the polynucleotide region may be fine, but the last base of the poly region and all peaks following it may show a wave-like, stuttering pattern of double peaks that cannot be interpreted. This tends to be more problematic in PCR products, but can also occur when sequencing plasmids, especially when trying to sequence the polyA region of cDNA.
Possible Cause: This difficulty is thought to arise due to enzyme "slippage" when the growing strand does not stay paired correctly with the template DNA during polymerization through the homopolymer region, thus giving rise to fragments of varying lengths that have the same sequence after this area.
Solution: When sequencing cloned DNA with a homopolymer region, several options can be tried. Sequencing the opposite strand can sometimes be more successful, especially when going through a polyG region as the polyC strand is often easier to get through. An oligo dT (15-20 T) primer that contains a wobble base (A, G or C) on the 3' end can be used to anchor the primer in place at the end of the polyA region and give clean sequence following. This primer (T20V) is provided by us to you free of charge. It anneals at the 3' end of the poly T and continues sequencing downstream. Sometimes designing a new primer that is closer to the homopolymeric region can help, as nucleotide concentration and enzyme activity will be in a more optimal range when extending the smaller fragments in the cycle sequencing reaction. And lastly, we can try adjusting our cycle sequencing conditions, as higher annealing temperatures and longer extension times can sometimes be useful in cases like this. Similar approaches can be used when trying to sequence PCR products with homopolymeric regions, but, in the end, it may sometimes be necessary to clone the PCR product in order to read through the repetitive stretch.
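For readers unfamiliar with the T20V notation, V is the standard ambiguity code for "A, C or G" (anything but T), so T20V stands for a mix of three 21-base oligos. The short sketch below simply expands that notation; the function name is hypothetical and the default of 20 T's follows the primer name above.

```python
def anchored_oligo_dt(t_count: int = 20) -> list:
    """Expand T(n)V into its three concrete sequences, written 5' to 3'."""
    return ["T" * t_count + anchor for anchor in "ACG"]


for oligo in anchored_oligo_dt():
    print(oligo, len(oligo))
```

Because the last base cannot be T, the primer cannot sit anywhere inside the homopolymer run and is held at its end.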
Pic 12-1: Raw data of a sequencing run with a down-hill pattern.
Pic 12-2: Chromatogram of the beginning section of Pic 12-1. Note that the peaks are clean and well resolved.
Pic 12-3: Chromatogram of the middle section of Pic 12-1. Note the elevated noise level; the signal quality starts to decrease.
Pic 12-4: Chromatogram of the latter part of Pic 12-1. Note that the peaks are submerged in the noisy background and the basecalling stops.
Observation: The above figures show a sequence that starts off nicely, but then there is a decrease in signal intensity, gradually descending to background level.
Cause: There are three possible causes for the above signal pattern: 1) Too much template. When too much template is added, the fluorescent substrates (BigDye) are consumed at the beginning stage of the sequencing reaction and thus little is left for longer extension; 2) Not enough template. When too little template is added, most template molecules are used in the beginning section; 3) Salt contamination. Salt contamination alone is not a big problem, but in combination with other trace contaminants it can erode accuracy and shorten read lengths. Excessive amounts of salts will give rise to premature termination with strong signal followed by progressively weakening signal. Salts have an inhibitory effect on the processivity of the sequencing Taq polymerase, which can lead to an overabundance of short fragments, or if the salt concentration is too high, the enzyme will be completely inhibited with no sequence data obtained.
Solution: It's difficult to determine which one of the above-mentioned factors caused the progressive decrease in signal intensity unless a test is run. We normally run a repeat with either more DNA or less DNA. If the same pattern continues to show up, we recommend that customers further purify the DNA template with 70% isopropanol (30% water) and dry it in a spin vac before resuspending in pure autoclaved water.
Pic 12-5: Raw data of a reaction with 4 µl of template added. The template concentration is 100 ng/µl; therefore, 400 ng of template was added to the reaction.
Pic 12-6: The same reaction as in Pic 12-5 was repeated with 2 µl of template (200 ng) and the raw data is shown here. As you can see, a much longer read length is achieved.
Observation: Multiple peaks at the beginning section (ranging from 20 to 50 bases) in the sequencing data of PCR products, after which the sequence becomes clean.
Possible Cause 1: Frequently, when the primer that is used in generating PCR product is used in sequencing, the beginning section of the sequencing data is very messy and the signal intensity is higher than the rest of the sequence. This is caused by the sequencing of the nonspecific PCR products.
Solution: It is generally a good idea to design and use a separate, nested sequencing primer in the sequencing reaction, since it will add specificity and thus generate better quality data. The increase in specificity results from the nested primer not annealing to any non-specific PCR products, primer dimers or primer oligomers created in the PCR reactions. However, if the information you need from the sequencing data lies beyond the first 50-60 bases, the PCR primer can be used in sequencing reactions to save time and cost.
Possible Cause 2: A messy starting section in PCR product sequencing can also be caused by too much DNA added to the reaction. When too much dye-labeled DNA is injected into the sequencer, the signal intensity is so high that it goes off the scale of the analysis program. The peak that is off scale is cut off on top and the peak is moved to the position next to it. Therefore, multiple peaks are observed as shown in the next two figures. This is especially true for very short PCR products (200-400 bases).
Pic 13-1: Raw data of a PCR reaction that had too much DNA added. The signal intensity is so high that the peaks shoot out of the limit of the analysis program.
Pic 13-2: Chromatogram of the raw data shown in Pic 13-1.
Solution: Diluting the sample and reloading it in the sequencer will solve the problem easily. It is also important for you to mark on your order form the size and accurate concentration of your PCR products. Accurate information will be very helpful to us in delivering high quality data efficiently to you.
When reactions fail to generate data, or are otherwise unsatisfactory to you, we may offer to repeat the reaction free of charge. In some situations, the repeated reaction will work on the second trial, leading to questions of why it failed the first time around. Below are a couple of reasons why failed reactions sometimes can show better or even clean data when repeated:
Possible Reason 1: The reaction simply did not go forward the first time or the reaction failed due to human error. For example, the primer did not reach the mix at the bottom or did not anneal properly to the vector, or there were mistakes during the PCR process at our lab. The problem was eliminated upon repeating the reaction, thus giving clean data.
Possible Reason 2: In most situations, we don't repeat the reaction in the exact same way as the first time. We will look over the previous day's failed results and the amount of DNA and primers we added. Then, we determine how we can alter the reaction parameters in order to generate better results. This could mean anything from diluting a sample, adding more or less DNA and primer, including an extra reagent, such as DMSO, or any other options that could improve your data. The above is especially true when we process samples from new customers. It takes time for us to be familiar with the different characteristics of samples from different labs.
If you have any questions about what was changed in a reaction for a repeat, please feel free to contact us. We'll be more than happy to share any tricks we used to get the sequencing to work better.