Sequencing Confirmation¶

After you've completed your cloning, your next step is to verify the integrity of your plasmid. This tutorial walks through different sequencing strategies and how to choose and apply the right one for your purpose.

Choosing the Right Strategy¶

There are three main sequencing strategies available, each suited to different goals and budgets:

Cycle sequencing (like you used in the pP6 experiment) gives a ~1 kb window of high-quality sequence starting about 50 bp downstream of a primer. It's ideal for checking a specific region of a plasmid—like an insert or promoter.
Full-plasmid sequencing from providers like Plasmidsaurus uses long-read nanopore technology to return the complete sequence of your plasmid. This is more expensive (~$15/sample), but you get the entire plasmid and can catch unexpected rearrangements or background DNA.
NGS (Next-Generation Sequencing) is a high-throughput approach designed for analyzing complex populations. You PCR-amplify your targets with adapters and submit the pool for deep sequencing. This is powerful, but expensive, and more suitable for library screens than for single plasmid validation.

Method	Cost/sample	Output	Best For
Cycle sequencing	~$3.50	~1 kb from a primer	Targeted region confirmation
Full plasmid (Plasmidsaurus)	~$15	Entire plasmid sequence	Whole-plasmid verification, structural issues
NGS (deep sequencing)	$750+	Millions of short reads	Large libraries, pooled clone analysis

📌 Only cycle sequencing requires you to choose or design a primer. Full plasmid and NGS options use standardized workflows.

Quiz: Choosing a Sequencing Method¶

Scenario 1: You’re confirming that a single new antibiotic resistance gene was cloned correctly into a known plasmid backbone. Which sequencing method should you use?

Scenario 2: You’ve made a library of 5,000 variants of a ribosome binding site in a plasmid and want to determine the distribution of sequences present. What’s the right method?

Scenario 3: You’re validating the structure of a new multi-gene construct with internal repeats and want to ensure the whole thing is intact and mutation-free. Best method?

Designing Your Primer (Cycle Sequencing Only)¶

If you're using cycle sequencing, you must provide a primer. This primer must bind upstream of the region you wish to confirm.

The sequencing read begins ~50 bp downstream of the primer.
You can use standard primers (e.g., M13rev, G00101), but only if your target region lies ~50 bp downstream of their site.
Otherwise, design a custom primer:
Length: 18–25 bp
GC content: ~40–60%
Avoid long repeats or secondary structure
Must match the strand in the direction you wish to read

Important: Plasmid sequences are circular. In GenBank files, the “start” of the sequence is arbitrary. A feature near the beginning (like the T7 promoter) may require you to rotate the sequence to design a primer “upstream” of it, even if that’s technically the end of the GenBank string.

Quiz: Primer Design for T7 + INS¶

Download the plasmid: ⬇️ pET-INS.seq

Your goal: Design a 20 bp oligo that will allow a sequencing read to start at the T7 promoter and cover the INS gene.

Paste your oligo sequence below (5' to 3', exact match to template strand):

Defining the Confirmation Target¶

Before you can interpret your sequencing results, you need to define the confirmation target—the specific region of the plasmid you care about confirming. The choice of this region depends on how much certainty you need about the sequence, and what can already be inferred from functional outcomes (e.g., antibiotic resistance, visible fluorescence, or selection as from an activity screen). As a rule of thumb, the confirmation target is a subsequence of the full plasmid model that includes:

Any inserted or deleted regions
Any modified regulatory elements (e.g., promoters, RBSs)
Any junctions created by cloning
Any regions you targeted for mutagenesis

Case Study: pET-INS¶

For the pET-INS plasmid:

The kanamycin resistance (kanR) and the origin of replication (ori) are inherited from the original pET28a vector. Since you selected colonies on a kanamycin plate, these must be functional. Even if they contain silent mutations, they won't affect your experiment, so they don’t need to be reconfirmed.
The confirmation target is the T7 promoter, RBS, and INS gene, along with enough flanking sequence to include sites of oligo binding during construction.

This entire block—the T7 + RBS + INS insert—is the confirmation region, and it is treated as a whole. The sequencing read must cleanly and correctly cover this entire region to confirm the plasmid’s correctness.

Once you’ve defined your confirmation target, your task is to determine whether your sequencing read contains it accurately and completely. This involves aligning your read to the model plasmid and checking whether the confirmation region is covered and error-free.

Analyzing Sequencing Data¶

To evaluate your sequencing outcome, you’ll align your sequencing read to the reference model and examine how well it covers your defined confirmation target.

Performing an Alignment¶

Use ApE or Benchling to perform sequence alignment:

In ApE: Open both the read and the model plasmid. Go to Tools → Align Sequences..., select both files, and click OK.
In Benchling: Use the alignment tool in the side panel to align your sequencing read to the model plasmid.
You can also use any of various webtools like https://en.vectorbuilder.com/tool/sequence-alignment.html

Look for:

Start of alignment: Does the read begin at the expected point downstream of your primer?
Full coverage of the confirmation target
Perfect match or any deviations

If the confirmation target is entirely within the read, then it is 'perfect' and you are done with the analysis. Otherwise, there may be a mutation present.

Types of Sequence Deviations¶

Point mutations¶

These are single base changes. In open reading frames (ORFs), they fall into three categories:

Silent: The codon is changed, but the same amino acid is encoded.
Missense: The codon is changed to encode a different amino acid.
Nonsense: The codon is changed to a stop codon, truncating the protein.

If you find a single base difference between your sequencing read and the confirmation target within an ORF, translate the surrounding region in both the read and the model to determine whether the mutation is silent, missense, or nonsense.

Alignment of model and read sequences showing a point mutation resulting in a stop codon. The codon TGC (Cys) in the model is mutated to TGA (Stop) in the read.

Figure: Example of a nonsense mutation. In the model, the codon TGC codes for cysteine (C). In the read, a point mutation changes it to TGA, a stop codon (*), truncating the protein. This is a single base change with a major functional consequence.

Indels¶

Insertions or deletions (indels) can be especially problematic in ORFs. If not in multiples of 3, they cause frameshifts, scrambling the downstream protein sequence. Even a single base insertion or deletion can shift the reading frame, changing every amino acid after the mutation and often introducing a premature stop codon. Indels in regulatory regions or non-coding areas may have less dramatic effects but can still disrupt motifs or regulatory elements.

Alignment of model and read sequences showing a deletion. A 12 bp segment present in the model is missing from the read.

Figure: Example of an indel mutation. The model contains a region coding for multiple amino acids. In the read, this region is deleted, leading to a frameshift. All downstream codons are shifted, likely disrupting the entire ORF.

Deviation Type	Description	Effect in ORFs	Effect in Regulatory Regions	Effect in Non-Coding
Exact Match	No differences	✅ Ideal	✅ Ideal	✅ Ideal
Silent Mutation	Codon changes, same amino acid	Usually benign	May affect motif behavior	None
Missense	Codon changes, new amino acid	May change function	Not applicable	None
Nonsense	Creates stop codon	Likely disruptive	Not applicable	None
Indel	Insertion/deletion; may cause frameshift	Frameshift if not multiple of 3; severe	May disrupt motifs/regulatory elements	Sometimes none
Frameshift	Insertion/deletion disrupting codons (not by 3)	Severe	Not applicable	None
Regulatory change	Affects promoter, RBS, etc.	Not applicable	Can disrupt expression	None
Structural error	Duplication, truncation, or wrong orientation	Varies	Varies	Varies

Quality Issues¶

Sometimes you can’t interpret the read because:

Ns or ambiguous bases: Signal dropout or primer failure
Short reads: May not reach the confirmation target
Mixed signals: More than one DNA template in the reaction

Examples of sequencing trace quality showing three types of read profiles: high-quality, low-quality, and mixed template.

Figure: Sequencing trace types. Left: High-Quality Read — Tall, sharp peaks with clear base calls. Middle: Low-Quality Signal — Short, noisy peaks with Ns and base ambiguity. Right: Mixed Template — From around base 169, each position shows two well-resolved peaks, suggesting the presence of two DNA templates in the same prep. Mixed reads indicate either two different plasmids in the cell, or instability of the original plasmid. Either way, it is problematic.

Final Call Categories¶

You must assess the read and decide:

Perfect: All of the confirmation region is covered with high-quality data, and there are no mismatches.
Perfect Partial: What you can see is perfect, but part of the confirmation region is not covered due to read length or quality limitations.
Mutant: Read has deviations in the confirmation target.
Mixed Clone: Trace shows evidence of multiple sequence populations—typically due to more than one plasmid being present in the miniprep prep.
Failed: Poor quality read, unreadable, or no match to target.

Quiz: Sequence Interpretation¶

Download the full quiz data set here:

⬇️ Download All Cases (ZIP)

Each folder in the zip includes:

A .ab1 trace
A .seq read
A .gb model with the confirmation target annotated

Case 1: missense_in_ORF

What kind of outcome is shown in this sequencing result?

Case 2: perfect_partial

What kind of outcome is shown in this sequencing result?

Case 3: pJ19

What kind of outcome is shown in this sequencing result?

Case 4: indel

What kind of outcome is shown in this sequencing result?