The genetic code is the set of rules that tells a cell how the sequence of nucleotides in nucleic acids (DNA or RNA) corresponds to the sequence of amino acids in a protein. In other words, it defines how information written in the “language” of four bases is translated into the “language” of 20 different amino acids.
In this chapter, the focus is on the structure, properties, and biological implications of the genetic code, not on the full mechanics of transcription and translation (these are handled elsewhere).
From Nucleotides to Amino Acids: Codons
In RNA, genetic information is read in groups of three nucleotides (triplets). Each such group of three is called a codon.
- The four RNA bases are: A (adenine), U (uracil), C (cytosine), G (guanine).
- A codon is any ordered triplet of these bases, for example: AUG, GCU, UGA.
Because there are four possible bases in each of three positions, the total number of possible codons is:
$$4^3 = 64$$
These 64 codons collectively specify:
- 20 standard amino acids (using 61 “sense” codons)
- 3 stop signals (termination codons)
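The counts above can be checked by building the standard codon table in code. The following Python sketch assumes the standard code is written as a 64-character string of one-letter amino acid codes (with `*` marking stop codons) in the conventional U, C, A, G ordering. The names `BASES`, `AA_BY_CODON` and `CODON_TABLE` are illustrative and do not come from any library.

```python
from itertools import product

# RNA bases in the conventional codon-table order (U, C, A, G).
BASES = "UCAG"

# Standard genetic code, one-letter amino acid codes; '*' marks a stop codon.
# Entry i belongs to the i-th codon generated by product(BASES, repeat=3),
# in which the first base varies slowest and the third base fastest.
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AA_BY_CODON)
}

sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE))   # 64
print(len(sense_codons))  # 61
print(stop_codons)        # ['UAA', 'UAG', 'UGA']
print(len(amino_acids))   # 20
```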
DNA uses the same information, but with T (thymine) instead of U (uracil). During transcription, DNA triplets are copied into complementary RNA codons, which are then read during translation.
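The DNA-to-RNA relationship can be sketched as a base-by-base mapping. This is a simplified illustration. It assumes the template strand is written 3′→5′, so that its complement can be read directly as 5′→3′ mRNA. Both function names are hypothetical.

```python
# Pairing rule: DNA template base -> RNA base copied opposite it.
# RNA uses U (uracil) where DNA would use T (thymine).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    Assumes the template is written 3'->5', aligned base by base with the
    mRNA it produces (read 5'->3'), so no reversal is needed.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template.upper())


def coding_strand_to_mrna(coding: str) -> str:
    """The mRNA matches the coding (non-template) strand, with U in place of T."""
    return coding.upper().replace("T", "U")


print(transcribe_template("TACCGA"))    # AUGGCU
print(coding_strand_to_mrna("ATGGCT"))  # AUGGCU
```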
Features of the Genetic Code
Several key properties of the genetic code are important for understanding how it works in all organisms.
Triplet Code
Each amino acid (or stop signal) is encoded by a sequence of three nucleotides.
- One nucleotide alone ($4^1 = 4$ combinations) would not be enough to encode 20 amino acids.
- Two nucleotides ($4^2 = 16$ combinations) are still not enough.
- Three nucleotides ($4^3 = 64$ combinations) are sufficient to encode all amino acids plus punctuation signals.
Thus, the genetic code is a triplet code.
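The arithmetic behind the triplet argument can be made explicit. This sketch counts "stop" as a single extra meaning, giving 21 in total. That simplification applies only to the arithmetic, since the real code uses three stop codons. The helper names are hypothetical.

```python
def combinations(word_length: int, num_bases: int = 4) -> int:
    """Number of distinct base sequences of a given length."""
    return num_bases ** word_length


def shortest_sufficient_length(meanings: int, num_bases: int = 4) -> int:
    """Smallest word length whose combinations cover every required meaning."""
    length = 1
    while combinations(length, num_bases) < meanings:
        length += 1
    return length


# 20 amino acids plus one "stop" meaning (simplifying assumption).
REQUIRED_MEANINGS = 20 + 1

for n in (1, 2, 3):
    # Prints the word length, its number of combinations, and whether that suffices.
    print(n, combinations(n), combinations(n) >= REQUIRED_MEANINGS)

print(shortest_sufficient_length(REQUIRED_MEANINGS))  # 3
```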
Non-overlapping and Comma-free
The code is read in continuous, non-overlapping triplets from a defined starting point:
- Non-overlapping: Each nucleotide is part of only one codon.
  - For example, the RNA sequence AUGGCU... is read as AUG|GCU|...
  - It is not read as AUG, then UGG, then GGC, etc.
- Comma-free: There are no “comma” bases separating codons. Once translation starts, the ribosome reads one codon after another without skips.
Because of this, the correct reading frame is crucial. Shifting the reading frame by one base (a frameshift) changes all codons downstream and usually destroys the original protein information.
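Non-overlapping reading and the effect of a frameshift can be shown with simple string slicing. In this sketch, `split_codons` and `overlapping_triplets` are hypothetical helpers. The example deletes one base to show how every downstream codon changes.

```python
def split_codons(rna: str, start: int = 0) -> list[str]:
    """Read RNA as consecutive, non-overlapping triplets from index `start`.

    A trailing partial codon is dropped.
    """
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def overlapping_triplets(rna: str) -> list[str]:
    """For contrast: every 3-base window, which is NOT how codons are read."""
    return [rna[i:i + 3] for i in range(len(rna) - 2)]


rna = "AUGGCUACUUGG"
print(split_codons(rna))              # ['AUG', 'GCU', 'ACU', 'UGG']
print(overlapping_triplets(rna)[:3])  # ['AUG', 'UGG', 'GGC']

# Deleting one base (the G at index 3) shifts the frame downstream.
mutant = rna[:3] + rna[4:]
print(split_codons(mutant))           # ['AUG', 'CUA', 'CUU']
```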
Degeneracy (Redundancy) of the Code
The genetic code is degenerate: most amino acids are specified by more than one codon.
- Examples:
  - Leucine (Leu) is encoded by six codons: UUA, UUG, CUU, CUC, CUA, CUG.
  - Serine (Ser) is encoded by six codons: UCU, UCC, UCA, UCG, AGU, AGC.
  - Glycine (Gly) is encoded by four codons: GGU, GGC, GGA, GGG.
  - Methionine (Met) is encoded by just one codon: AUG.
Key points:
- No single codon encodes more than one amino acid.
- But several codons can encode the same amino acid.
- Often, codons for the same amino acid differ only in the third base position (e.g. GCU, GCC, GCA and GCG all encode alanine).
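The degeneracy examples can be reproduced by inverting a codon table, that is, by collecting every codon that specifies a given amino acid. The sketch below uses an illustrative 64-character encoding of the standard code (one-letter codes, `*` for stop, U, C, A, G order). `codons_for` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

# Invert the table: amino acid -> list of codons that specify it.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

print(codons_for["L"])       # ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']  (Leu)
print(len(codons_for["S"]))  # 6                                           (Ser)
print(codons_for["G"])       # ['GGU', 'GGC', 'GGA', 'GGG']                (Gly)
print(codons_for["M"])       # ['AUG']                                     (Met)

# Alanine's codons share their first two bases and differ only at the third.
alanine = codons_for["A"]
print(alanine, {codon[:2] for codon in alanine})  # ['GCU', 'GCC', 'GCA', 'GCG'] {'GC'}
```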
Biological consequence: many single-base changes, especially in the third position of a codon, can be silent mutations (they do not change the amino acid, so the protein’s primary structure is unchanged).
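Whether a single-base change is silent can be decided by comparing the amino acid before and after the change. In this minimal sketch, `classify_substitution` is a hypothetical helper and positions are 0-based. “Missense” means a different amino acid is incorporated, and “nonsense” means a stop codon is created.

```python
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change in a sense codon (position is 0-based)."""
    if CODON_TABLE[codon] == "*":
        raise ValueError("expected a sense codon, got a stop codon")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"    # same amino acid, primary structure unchanged
    if after == "*":
        return "nonsense"  # an amino acid codon became a stop codon
    return "missense"      # a different amino acid is incorporated


print(classify_substitution("GCU", 2, "A"))  # silent: GCU -> GCA, both alanine
print(classify_substitution("GCU", 0, "A"))  # missense: GCU -> ACU, alanine -> threonine
print(classify_substitution("UGG", 2, "A"))  # nonsense: UGG -> UGA, tryptophan -> stop
```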
Unambiguous
Despite being degenerate, the code is unambiguous:
- Each codon specifies one unique meaning (one particular amino acid or a stop signal).
- For example, AUG always encodes methionine and never anything else.
This unambiguity ensures that the translation machinery can reliably interpret the nucleotide sequence.
Nearly Universal
The genetic code is almost the same in all known organisms, from bacteria to humans.
- In most cases, the same codon specifies the same amino acid in:
  - Bacteria
  - Eukaryotes
  - Archaea
  - Many viruses
This near-universality suggests that the code arose early in the history of life and has been conserved.
Known exceptions (only concepts here, details belong elsewhere):
- Mitochondrial genomes in many organisms use slightly modified codes.
- Some single-celled eukaryotes (certain protists) have reassigned a few codons.
- In these cases, a codon that is a stop in the “standard” code may code for an amino acid, or vice versa.
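A variant code can be modelled as the standard table with a few entries overridden. This sketch assumes a single reassignment: UGA is read as tryptophan (W) instead of stop, which is one well-known mitochondrial change. Real variant codes may differ at additional codons. The names `STANDARD`, `VARIANT` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

# Illustrative variant: UGA (stop in the standard code) read as tryptophan,
# as in many mitochondrial codes. Other reassignments are omitted here.
VARIANT = {**STANDARD, "UGA": "W"}


def translate(rna: str, table: dict[str, str]) -> str:
    """Translate from the first base, codon by codon, until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


rna = "AUGUGAGCUUAA"
print(translate(rna, STANDARD))  # M    (UGA ends translation)
print(translate(rna, VARIANT))   # MWA  (UGA read as Trp; UAA ends translation)
```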
Despite these exceptions, the code is sufficiently universal that genes can be moved between species (for example, in genetic engineering) and still be correctly translated, often with only minor adjustments if any.
Special Codons: Start and Stop
Within the genetic code, certain codons serve not only to specify amino acids but also to mark the beginning and end of translation.
Start Codon
The most important start codon in mRNA is AUG, which encodes methionine (Met).
Roles of AUG:
- Start signal: marks the usual point where the ribosome begins translating the mRNA into protein.
- Amino acid: in addition to being a start signal, AUG codes for methionine inside proteins as well.
In bacteria, the initiating methionine is formylated (N-formylmethionine, fMet), but the codon that specifies it is usually still AUG.
Not every AUG in an mRNA is a start site. The first AUG recognized in the correct context (depending on sequences around it and features described in other chapters) is typically used to initiate translation.
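Candidate start sites can be located by scanning for AUG in every frame. This sketch deliberately ignores the surrounding sequence context that real cells use and simply treats the first AUG as the start site. Both helper names are hypothetical.

```python
def find_aug_positions(rna: str) -> list[int]:
    """Return every index at which 'AUG' occurs, in any reading frame."""
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]


def first_aug(rna: str) -> int:
    """Simplified start-site choice: the first AUG, ignoring sequence context.

    Returns -1 when the sequence contains no AUG.
    """
    positions = find_aug_positions(rna)
    return positions[0] if positions else -1


mrna = "GGAUGCCAUGGAUGA"
for pos in find_aug_positions(mrna):
    print(pos, "frame", pos % 3 + 1)
# prints: "2 frame 3", then "7 frame 2", then "11 frame 3"
print(first_aug(mrna))  # 2
```

Three AUGs appear here, in two different frames. This is why a real start-site choice needs more information than the presence of AUG alone.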
Stop Codons
Three codons do not code for any amino acid. Instead, they signal the termination of translation:
UAA, UAG, UGA
These are stop codons (also called termination or nonsense codons). When a ribosome encounters a stop codon:
- Normally, no tRNA carrying an amino acid recognizes that codon.
- Release factors bind and cause the newly synthesized polypeptide to be released.
- The ribosome separates from the mRNA.
Stop codons thus act as punctuation marks that define where a protein ends.
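Combining start and stop, translation can be sketched as reading codons from the start site until the first in-frame stop codon. This sketch assumes the first AUG is the start site and uses an illustrative one-letter encoding of the standard code. `translate_from_start` is a hypothetical name, and release factors and ribosome dissociation are not modelled.

```python
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}


def translate_from_start(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns "" if there is no AUG. If no in-frame stop is reached,
    translation runs to the last complete codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: nothing is added, translation ends
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_from_start("CCAUGGCUUGGUAAGCU"))  # MAW (AUG GCU UGG, then UAA stops)
```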
Wobble and the Third Base
The degeneracy of the code is connected to the wobble hypothesis (details of tRNA structure are covered elsewhere).
Basic idea:
- Each codon on mRNA is recognized by an anticodon on a tRNA.
- The first two bases of the codon usually pair strictly with the corresponding bases in the anticodon.
- The third position of the codon (the “wobble position”) often allows more flexible pairing.
- A single tRNA can recognize more than one codon if they differ only at the wobble position.
Consequences:
- Cells do not need 61 different tRNAs (one for each sense codon). Fewer tRNAs, with wobble pairing, are sufficient.
- Many amino acids have a “codon family” where variation at the third base does not change the amino acid (e.g. alanine: GCU, GCC, GCA, GCG).
This flexibility, combined with degeneracy, provides robustness against some mutations and errors in base pairing.
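The wobble idea can be modelled as a table showing which codon third bases each anticodon wobble base can pair with. The rules below are a simplified version of Crick's wobble pairs, and inosine (I) is the only modified base included. Actual tRNA sets vary between organisms, so the two-tRNA result for alanine holds only under this model. `WOBBLE_PAIRS` and `codons_read` are hypothetical names.

```python
# Simplified wobble rules: base at the anticodon's wobble position ->
# codon third-position bases it can pair with. Only inosine (I) is
# modeled among the many modified bases found in real tRNAs.
WOBBLE_PAIRS = {
    "G": {"U", "C"},
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "I": {"U", "C", "A"},
}


def codons_read(first_two: str, wobble_base: str) -> set[str]:
    """Codons readable by a tRNA that pairs strictly with the codon's first
    two bases (`first_two`) and carries `wobble_base` at the wobble position."""
    return {first_two + third for third in WOBBLE_PAIRS[wobble_base]}


# Under this model, alanine's four-codon family GCN is covered by two tRNAs.
alanine_family = {"GCU", "GCC", "GCA", "GCG"}
covered = codons_read("GC", "I") | codons_read("GC", "C")
print(sorted(covered))            # ['GCA', 'GCC', 'GCG', 'GCU']
print(covered == alanine_family)  # True
```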
Reading Frames and Open Reading Frames (ORFs)
Because codons are read in groups of three, a single RNA sequence can be read in different reading frames, depending on where translation starts.
For example, consider the nucleotide sequence:
AUGGCUACU...
Possible frames:
- Frame 1: AUG|GCU|ACU|...
- Frame 2: UGG|CUA|CU... (starting from the second base)
- Frame 3: GGC|UAC|U... (starting from the third base)
Only one frame typically encodes the correct, functional protein. The others usually contain premature stop codons or nonsensical amino acid sequences.
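The three frames of the example sequence can be listed and translated programmatically. This sketch uses an illustrative one-letter encoding of the standard code. `reading_frame` is a hypothetical helper that keeps only complete codons.

```python
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}


def reading_frame(rna: str, offset: int) -> list[str]:
    """Complete codons of one frame, starting `offset` bases in (0, 1 or 2)."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


rna = "AUGGCUACU"
for offset in range(3):
    codons = reading_frame(rna, offset)
    peptide = "".join(CODON_TABLE[c] for c in codons)
    print(f"Frame {offset + 1}: {'|'.join(codons)} -> {peptide}")
# Frame 1: AUG|GCU|ACU -> MAT
# Frame 2: UGG|CUA -> WL
# Frame 3: GGC|UAC -> GY
```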
An open reading frame (ORF) is a stretch of nucleotide sequence that:
- Begins with a start codon (AUG, in the usual case),
- Continues without in-frame stop codons,
- Ends at a stop codon (UAA, UAG, or UGA).
ORFs are important in identifying potential protein-coding regions in DNA and RNA sequences.
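A simple ORF scan follows directly from this definition. The sketch below is a minimal version that makes several assumptions. It scans only the three forward frames of the given sequence and reports (start, end) index pairs with `end` exclusive. It ignores frames that never reach a stop codon, and it does not report AUGs nested inside an ORF separately. `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each AUG...stop ORF.

    `min_codons` counts codons from the AUG up to, but not including, the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                      # open an ORF at the first AUG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the in-frame stop
    return sorted(orfs)


rna = "CCAUGGCUUAAGAUGUUUUGACC"
for start, end in find_orfs(rna):
    print(start, end, rna[start:end])
# 2 11 AUGGCUUAA
# 12 21 AUGUUUUGA
```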
Evolutionary and Functional Implications of the Code
The structure of the genetic code appears to reduce the impact of some mutations:
- Amino acids with similar properties (e.g. similar size or charge) are often encoded by codons that differ by only one base.
- A single base substitution may therefore lead to either:
  - No change in amino acid (silent mutation),
  - Or a substitution with a chemically similar amino acid (often less harmful than a drastically different one).
This arrangement suggests that the code is not random, but has been shaped by evolutionary processes to be relatively error-tolerant.
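The error-tolerance argument can be explored by enumerating every single-base substitution in every sense codon and counting how many leave the amino acid unchanged. This sketch measures only silent changes and does not score chemical similarity. It uses an illustrative one-letter encoding of the standard code, and `count_substitutions` is a hypothetical name.

```python
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}


def count_substitutions() -> tuple[int, list[int]]:
    """Count all single-base substitutions from sense codons, and how many
    are silent at each codon position (index 0 = first base)."""
    total = 0
    silent_by_position = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # substitutions are counted only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    silent_by_position[pos] += 1
    return total, silent_by_position


total, silent_by_position = count_substitutions()
print(total)               # 549 (61 sense codons x 3 positions x 3 other bases)
print(silent_by_position)  # [8, 0, 126]
```

Under the standard code, silent substitutions are concentrated at the third codon position, which matches the degeneracy and wobble patterns described earlier.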
The near-universality of the code also implies:
- A common origin of life on Earth, with a shared informational system.
- The possibility of horizontal gene transfer across species (natural or artificial), because the receiving organism can still interpret the code.
These themes connect the genetic code to broader topics such as evolution, molecular biology techniques, and biotechnology, which are discussed in other chapters.
Overview of Codon–Amino Acid Assignments (Conceptual)
While a complete codon table belongs in reference material, a conceptual overview is useful:
- Start codon: AUG → methionine (Met), also the start signal
- Stop codons: UAA, UAG, UGA → termination signals
- Amino acids with multiple codons:
  - Leucine, serine, arginine: 6 codons each
  - Many others: 2–4 codons
- Amino acids with only one codon:
  - Methionine (Met): AUG
  - Tryptophan (Trp): UGG
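The codon counts in this overview can be checked by tallying the standard codon table. This sketch uses an illustrative one-letter encoding (with `*` for stop). `codon_counts` and `by_count` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (one-letter codes, '*' = stop), codons in U, C, A, G order.
BASES = "UCAG"
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AA_BY_CODON)}

# Number of codons per amino acid (stop codons excluded).
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Group amino acids by how many codons specify them.
by_count = defaultdict(list)
for aa, n in codon_counts.items():
    by_count[n].append(aa)

for n in sorted(by_count, reverse=True):
    print(n, sorted(by_count[n]))
# 6 ['L', 'R', 'S']
# 4 ['A', 'G', 'P', 'T', 'V']
# 3 ['I']
# 2 ['C', 'D', 'E', 'F', 'H', 'K', 'N', 'Q', 'Y']
# 1 ['M', 'W']
```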
Knowing that each amino acid corresponds to specific codons, and that particular codons mark start and stop, is the essential functional content of the genetic code.