Protein splicing

The post-translational removal of peptide sequences from within a protein sequence

The mechanism of protein splicing involving inteins. In this scheme, the N-extein is shown in red, the intein in black, and the C-extein in blue. X represents either an oxygen or sulfur atom.

Protein splicing is an intramolecular reaction in which an internal protein segment (called an intein) is removed from a precursor protein and the flanking N-terminal and C-terminal segments (called exteins) are ligated. The residue at the splicing junction of the precursor protein is usually a cysteine or a serine, both of which are amino acids with a nucleophilic side chain. The protein splicing reactions known so far do not require exogenous cofactors or energy sources such as adenosine triphosphate (ATP) or guanosine triphosphate (GTP). The term splicing is normally associated with pre-mRNA splicing; protein splicing, by contrast, acts on a precursor protein that has already been translated. This precursor protein contains three segments: an N-extein followed by the intein followed by a C-extein. After splicing has taken place, the resulting protein contains the N-extein linked to the C-extein; this splicing product is also termed an extein.
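The net outcome described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name Precursor, the function splice, and the short amino acid strings are hypothetical and chosen for illustration. The code represents the rearrangement of segments, not the underlying chemistry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Precursor:
    """Toy precursor protein: three segments as one-letter amino acid strings."""
    n_extein: str
    intein: str
    c_extein: str

    def sequence(self) -> str:
        # Order in the unspliced precursor: N-extein, intein, C-extein.
        return self.n_extein + self.intein + self.c_extein


def splice(precursor: Precursor) -> Tuple[str, str]:
    """Return (ligated extein product, excised intein).

    Only the net result is modeled: the intein is removed and the
    N-extein is joined directly to the C-extein.
    """
    product = precursor.n_extein + precursor.c_extein
    return product, precursor.intein


# Hypothetical sequences, not taken from any real protein.
example = Precursor(n_extein="MKT", intein="CLSGD", c_extein="SGK")
product, excised = splice(example)
print(example.sequence())  # MKTCLSGDSGK
print(product)             # MKTSGK
print(excised)             # CLSGD
```

The combined length of the spliced product and the excised intein equals the length of the precursor, since no residues are added or lost in this simplified picture.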

Intein splicing occurs post-translationally in a self-catalytic process. Here, the extein is shown in red and the intein in blue.

Naming conventions

The first part of an intein name is based on the scientific name of the organism in which it is found, and the second part is based on the name of the corresponding gene or extein. For example, the intein found in Thermoplasma acidophilum and associated with Vacuolar ATPase subunit A (VMA) is called "Tac VMA".

Normally, as in this example, just three letters suffice to specify the organism, but there are variations. For example, additional letters may be added to indicate a strain. If more than one intein is encoded in the corresponding gene, the inteins are given a numerical suffix starting from 5′ to 3′ or in order of their identification (for example, "Msm dnaB-1").

The segment of the gene that encodes the intein is usually given the same name as the intein, but to avoid confusion the name of the intein proper is usually capitalized (e.g., Pfu RIR1-1), whereas the name of the corresponding gene segment is italicized (e.g., Pfu rir1-1).
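The naming scheme above is regular enough to be parsed mechanically. The following Python sketch is an illustration rather than an official parser. The names InteinName and parse_intein_name are hypothetical. It assumes that the organism prefix contains no whitespace and that a trailing hyphen followed by digits is always the intein number, which may not hold for every gene name.

```python
import re
from typing import NamedTuple, Optional


class InteinName(NamedTuple):
    organism: str          # abbreviated organism name, e.g. "Tac"
    gene: str              # gene or extein name, e.g. "VMA"
    number: Optional[int]  # numerical suffix, if the gene has several inteins


# Assumed layout: "<organism> <gene>" with an optional "-<number>" suffix.
_NAME_PATTERN = re.compile(r"^(?P<org>\S+)\s+(?P<gene>.+?)(?:-(?P<num>\d+))?$")


def parse_intein_name(name: str) -> InteinName:
    """Split an intein name into organism, gene, and optional number."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognizable intein name: {name!r}")
    num = match.group("num")
    return InteinName(
        organism=match.group("org"),
        gene=match.group("gene"),
        number=int(num) if num is not None else None,
    )


for example in ("Tac VMA", "Msm dnaB-1", "Pfu RIR1-1"):
    print(parse_intein_name(example))
```

Applied to the examples in the text, "Tac VMA" yields no number, while "Msm dnaB-1" and "Pfu RIR1-1" both yield the number 1.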

Types of inteins

Inteins are categorized into four classes: maxi-inteins, mini-inteins, trans-splicing inteins, and alanine inteins. Maxi-inteins consist of N- and C-terminal splicing domains together with an endonuclease domain. Mini-inteins have the typical N- and C-terminal splicing domains but lack the endonuclease domain. In trans-splicing inteins, the intein is split into two (or perhaps more) fragments, designated N-terminal and C-terminal. Alanine inteins have an alanine at the splicing junction instead of the cysteine or serine at which protein splicing otherwise occurs.

Full and mini inteins

Inteins can contain a homing endonuclease gene (HEG) domain in addition to the splicing domains. This domain is responsible for the spread of the intein by cleaving DNA at an intein-free allele on the homologous chromosome, triggering the DNA double-stranded break repair (DSBR) system, which then repairs the break, thus copying the intein-coding DNA into a previously intein-free site. The HEG domain is not necessary for intein splicing, and so it can be lost, forming a minimal, or mini, intein. Several studies have demonstrated the modular nature of inteins by adding or removing HEG domains and determining the activity of the new construct.

Split inteins

Sometimes, the intein of the precursor protein comes from two genes. In this case, the intein is said to be a split intein. For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. The dnaE-n product consists of an N-extein sequence followed by a 123-AA intein sequence, whereas the dnaE-c product consists of a 36-AA intein sequence followed by a C-extein sequence.
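The split arrangement can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the class and function names are hypothetical, the extein sequences are invented, and the intein fragments are filled with the placeholder residue "X" because only their lengths (123 and 36 AA) are given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPrecursor:
    """Product of the first gene: N-extein followed by the N-terminal intein part."""
    n_extein: str
    intein_n: str


@dataclass(frozen=True)
class CPrecursor:
    """Product of the second gene: C-terminal intein part followed by the C-extein."""
    intein_c: str
    c_extein: str


def trans_splice(n_part: NPrecursor, c_part: CPrecursor) -> str:
    """Return the ligated extein product; both intein fragments are discarded."""
    return n_part.n_extein + c_part.c_extein


# Fragment lengths follow the text; all sequences are placeholders.
n_part = NPrecursor(n_extein="MKT", intein_n="X" * 123)
c_part = CPrecursor(intein_c="X" * 36, c_extein="SGK")

print(len(n_part.intein_n) + len(c_part.intein_c))  # 159
print(trans_splice(n_part, c_part))                 # MKTSGK
```

In this picture the two intein fragments together span 159 residues, and the product is a single polypeptide even though it was encoded by two genes.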

Applications in biotechnology

Inteins are very efficient at protein splicing, and they have accordingly found an important role in biotechnology. More than 200 inteins have been identified to date, ranging in size from 100 to 800 AAs. Inteins have been engineered for particular applications such as protein semisynthesis and the selective labeling of protein segments, which is useful for NMR studies of large proteins.

Pharmaceutical inhibition of intein excision may be a useful tool for drug development; the protein that contains the intein will not carry out its normal function if the intein does not excise, since its structure will be disrupted.

It has been suggested that inteins could prove useful for achieving allotopic expression of certain highly hydrophobic proteins normally encoded by the mitochondrial genome, for example in gene therapy. The hydrophobicity of these proteins is an obstacle to their import into mitochondria. Therefore, the insertion of a non-hydrophobic intein may allow this import to proceed. Excision of the intein after import would then restore the protein to wild-type.

Affinity tags have been widely used to purify recombinant proteins, as they allow the recombinant protein to be accumulated with few impurities. However, the affinity tag must be removed by proteases in the final purification step. This extra proteolysis step raises two problems: the specificity of the protease in removing the affinity tag from the recombinant protein, and the removal of the digestion products. Both can be avoided by fusing the affinity tag to a self-cleavable intein and inducing cleavage under controlled conditions. The first generation of expression vectors of this kind used a modified Saccharomyces cerevisiae VMA (Sce VMA) intein. Chong et al. used a chitin binding domain (CBD) from Bacillus circulans as an affinity tag, and fused this tag with a modified Sce VMA intein. The modified intein undergoes a self-cleavage reaction at its N-terminal peptide linkage in the presence of 1,4-dithiothreitol (DTT), β-mercaptoethanol (β-ME), or cysteine at low temperatures over a broad pH range. After the recombinant protein has been expressed, the cell homogenate is passed through a column containing chitin, which allows the CBD of the chimeric protein to bind to the column. When the temperature is then lowered and one of the thiol reagents listed above is passed through the column, the chimeric protein undergoes self-cleavage and only the target protein is eluted. This technique eliminates the need for a proteolysis step, and the modified Sce VMA intein stays on the column, attached to the chitin through the CBD.

Recently, inteins have been used to purify proteins based on self-aggregating peptides. Elastin-like polypeptides (ELPs) are a useful tool in biotechnology: fused with a target protein, they tend to form aggregates inside the cells, which eliminates the chromatographic step needed in protein purification. ELP tags have been combined with an intein in the fusion protein, so that the aggregates can be isolated without chromatography (by centrifugation) and the intein and tag can then be cleaved off in a controlled manner to release the target protein into solution. This protein isolation can be done using continuous media flow, yielding high amounts of protein and making the process more economically efficient than conventional methods. Another group of researchers used smaller self-aggregating tags to isolate the target protein: the small amphipathic peptides 18A and ELK16 (figure 5) were used to form self-cleaving aggregating proteins.

Applications in antimicrobial development

Over the last twenty years, there has been increasing interest in leveraging inteins for antimicrobial applications. Intein splicing is found exclusively in unicellular organisms, with a particularly high abundance in pathogenic microorganisms. Furthermore, inteins are commonly found within housekeeping proteins and/or proteins involved in the survival of the organism within a human host. Post-translational intein removal is necessary for the protein to properly fold and function. For example, Gaëlle Huet et al. demonstrated that in Mycobacterium tuberculosis, unspliced SufB prevents the formation of the SufBCD complex, a component of the SUF machinery. As such, the inhibition of intein splicing may serve as a powerful platform for the development of antimicrobials.

Current research on intein splicing inhibitors has focused on developing antimycobacterials (M. tuberculosis has three intein-containing proteins), as well as agents active against the pathogenic fungi Cryptococcus and Aspergillus. Cisplatin and similar platinum-containing compounds inhibit splicing of the M. tuberculosis RecA intein by coordinating to catalytic residues. Divalent cations, such as copper(II) and zinc(II) ions, act similarly and inhibit splicing reversibly. However, neither approach is currently suitable as the basis for an effective and safe antibiotic. The fungal Prp8 intein is also inhibited by divalent cations and cisplatin, which interfere with the catalytic Cys1 residue. In 2021, Li et al. showed that small-molecule inhibitors of Prp8 intein splicing were selective and effective at slowing the growth of C. neoformans and C. gattii, providing exciting evidence for the antimicrobial potential of intein splicing inhibitors.

