PTP-central - A resource of protein tyrosine phosphatases in eukaryotic genomes

Sequence-based identification of PTPs with Y-Phosphatomer

Classification of PTPs

The PTP superfamily is divided into four distinct classes that differ in both their catalytic mechanisms and the sequences of their phosphatase catalytic domains.

Class I PTPs are cysteine-based and include the classical tyrosine-specific phosphatases (both receptor and non-receptor) and the dual-specificity phosphatases (DSPs, or VH1-like).

Class II PTPs, the low-molecular-weight PTPs (LMW-PTPs), are a small but evolutionarily conserved group with only one member in humans (ACP1).

Class III PTPs, the CDC25 phosphatases, are cysteine-based like Class I and II enzymes, and are specific for phosphotyrosine and phosphothreonine residues.

The fourth class of PTPs has an aspartic acid-based catalytic mechanism and is represented by the developmentally important Eyes Absent (EyA) genes.
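For downstream tagging of predictions, the four-class scheme can be held in a small lookup structure. The following is a minimal Python sketch; the names `PTPClass` and `ClassInfo` are hypothetical and not part of PTP-central. The representative families follow the standard classification, with LMW-PTP for Class II and CDC25 for Class III.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ClassInfo:
    """Descriptive attributes of one PTP class (hypothetical field names)."""
    nucleophile: str          # catalytic residue: "Cys" or "Asp"
    members: tuple[str, ...]  # representative families in the class


class PTPClass(Enum):
    # Class I: classical tyrosine-specific PTPs (receptor and non-receptor) and DSPs
    CLASS_I = ClassInfo("Cys", ("tyrosine-specific PTPs", "dual-specificity phosphatases"))
    # Class II: low-molecular-weight PTPs; a single human member, ACP1
    CLASS_II = ClassInfo("Cys", ("LMW-PTP (ACP1)",))
    # Class III: CDC25 phosphatases, acting on phosphotyrosine and phosphothreonine
    CLASS_III = ClassInfo("Cys", ("CDC25",))
    # Class IV: aspartic acid-based Eyes Absent (EyA) phosphatases
    CLASS_IV = ClassInfo("Asp", ("EyA",))

    @property
    def cysteine_based(self) -> bool:
        """True for classes whose catalytic nucleophile is a cysteine."""
        return self.value.nucleophile == "Cys"


cys_classes = [c.name for c in PTPClass if c.cysteine_based]
print(cys_classes)  # ['CLASS_I', 'CLASS_II', 'CLASS_III']
```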

Y-Phosphatomer is a sequence-based method for the automatic prediction and class-level classification of eukaryotic PTPs. It relies on a specific combination of 14 publicly available protein domain models (mostly profile hidden Markov models, HMMs) that are diagnostic for the various PTP families. Briefly, we identified the combination of protein domain signatures, drawn from the smallest possible number of protein domain databases (Pfam, PRINTS, SMART, SUPERFAMILY and TIGRFAMs), that detects a curated set of human PTPs and assigns each one to its correct class without cross-hitting other classes (Figure 1).
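The central idea, that each class is recognized by a combination of domain-model hits that no other class produces, can be sketched as a rule table checked against the set of models that hit a protein. This is an illustration only: the model names below are hypothetical placeholders, not the actual 14 Y-Phosphatomer models, and the rules are simplified.

```python
from __future__ import annotations

# Hypothetical rule table: a class is assigned when ALL of its required models hit.
# Model names are placeholders, not the real Y-Phosphatomer library.
CLASS_RULES: dict[str, frozenset[str]] = {
    "tyrosine-specific": frozenset({"MODEL_PTP_CATALYTIC", "MODEL_TYR_SPECIFIC"}),
    "dual-specificity": frozenset({"MODEL_PTP_CATALYTIC", "MODEL_DSP"}),
    "LMW-PTP": frozenset({"MODEL_LMW"}),
    "CDC25": frozenset({"MODEL_CDC25"}),
    "EyA": frozenset({"MODEL_EYA"}),
}


def classify(hits: set[str]) -> str | None:
    """Return the single class whose required models all hit, else None.

    Matching no rule means the protein is not predicted as a PTP. Matching more
    than one rule is a cross-hit, which a well-chosen library should never
    produce, so it is reported as "ambiguous" rather than guessed.
    """
    matched = [name for name, required in CLASS_RULES.items() if required <= hits]
    if not matched:
        return None
    if len(matched) > 1:
        return "ambiguous"
    return matched[0]


print(classify({"MODEL_PTP_CATALYTIC", "MODEL_DSP", "MODEL_OTHER"}))  # dual-specificity
print(classify({"MODEL_KINASE"}))  # None
```

Extra, unrelated hits (such as `MODEL_OTHER` above) do not affect the outcome, because each rule only tests whether its required models are a subset of the observed hits.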

Figure 1. Flow diagram of the Y-Phosphatomer method for the automatic classification of PTPs. The human set of PTPs was analyzed with a local installation of InterProScan run with default parameters (steps 1 and 2). This analysis showed that PTPs from the four classes could be unequivocally distinguished by a specific combination of protein domain models, mostly HMMs (steps 3 and 4). We took advantage of this property to build the Y-Phosphatomer library, which consists of only 14 protein domain models (step 5). The library was evaluated on two data sets and in both achieved complete coverage and a misclassification rate of zero at the PTP class level (step 6). Y-Phosphatomer is therefore a robust library that can be applied to the genome-wide annotation of PTPs (step 7).
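Steps 3 and 4 in Figure 1 amount to a search for domain models that separate the classes. One simple way to phrase that search, assuming a curated mapping of proteins to classes and the set of models each protein hits (for example, parsed from InterProScan output), is to keep, for each class, the models shared by every member and absent from every non-member. This criterion is an illustrative assumption, not necessarily the exact procedure used to build Y-Phosphatomer, and all protein and model IDs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def diagnostic_models(
    labels: dict[str, str], hits: dict[str, set[str]]
) -> dict[str, set[str]]:
    """For each class, return models that hit all of its members and no other protein.

    labels: protein ID -> curated class (hypothetical input format)
    hits:   protein ID -> set of domain-model IDs that hit the protein
    """
    members: dict[str, list[str]] = defaultdict(list)
    for protein, cls in labels.items():
        members[cls].append(protein)

    result: dict[str, set[str]] = {}
    for cls, prots in members.items():
        # Models shared by every member of this class
        shared = set.intersection(*(hits.get(p, set()) for p in prots))
        # Models hit by any protein of another class (would cause cross-hits)
        outside = set().union(
            *(hits.get(p, set()) for p, c in labels.items() if c != cls)
        )
        result[cls] = shared - outside
    return result


labels = {"P1": "tyrosine-specific", "P2": "tyrosine-specific", "P3": "dual-specificity"}
hits = {"P1": {"M_CAT", "M_TYR"}, "P2": {"M_CAT", "M_TYR", "M_X"}, "P3": {"M_CAT", "M_DSP"}}
print(diagnostic_models(labels, hits))
# {'tyrosine-specific': {'M_TYR'}, 'dual-specificity': {'M_DSP'}}
```

In this toy case the shared catalytic model `M_CAT` is discarded for both classes because it hits members of each; a fuller approach would also search for combinations of individually non-exclusive models.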

Evaluation of Y-Phosphatomer
We first evaluated Y-Phosphatomer on the entire set of tyrosine-specific PTPs from the PTP Database, comprising 383 distinct sequences from 61 species and five distinct phyla. Y-Phosphatomer identified all of these sequences and correctly classified them as tyrosine-specific PTPs. In a second test, Y-Phosphatomer was applied to a dataset of experimentally validated proteins from the UniProt database (n=124). After excluding incorrectly annotated sequences (n=28), Y-Phosphatomer assigned all of the remaining sequences (n=96) to their correct classes. Together, these two tests suggest that Y-Phosphatomer retrieves PTPs and classifies them into their correct classes with 100% coverage and a 100% correct classification rate.
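The two figures of merit can be made explicit. In this sketch, coverage is the fraction of reference PTPs that the library detects at all, and the correct classification rate is the fraction of detected sequences assigned to the right class; these definitions and the input format are assumptions. For the UniProt test the denominator is the 124 − 28 = 96 sequences left after removing incorrectly annotated entries. The sequence IDs in the usage example are hypothetical.

```python
from __future__ import annotations


def evaluate(
    true_class: dict[str, str], predicted: dict[str, str | None]
) -> tuple[float, float]:
    """Return (coverage, correct_classification_rate) as fractions.

    true_class: sequence ID -> reference class (mis-annotated entries already removed)
    predicted:  sequence ID -> predicted class, or None if the sequence was not detected
    """
    detected = [s for s in true_class if predicted.get(s) is not None]
    coverage = len(detected) / len(true_class)
    correct = sum(predicted[s] == true_class[s] for s in detected)
    rate = correct / len(detected) if detected else 0.0
    return coverage, rate


# Toy example: one sequence missed (s4) and one misclassified (s3)
ref = {"s1": "tyrosine-specific", "s2": "dual-specificity", "s3": "CDC25", "s4": "EyA"}
pred = {"s1": "tyrosine-specific", "s2": "dual-specificity", "s3": "dual-specificity", "s4": None}
print(evaluate(ref, pred))  # coverage 0.75, rate 2/3
```

Separating the two measures keeps a missed PTP (a coverage failure) distinct from a detected PTP placed in the wrong class (a classification failure).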

The PTP complements of 65 eukaryotic genomes
Y-Phosphatomer was used to scan the predicted peptide datasets of 65 distinct eukaryotic genomes, including species belonging to four of the five eukaryotic supergroups (unikonts, excavates, plants and chromalveolates). The results are summarized in Figure 2.

Figure 2. Overview of the contents of PTP-central. (A) In the 65 eukaryotic genomes analyzed, nearly 50% of the 4605 PTP sequences are tyrosine-specific phosphatases, closely followed by dual-specificity phosphatases (~43%). LMWP, CDC25 and EyA class phosphatases each make up less than 3% of the dataset. (B) For most species, the size of the tyrosine phosphatome correlates linearly with the total number of proteins encoded in the genome. Plants are the exception: their tyrosine phosphatomes remain relatively constant in size despite large increases in genome size. (C) Only tyrosine-specific and dual-specificity phosphatases are present in all of the eukaryotic supergroups surveyed. (D) PTP-central contains a manually curated data set of 339 structures, the vast majority of which are crystal structures of human tyrosine-specific phosphatases.
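Panels (A) and (B) rest on two simple computations: the share of each class among the pooled predictions, and a least-squares line plus Pearson correlation between proteome size and phosphatome size across species. The following sketch uses only the Python standard library. The species values are made-up placeholders chosen to show the calculation and are not data from PTP-central.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def class_shares(predicted_classes: list[str]) -> dict[str, float]:
    """Percentage of all predicted PTPs that fall into each class (as in panel A)."""
    counts = Counter(predicted_classes)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares slope and intercept, plus Pearson r (as in panel B)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


print(class_shares(["tyrosine-specific", "tyrosine-specific", "dual-specificity", "EyA"]))
# {'tyrosine-specific': 50.0, 'dual-specificity': 25.0, 'EyA': 25.0}

# Placeholder values (NOT PTP-central data): proteins per genome vs. phosphatome size
proteome_sizes = [6000.0, 14000.0, 20000.0, 27000.0]
phosphatome_sizes = [21.0, 43.0, 63.0, 82.0]
slope, intercept, r = fit_line(proteome_sizes, phosphatome_sizes)
print(f"slope={slope:.5f} PTPs per protein, intercept={intercept:.1f}, r={r:.3f}")
```

Species that break the trend, such as the plants in panel (B), would show up as points lying well below the fitted line, since their phosphatome size stays roughly flat as proteome size grows.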