Clinical Proteomics: Technical Advances and Emerging Applications

Advances in mass spectrometry and machine learning could revolutionize clinical diagnostics and drug development

Raeesa Gupte, PhD

As drivers of cellular function, proteins play an important role in health and disease. Dynamic changes in expression, subcellular localization, modification, and interactions contribute to the functional diversity of proteins.

The proteome consists of the assortment of proteins expressed by particular cells or tissues at any given time under defined conditions. Proteomics—the study of the proteome—involves systematic large-scale experimental analyses to identify and quantify proteins, protein-protein or protein-nucleic acid interactions, and post-translational modifications that control protein activity.

Proteomics is often used in clinical research as a tool for exploring cellular pathways and disease processes. Clinical proteomics can be divided into two areas: expression proteomics and functional proteomics. Expression proteomics compares the proteome of cells, tissues, or organisms in an experimental condition (e.g., disease or drug therapy) with a control proteome (e.g., healthy or untreated conditions) to identify differentially expressed proteins. Functional proteomics explores how protein modifications and interactions affect protein function during physiological and pathological conditions.
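
To make the idea of comparing an experimental proteome with a control proteome concrete, the following Python sketch computes a log2 fold change per protein. The protein names, abundance values, pseudocount, and the two-fold cutoff are all hypothetical choices made for illustration; real studies would also use replicates and statistical testing.

```python
import math

def log2_fold_changes(control, experimental, pseudocount=1.0):
    """Return {protein: log2(experimental / control)} for proteins seen in both proteomes.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    shared = control.keys() & experimental.keys()
    return {
        protein: math.log2((experimental[protein] + pseudocount) /
                           (control[protein] + pseudocount))
        for protein in sorted(shared)
    }

# Hypothetical abundances in arbitrary intensity units, invented for illustration.
control = {"PROT_A": 1000.0, "PROT_B": 500.0, "PROT_C": 250.0}
disease = {"PROT_A": 4000.0, "PROT_B": 500.0, "PROT_C": 60.0}

changes = log2_fold_changes(control, disease)
for protein, lfc in changes.items():
    # A two-fold change (|log2FC| >= 1) is used here as an arbitrary cutoff.
    direction = "up" if lfc >= 1 else "down" if lfc <= -1 else "unchanged"
    print(f"{protein}: log2FC = {lfc:+.2f} ({direction})")
```

A log scale is used so that a doubling and a halving of abundance are treated as changes of equal size and opposite sign.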

Early proteomic studies relied on 2D electrophoresis for separating and quantifying proteins from biological samples. In the early 1990s, the addition of mass spectrometry to the repertoire of protein analytical techniques, along with sequence databases and database search tools, transformed proteomic profiling. Mass spectrometry, once the workhorse of analytical chemists, paved the way for high-resolution and high-throughput gel-independent proteomic approaches.

Over the past decade, technological advances in mass spectrometry and bioinformatics have facilitated novel applications of clinical proteomics in biomarker discovery, drug development, and disease etiology.

Technical advances in proteomics 


Mass spectrometers generate ions from a sample under investigation and sort them based on their mass-to-charge ratio. The relative abundance of each ion is then determined with respect to all the ionic species in the sample. Currently, the most common mass spectrometric methods for characterization of the proteome are liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), electrospray ionization (ESI), and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF). The improved sensitivity, resolution, and speed of these techniques enable high-throughput proteomic analysis of complex biological samples.
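
As a rough illustration of the two quantities described above, the sketch below computes the mass-to-charge ratio of a positively charged ion and the relative abundance of each ion in a spectrum. It assumes ions are formed by adding protons, as is typical for peptides in positive-ion mode; the neutral mass and the intensities are hypothetical values.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def mz(neutral_mass, charge):
    """m/z of a positive ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def relative_abundance(intensities):
    """Express each ion's intensity as a fraction of the summed intensity of all ions."""
    total = sum(intensities.values())
    return {ion: value / total for ion, value in intensities.items()}

# Hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da.
for charge in (1, 2, 3):
    print(f"z = {charge}: m/z = {mz(1000.0, charge):.4f}")

# Hypothetical raw intensities keyed by a label for each ion.
spectrum = {"ion_1": 300.0, "ion_2": 100.0}
print(relative_abundance(spectrum))
```

The same molecule appears at a different m/z for each charge state, which is why charge assignment matters when interpreting a spectrum. Many tools instead normalize to the most intense peak; the sum is used here to follow the description in the text.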

Mass spectrometry continues to undergo improvements in acquisition rates and strategies. Advances in instrumentation of hybrid mass analyzers enable higher scanning rates. Data acquisition strategies are also evolving. Traditionally, mass spectrometers relied on data-dependent acquisition (DDA). DDA uses information obtained during the acquisition to decide which precursor ions should undergo fragmentation. Data-independent acquisition (DIA) is now gaining wider acceptance. In DIA, fragmentation parameters are predefined, resulting in more sensitive and accurate protein quantification than DDA.2 Selected reaction monitoring (SRM) is a form of targeted DIA, while sequential window acquisition of all theoretical mass spectra (SWATH-MS) is a type of untargeted DIA.
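
The difference between the two acquisition strategies can be sketched in a few lines of Python. This is a simplified illustration, not instrument control code: it assumes DDA selects the most intense precursors from a survey scan (a common "top N" scheme) and that DIA tiles the m/z range with fixed-width isolation windows. The survey scan values and window width are hypothetical.

```python
def dda_select_precursors(ms1_peaks, top_n):
    """DDA: choose precursors from the data, here the top_n most intense peaks."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [peak_mz for peak_mz, _intensity in ranked[:top_n]]

def dia_windows(mz_start, mz_end, width):
    """DIA: predefined isolation windows that tile the m/z range, independent of the data."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows

# Hypothetical survey scan: (m/z, intensity) pairs.
survey_scan = [(402.2, 1.5e5), (455.7, 9.0e5), (610.3, 3.2e4), (733.9, 4.1e5)]

selected = dda_select_precursors(survey_scan, top_n=2)
windows = dia_windows(400.0, 800.0, 100.0)

print("DDA fragments only:", selected)
print("DIA fragments everything in:", windows)
```

In this toy example DDA never fragments the low-intensity precursor at m/z 610.3, whereas the DIA windows cover it regardless of its intensity.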


 Machine learning techniques are useful for extracting relevant information from large datasets, such as those generated by tandem mass spectrometry. Supervised machine learning involves training an algorithm on labeled data to learn patterns. The algorithm is then tested on an unlabeled dataset to identify proteins and the expression, interaction, and modification patterns relevant to a disease state or treatment condition. In contrast, unsupervised machine learning uses unlabeled data to group together samples with similar attribute profiles. 
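
The contrast between supervised and unsupervised learning can be illustrated with two deliberately simple algorithms written in plain Python: a nearest-centroid classifier (supervised) and two-cluster k-means (unsupervised). These were chosen for brevity and are not the methods used in the studies cited in this article; the two-protein abundance profiles and labels are hypothetical.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def fit_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per label from labeled samples."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return {label: centroid(rows) for label, rows in groups.items()}

def predict(model, sample):
    """Assign the label whose centroid lies closest to the sample."""
    return min(model, key=lambda label: math.dist(model[label], sample))

def two_means(samples, iterations=10):
    """Unsupervised: split unlabeled samples into two clusters (k-means with k=2)."""
    centers = [samples[0], samples[-1]]  # naive deterministic initialization
    assignment = []
    for _ in range(iterations):
        assignment = [min((0, 1), key=lambda k: math.dist(centers[k], s))
                      for s in samples]
        for k in (0, 1):
            members = [s for s, a in zip(samples, assignment) if a == k]
            if members:
                centers[k] = centroid(members)
    return assignment

# Hypothetical abundance profiles of two proteins per sample.
profiles = [[1.0, 5.0], [1.2, 4.8], [5.0, 1.0], [4.8, 1.1]]
labels = ["healthy", "healthy", "disease", "disease"]

model = fit_nearest_centroid(profiles, labels)
print(predict(model, [4.9, 0.9]))   # classify a new sample using the learned labels
print(two_means(profiles))          # group the same samples without using labels
```

The classifier needs the labels to learn what "healthy" and "disease" look like, while the clustering step recovers two groups from the profiles alone and leaves their interpretation to the analyst.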

Machine learning has been used to classify and identify protein biomarkers across a variety of diseases including ovarian cancer, breast cancer, prostate cancer, heart disease, and amyotrophic lateral sclerosis.3 In addition to being time-consuming, the current methods of data analysis fail to identify all proteins in a sample, or may identify them incorrectly. In 2019, a predictive deep learning model called Prosit was designed to identify proteins faster than the standard approach of database searching and with almost no errors.4 When integrated into database search pipelines, it led to faster identifications with false discovery rates more than 10 times lower. The software can be used to generate spectral libraries for data-independent acquisition. The algorithm, trained on 100 million mass spectra, can be used on all common mass spectrometers without additional training.
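
For readers unfamiliar with how a false discovery rate is estimated in database searching, the sketch below shows one widely used scheme, the target-decoy approach. This is an assumption made for illustration: the text does not state which estimation method the cited work used. Each peptide-spectrum match (PSM) carries a score and a flag marking whether it matched a decoy (deliberately incorrect) sequence; the scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

# Hypothetical PSMs as (score, is_decoy) pairs.
psms = [
    (0.99, False), (0.95, False), (0.90, False),
    (0.85, True), (0.80, False), (0.60, True), (0.55, False),
]

for threshold in (0.9, 0.8, 0.5):
    print(f"score >= {threshold}: estimated FDR = {fdr_at_threshold(psms, threshold):.2f}")
```

Lowering the score threshold accepts more identifications but also more decoy hits, so the estimated error rate rises. Better scoring separates correct from incorrect matches more cleanly and allows more identifications at the same error rate.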

Emerging applications of clinical proteomics


The molecular and functional properties of individual cells often vary from the population average. In fact, genetically identical cells can exhibit distinct functional phenotypes. Characterizing these differences between individual cells is critical in understanding normal physiology in heterogeneous tissues like the brain. It also provides insights into mechanisms of cancer recurrence, stem cell differentiation, and drug efficacy.

Antibody-based methods such as flow cytometry and mass cytometry are commonly used for single-cell proteomic assays.5 These approaches are limited by the specificity of antibodies and the number of proteins that can be analyzed per cell. High sensitivity and high throughput are crucial for protein characterization at the single-cell level. Mass spectrometry-based methods are ideal for single-cell proteomic profiling because they fulfill both these criteria. For example, LC-MS/MS was used to identify 450 proteins from single human oocytes that have ~100 ng of protein content per cell.6 Moreover, enrichment techniques allow mass spectrometry to be performed even with small amounts of clinical samples. For instance, enrichment using laser microdissection followed by LC-MS/MS can identify thousands of single cell-specific proteins from a 1 mm-thick section of the human brain.7 Such an approach has been used to investigate changes in specific cell populations in Alzheimer’s disease, multiple sclerosis, and stroke.8 Single-cell proteomic analyses also play an important role in understanding the heterogeneity of the tumor microenvironment and its response to chemotherapy.9 When combined with advances in microfluidics, multiplexing, and automation, MS can reduce costs and facilitate analysis at the subcellular or organelle level.


Metals are important co-factors for enzymes and structural stabilizers for proteins. Therefore, they play a critical role in cellular processes and are implicated in the pathophysiology of neurodegenerative disorders. Metalloproteomics is the comprehensive analysis of function, subcellular localization, and stoichiometry of metal-protein complexes.

The field of metalloproteomics is technically challenging. Special care needs to be taken to preserve the integrity of metalloproteins during sample processing. In addition, it requires high-resolution protein separation techniques to be coupled with highly sensitive metal detection methods. Protein separation methods such as gel electrophoresis or liquid chromatography coupled with metal detection methods such as inductively coupled plasma mass spectrometry (ICP-MS) have been developed.10 These methods provide a deeper understanding of the molecular mechanisms of toxicity and chemoresistance of anticancer metallodrugs like cisplatin. They have also been used to identify the biological targets of bismuth-based antimicrobial drugs.11 Similarly, size-exclusion chromatography followed by ICP-MS on human cerebrospinal fluid samples was used to identify putative biomarkers of cerebral vasospasm.12 


Variants of a single gene product in the form of isoforms or cleavage products as well as chemical modifications of amino acid residues impart functional diversity to proteins. Quantifying protein substrates and their post-translational modification (PTM) sites is key to dissecting complex regulatory networks that play a role in health and disease. Accordingly, large-scale proteomic PTM analyses aim to identify modification sites on proteins, quantify how these modifications are altered under specific conditions, and examine the functional consequences of such modifications. 

Phosphorylation is the most common and clinically relevant protein modification. Aberrant phosphorylation is implicated in neurodegenerative disorders and cancer. Other PTMs including glycosylation, ubiquitination, acetylation, and methylation affect protein expression, localization, interactions, and degradation. 

PTMs can be characterized by mass spectrometry in two ways: 1) using a top-down approach to measure the mass of intact proteins or 2) using a bottom-up approach to analyze proteolytically digested peptides. Because phosphorylation is low in abundance and transient, phosphorylated peptides are enriched using antibodies or chemicals. This improves sensitivity and sample throughput. Thus, mass spectrometry can detect phosphorylation on a larger number of proteins than conventional antibody-based proteomic techniques. For instance, phosphoproteomic analyses performed on postmortem brains of patients with Alzheimer’s disease have identified 3,715 phosphosites on 1,455 proteins.13 Similarly, mass spectrometric phosphoproteome analysis of HIV-infected brains identified 112 phosphorylated proteins and 17 previously unknown phosphorylation sites.14 Phosphoproteome profiling was also performed on metastatic tumors obtained posthumously from prostate cancer patients. These were used to study inter-patient and intra-patient heterogeneity in kinase activation.15 Proteomic profiling of other PTMs has been performed in mammalian cells, in animal models of cardiovascular disease,16 and on human proteins in vitro.17 However, testing on clinical samples is limited due to concerns about sample integrity, assay resolution, and computational challenges.
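
The top-down and bottom-up ideas can be sketched in Python under a few stated assumptions. The bottom-up part assumes trypsin as the protease, with the common simplified rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The top-down part assumes that the only modification present is phosphorylation, which adds about 79.966 Da per phosphate group. The protein sequence and masses are hypothetical.

```python
import re

PHOSPHO_SHIFT = 79.96633  # monoisotopic mass added by one phosphate group (HPO3), in Da

def tryptic_digest(sequence):
    """Bottom-up: cleave after K or R unless the next residue is P (simplified rule)."""
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", sequence) if peptide]

def candidate_phosphosites(peptide):
    """1-based positions of S, T, and Y residues, the usual phosphorylation targets."""
    return [i for i, residue in enumerate(peptide, start=1) if residue in "STY"]

def phosphate_count(observed_mass, unmodified_mass):
    """Top-down: infer the number of phosphate groups from the intact mass difference."""
    return round((observed_mass - unmodified_mass) / PHOSPHO_SHIFT)

# Hypothetical protein sequence, invented for illustration.
protein = "MASTKRPEYLKSGR"
peptides = tryptic_digest(protein)
for peptide in peptides:
    print(peptide, candidate_phosphosites(peptide))

# Hypothetical intact masses: an unmodified form and a form carrying two phosphates.
print(phosphate_count(1000.0 + 2 * PHOSPHO_SHIFT, 1000.0))
```

The intact-mass comparison reveals how many phosphate groups a protein carries but not where they sit, whereas the digested peptides narrow each modification down to a short stretch of sequence.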


Pharmacoproteomics focuses on the use of proteomic analyses in the discovery and development of pharmaceutical agents. The path from preclinical testing to regulatory approval of new drugs is a long and expensive process that is often fraught with failure. Besides failing to meet predetermined clinical endpoints, drug candidates may be abandoned due to poor pharmacokinetic properties and adverse effects. Proteomic profiling allows comprehensive examination of the mechanism of action, toxicity, and potential for drug resistance. Therefore, pharmacoproteomics can be applied to drug development and patient stratification or enrollment in clinical trials. Pharmacoproteomic workflows may be global or targeted. Targeted approaches involve affinity-based or activity-based profiling techniques that use molecular probes to detect specific proteins. Global methods enable unbiased large-scale analysis by mass spectrometry. The global approach has shown that anticancer compounds may elicit their effects by acting on multiple protein targets.18

Pharmaceutical companies are also considering whether proteomic analyses can be used to identify novel therapeutic biomarkers in clinical trials. Such biomarkers could potentially indicate whether an intervention would work in a particular patient and to what extent. A preliminary study presented at the 2019 Mass Spectrometry Applications to the Clinical Lab conference used mass spectrometry on tissue samples of colorectal cancer to measure changes in the expression of 9,000 proteins. A targeted proteomics approach was used in the same study to quantify 12 pre-selected biomarkers for tumor stratification.19 Similarly, proteomics can be used to assess the adverse effects of novel therapies in clinical trials.20


Technical advances in mass spectrometry and machine learning promise to transform personalized medicine. However, challenges in sample preparation and lack of standardization of the analytical procedures and workflows hamper widespread use of mass spectrometry in clinical laboratories. Unless these challenges are addressed, proteomics will continue to be used as a preclinical hypothesis-generating tool instead of a widely accepted clinical diagnostic.

References


1. Patterson, S. D. & Aebersold, R. H. Proteomics: The first decade and beyond. Nature Genetics 33, 311–323 (2003).

2. Meyer, J. G. & Schilling, B. Clinical applications of quantitative proteomics using targeted and untargeted data-independent acquisition techniques. Expert Review of Proteomics (2017).

3. Swan, A. L., Mobasheri, A., Allaway, D., Liddell, S. & Bacardit, J. Application of machine learning to proteomics data: Classification and biomarker identification in postgenomics biology. OMICS A Journal of Integrative Biology (2013).

4. Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nature Methods (2019).

5. Levy, E. & Slavov, N. Single cell protein analysis for systems biology. Essays in Biochemistry (2018).

6. Virant-Klun, I., Leicht, S., Hughes, C. & Krijgsveld, J. Identification of maturation-specific proteins by single-cell proteomics of human oocytes. Mol. Cell. Proteomics (2016).

7. Davis, S., Scott, C., Ansorge, O. & Fischer, R. Development of a sensitive, scalable method for spatial, cell-type-resolved proteomics of the human brain. Journal of Proteome Research (2019).

8. Scifo, E. et al. Recent advances in applying mass spectrometry and systems biology to determine brain dynamics. Expert Review of Proteomics (2017).

9. Lu, Y., Yang, L., Wei, W. & Shi, Q. Microchip-based single-cell functional proteomics for biomedical applications. Lab on a Chip (2017).

10. Da Silva, M. A., Sussulini, A. & Arruda, M. A. Metalloproteomics as an interdisciplinary area involving proteins and metals. Expert Review of Proteomics (2010).

11. Wang, Y., Wang, H., Li, H. & Sun, H. Metallomic and metalloproteomic strategies in elucidating the molecular mechanisms of metallodrugs. Dalton Transactions (2015).

12. Zhang, Y., Clark, J. F., Pyne-Geithman, G. & Caruso, J. Metallomics study in CSF for putative biomarkers to predict cerebral vasospasm. Metallomics (2010).

13. Tan, H. et al. Refined phosphopeptide enrichment by phosphate additive and the analysis of human brain phosphoproteome. Proteomics (2015).

14. Uzasci, L., Auh, S., Cotter, R. J. & Nath, A. Mass spectrometric phosphoproteome analysis of HIV-infected brain reveals novel phosphorylation sites and differential phosphorylation patterns. Proteomics - Clinical Applications (2016).

15. Drake, J. M. et al. Metastatic castration-resistant prostate cancer reveals intrapatient similarity and interpatient heterogeneity of therapeutic kinase targets. Proceedings of the National Academy of Sciences (2013).

16. Fert-Bober, J., Murray, C. I., Parker, S. J. & Van Eyk, J. E. Precision profiling of the cardiovascular post-translationally modified proteome: Where there is a will, there is a way. Circulation Research (2018).

17. Herren, A. W. et al. CaMKII phosphorylation of NaV1.5: Novel in vitro sites identified by mass spectrometry and reduced S516 phosphorylation in human heart failure. Journal of Proteome Research (2015).

18. Ong, S. E. et al. Identifying the proteins to which small-molecule probes and drugs bind in cells. Proceedings of the National Academy of Sciences (2009).

19. Millar, A. Proteomics and the promise of ‘enriching’ clinical trials. Pharmaceutical Technology (2019).

20. Williams, S. A. et al. Improving assessment of drug safety through proteomics: Early detection and mechanistic characterization of the unforeseen harmful effects of torcetrapib. Circulation (2018).

Raeesa Gupte, PhD, is a freelance science writer and editor specializing in evidence-based medicine, neurological disorders, and translational diagnostics.