Results 1 - 20 of 263
1.
Cell ; 184(19): 5031-5052.e26, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34534465

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.


Subject(s)
Adenocarcinoma/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Proteogenomics , Adenocarcinoma/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Pancreatic Ductal/diagnosis , Cohort Studies , Endothelial Cells/metabolism , Epigenesis, Genetic , Female , Gene Dosage , Genome, Human , Glycolysis , Glycoproteins/biosynthesis , Humans , Male , Middle Aged , Molecular Targeted Therapy , Pancreatic Neoplasms/diagnosis , Phenotype , Phosphoproteins/metabolism , Phosphorylation , Prognosis , Protein Kinases/metabolism , Proteome/metabolism , Substrate Specificity , Transcriptome/genetics
2.
Cell ; 184(16): 4348-4371.e40, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34358469

ABSTRACT

Lung squamous cell carcinoma (LSCC) remains a leading cause of cancer death with few therapeutic options. We characterized the proteogenomic landscape of LSCC, providing a deeper exposition of LSCC biology with potential therapeutic implications. We identify NSD3 as an alternative driver in FGFR1-amplified tumors and low-p63 tumors overexpressing the therapeutic target survivin. SOX2 is considered undruggable, but our analyses provide rationale for exploring chromatin modifiers such as LSD1 and EZH2 to target SOX2-overexpressing tumors. Our data support complex regulation of metabolic pathways by crosstalk between post-translational modifications including ubiquitylation. Numerous immune-related proteogenomic observations suggest directions for further investigation. Proteogenomic dissection of CDKN2A mutations argues for more nuanced assessment of RB1 protein expression and phosphorylation before declaring CDK4/6 inhibition unsuccessful. Finally, triangulation between LSCC, LUAD, and HNSCC identified both unique and common therapeutic vulnerabilities. These observations and proteogenomics data resources may guide research into the biology and treatment of LSCC.


Subject(s)
Carcinoma, Squamous Cell/genetics , Lung Neoplasms/genetics , Proteogenomics , Acetylation , Adult , Aged , Aged, 80 and over , Cluster Analysis , Cyclin-Dependent Kinase 4/genetics , Cyclin-Dependent Kinase 6/genetics , Epithelial-Mesenchymal Transition/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Mutation/genetics , Neoplasm Proteins/metabolism , Phosphorylation , Protein Binding , Receptor Tyrosine Kinase-like Orphan Receptors/metabolism , Receptors, Platelet-Derived Growth Factor/metabolism , Signal Transduction , Ubiquitination
3.
Cell ; 182(1): 200-225.e35, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32649874

ABSTRACT

To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.


Subject(s)
Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/genetics , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Proteogenomics , Adenocarcinoma of Lung/immunology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Carcinogenesis/genetics , Carcinogenesis/pathology , DNA Copy Number Variations/genetics , DNA Methylation/genetics , Female , Humans , Lung Neoplasms/immunology , Male , Middle Aged , Mutation/genetics , Oncogene Proteins, Fusion , Phenotype , Phosphoproteins/metabolism , Proteome/metabolism
4.
Mol Cell Proteomics ; 23(1): 100687, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38029961

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancer types, partly because it is frequently identified at an advanced stage, when surgery is no longer feasible. Therefore, early detection using minimally invasive methods such as blood tests may improve outcomes. However, studies to discover molecular signatures for the early detection of PDAC using blood tests have only been marginally successful. In the current study, a quantitative glycoproteomic approach via data-independent acquisition mass spectrometry was utilized to detect glycoproteins in 29 patient-matched PDAC tissues and sera. A total of 892 N-linked glycopeptides originating from 141 glycoproteins had PDAC-associated changes beyond normal variation. We further evaluated the specificity of these serum-detectable glycoproteins by comparing their abundance in 53 independent PDAC patient sera and 65 cancer-free controls. The PDAC tissue-associated glycoproteins we have identified represent an inventory of serum-detectable PDAC-associated glycoproteins as candidate biomarkers that can be potentially used for the detection of PDAC using blood tests.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Biomarkers, Tumor/metabolism , Pancreatic Neoplasms/metabolism , Carcinoma, Pancreatic Ductal/metabolism , Glycoproteins , Mass Spectrometry
5.
Proc Natl Acad Sci U S A ; 120(4): e2208275120, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36656852

ABSTRACT

De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.
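
The replica-exchange Monte Carlo (REMC) search named in the abstract can be illustrated with the textbook swap criterion. The sketch below is a generic illustration, not FoldDesign's implementation: the function name `swap_probability`, the inverse temperatures, and the energies are all assumed for the example.

```python
import math

def swap_probability(beta_i: float, beta_j: float,
                     energy_i: float, energy_j: float) -> float:
    """Generic Metropolis acceptance probability for exchanging the
    configurations of two replicas held at inverse temperatures
    beta_i and beta_j (illustrative; not taken from FoldDesign)."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Cap the exponent at 0 so the probability never exceeds 1
    # and exp() cannot overflow.
    return math.exp(min(0.0, delta))

# Hypothetical values: the cold replica (beta = 1.0) holds the
# higher-energy configuration, so the exchange is always accepted.
p_favourable = swap_probability(1.0, 0.5, 2.0, 1.0)

# Reversed energies: the exchange is accepted only with probability exp(-0.5).
p_unfavourable = swap_probability(1.0, 0.5, 1.0, 2.0)
```

Exchanges of this kind let low-temperature replicas escape local energy minima by borrowing configurations found at higher temperatures.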


Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Protein Structure, Secondary , Protein Conformation , Monte Carlo Method
6.
J Proteome Res ; 23(2): 532-549, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38232391

ABSTRACT

Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally, there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
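
The counts quoted in this abstract are mutually consistent, which a few lines of arithmetic can confirm. The snippet uses only the figures stated above; the variable names are chosen for illustration.

```python
# Figures quoted in the abstract (neXtProt release 2023-04-18).
predicted = 19_778   # neXtProt predicted proteins
pe1_ms = 17_453      # PE1 proteins detected by mass spectrometry
pe1_non_ms = 944     # PE1 proteins detected by non-MS methods

pe1_total = pe1_ms + pe1_non_ms                   # credibly detected (PE1)
missing = predicted - pe1_total                   # PE2 + PE3 + PE4 missing proteins
pe1_percent = round(100 * pe1_total / predicted)  # share of predicted proteins

print(pe1_total, missing, pe1_percent)  # prints: 18397 1381 93
```

The MS and non-MS detections sum to the stated PE1 total, and the remainder equals the stated number of missing proteins.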


Subject(s)
Antibodies , Proteome , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods
7.
Clin Proteomics ; 21(1): 7, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291365

ABSTRACT

BACKGROUND: Omics characterization of pancreatic adenocarcinoma tissue is complicated by the highly heterogeneous and mixed populations of cells. We evaluate the feasibility and potential benefit of using a coring method to enrich specific regions from bulk tissue and then perform proteogenomic analyses. METHODS: We used the Biopsy Trifecta Extraction (BioTExt) technique to isolate cores of epithelial-enriched and stroma-enriched tissue from pancreatic tumor and adjacent tissue blocks. Histology was assessed at multiple depths throughout each core. DNA sequencing, RNA sequencing, and proteomics were performed on the cored and bulk tissue samples. Supervised and unsupervised analyses were performed based on integrated molecular and histology data. RESULTS: Tissue cores had mixed cell composition at varying depths throughout. Average cell type percentages assessed by histology throughout the core were better associated with KRAS variant allele frequencies than standard histology assessment of the cut surface. Clustering based on serial histology data separated the cores into three groups with enrichment of neoplastic epithelium, stroma, and acinar cells, respectively. Using this classification, tumor overexpressed proteins identified in bulk tissue analysis were assigned into epithelial- or stroma-specific categories, which revealed novel epithelial-specific tumor overexpressed proteins. CONCLUSIONS: Our study demonstrates the feasibility of multi-omics data generation from tissue cores, the necessity of interval H&E stains in serial histology sections, and the utility of coring to improve analysis over bulk tissue data.

8.
J Proteome Res ; 22(4): 1024-1042, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36318223

ABSTRACT

The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".


Subject(s)
Proteome , Proteomics , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Open Reading Frames , Proteomics/methods
9.
PLoS Comput Biol ; 18(9): e1010539, 2022 09.
Article in English | MEDLINE | ID: mdl-36112717

ABSTRACT

Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.


Subject(s)
Computational Biology , Deep Learning , Algorithms , Computational Biology/methods , Databases, Protein , Models, Molecular , Protein Conformation , Protein Folding , Proteins/chemistry , Software
10.
J Biomed Inform ; 139: 104306, 2023 03.
Article in English | MEDLINE | ID: mdl-36738870

ABSTRACT

BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and help researchers identify which sites are better poised to study particular questions.


Subject(s)
COVID-19 , Electronic Health Records , Humans , Data Collection , Records , Cluster Analysis
11.
Mol Cell Proteomics ; 20: 100062, 2021.
Article in English | MEDLINE | ID: mdl-33640492

ABSTRACT

We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.


Subject(s)
Proteome , Societies, Scientific/history , Data Accuracy , History, 21st Century , Humans , Information Dissemination
12.
Mol Cell Proteomics ; : 100046, 2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33453411

ABSTRACT

Recent advances in mass spectrometry (MS)-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with impacts on the health, privacy, and wellbeing of individuals. We conducted and here report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics can ensure that eventual healthcare practices and regulations reflect the considered judgment of the community and anticipate opportunities and problems that may arise as the technology matures.

13.
Mol Cell Proteomics ; 2021 Jan 04.
Article in English | MEDLINE | ID: mdl-33397710

ABSTRACT

Recent advances in MS-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with the impact on the health, privacy, and well-being of individuals. Here we conducted and report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics is important to ensure that eventual regulations reflect the considered judgment of the community as well as to anticipate opportunities and problems that may arise as the technology matures further.

14.
Proc Natl Acad Sci U S A ; 117(35): 21813-21820, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817414

ABSTRACT

Transitions from health to disease are characterized by dysregulation of biological networks under the influence of genetic and environmental factors, often over the course of years to decades before clinical symptoms appear. Understanding these dynamics has important implications for preventive medicine. However, progress has been hindered both by the difficulty of identifying individuals who will eventually go on to develop a particular disease and by the inaccessibility of most disease-relevant tissues in living individuals. Here we developed an alternative approach using polygenic risk scores (PRSs) based on genome-wide association studies (GWAS) for 54 diseases and complex traits coupled with multiomic profiling and found that these PRSs were associated with 766 detectable alterations in proteomic, metabolomic, and standard clinical laboratory measurements (clinical labs) from blood plasma across several thousand mostly healthy individuals. We recapitulated a variety of known relationships (e.g., glutamatergic neurotransmission and inflammation with depression, IL-33 with asthma) and found associations directly suggesting therapeutic strategies (e.g., Ω-6 supplementation and IL-13 inhibition for amyotrophic lateral sclerosis) and influences on longevity (leukemia inhibitory factor, ceramides). Analytes altered in high-genetic-risk individuals showed concordant changes in disease cases, supporting the notion that PRS-associated analytes represent presymptomatic disease alterations. Our results provide insights into the molecular pathophysiology of a range of traits and suggest avenues for the prevention of health-to-disease transitions.


Subject(s)
Biomarkers/blood , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Asymptomatic Diseases/epidemiology , Cohort Studies , Databases, Genetic , Disease Progression , Genetic Testing/methods , Humans , Metabolomics/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Proteomics/methods , Risk Factors
15.
Proc Natl Acad Sci U S A ; 117(24): 13839-13845, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32471946

ABSTRACT

The Pioneer 100 Wellness Project involved quantitatively profiling 108 participants' molecular physiology over time, including genomes, gut microbiomes, blood metabolomes, blood proteomes, clinical chemistries, and data from wearable devices. Here, we present a longitudinal analysis focused specifically around the Pioneer 100 gut microbiomes. We distinguished a subpopulation of individuals with reduced gut diversity, elevated relative abundance of the genus Prevotella, and reduced levels of the genus Bacteroides. We found that the relative abundances of Bacteroides and Prevotella were significantly correlated with certain serum metabolites, including omega-6 fatty acids. Primary dimensions in distance-based redundancy analysis of clinical chemistries explained 18.5% of the variance in bacterial community composition, and revealed a Bacteroides/Prevotella dichotomy aligned with inflammation and dietary markers. Finally, longitudinal analysis of gut microbiome dynamics within individuals showed that direct transitions between Bacteroides-dominated and Prevotella-dominated communities were rare, suggesting the presence of a barrier between these states. One implication is that interventions seeking to transition between Bacteroides- and Prevotella-dominated communities will need to identify permissible paths through ecological state-space that circumvent this apparent barrier.


Subject(s)
Bacteria/isolation & purification , Gastrointestinal Microbiome , Adult , Aged , Bacteria/classification , Bacteria/genetics , Bacteroides/classification , Bacteroides/genetics , Bacteroides/isolation & purification , Cohort Studies , Feces/microbiology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Phylogeny , Prevotella/classification , Prevotella/genetics , Prevotella/isolation & purification
16.
Bioinformatics ; 37(4): 522-530, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32966552

ABSTRACT

MOTIVATION: High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS: We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION: IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Adaptation, Physiological , Alternative Splicing , Computational Biology , Mutation , Protein Isoforms/genetics , Protein Isoforms/metabolism
17.
J Biomed Inform ; 134: 104176, 2022 10.
Article in English | MEDLINE | ID: mdl-36007785

ABSTRACT

OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.


Subject(s)
Algorithms , Electronic Health Records , Humans , Privacy , Proportional Hazards Models , Survival Analysis
18.
J Med Internet Res ; 24(5): e37931, 2022 05 18.
Article in English | MEDLINE | ID: mdl-35476727

ABSTRACT

BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity. 
CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
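
For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity and specificity are derived from chart-review labels and a phenotype's predictions. The function name and the counts are hypothetical and are not taken from the study; treating "for COVID-19" as the positive class is also an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Assumed convention: positive = admitted "for COVID-19",
    negative = incidental admission ("admitted with COVID-19").
    """
    sensitivity = tp / (tp + fn)  # share of true "for COVID-19" admissions kept
    specificity = tn / (tn + fp)  # share of incidental admissions filtered out
    return sensitivity, specificity


# Hypothetical counts for illustration only (not study data).
sens, spec = sensitivity_specificity(tp=75, fn=25, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A phenotype tuned for high specificity removes most incidental admissions at the cost of dropping some true COVID-19 admissions, which matches the trade-off pattern in the ranges reported above.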


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Hospitalization , Humans , Retrospective Studies
19.
J Proteome Res ; 20(2): 1178-1189, 2021 02 05.
Article in English | MEDLINE | ID: mdl-33393786

ABSTRACT

When the JCVI-syn3.0 genome was designed and implemented in 2016 as the minimal genome of a free-living organism, approximately one-third of the 438 protein-coding genes had no known function. Subsequent refinement into JCVI-syn3A led to inclusion of 16 additional protein-coding genes, including several of unknown function, resulting in an improved growth phenotype. Here, we seek to unveil the biological roles and protein-protein interaction (PPI) networks for these poorly characterized proteins using state-of-the-art deep learning contact-assisted structure prediction, followed by structure-based annotation of functions and PPI predictions. Our pipeline is able to confidently assign functions for many previously unannotated proteins such as putative vitamin transporters, which suggest the importance of nutrient uptake even in a minimized genome. Remarkably, despite the artificial selection of genes in the minimal syn3 genome, our reconstructed PPI network still shows a power law distribution of node degrees typical of naturally evolved bacterial PPI networks. Making use of our framework for combined structure/function/interaction modeling, we are able to identify both fundamental aspects of network biology that are retained in a minimal proteome and additional essential functions not yet recognized among the poorly annotated components of the syn3.0 and syn3A proteomes.
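
As an illustration of what a power law distribution of node degrees means in practice, here is a minimal sketch that tabulates node degrees from an edge list and estimates the log-log slope by ordinary least squares. This is not the authors' pipeline; the function names and the toy edge list are hypothetical, and a log-log least-squares fit is only a rough diagnostic rather than a rigorous power-law test.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def loglog_slope(dist):
    """Least-squares slope of log(node count) against log(degree)."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy star-shaped network (hypothetical): one hub bound to four partners.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
dist = degree_distribution(toy_edges)
slope = loglog_slope(dist)
print(dict(dist), slope)
```

A negative slope indicates that low-degree nodes are far more common than hubs; in a real analysis the fit would be run on the full predicted PPI network rather than a toy graph.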


Subject(s)
Genes, Essential , Protein Interaction Maps , Computational Biology , Proteome/genetics
20.
J Proteome Res ; 20(12): 5241-5263, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34672606

ABSTRACT

The study of proteins circulating in blood offers tremendous opportunities to diagnose, stratify, or possibly prevent diseases. With recent technological advances and the urgent need to understand the effects of COVID-19, the proteomic analysis of blood-derived serum and plasma has become even more important for studying human biology and pathophysiology. Here we provide views and perspectives about technological developments and possible clinical applications that use mass spectrometry (MS)- or affinity-based methods. We discuss examples where plasma proteomics contributed valuable insights into SARS-CoV-2 infections, aging, and hemostasis and the opportunities offered by combining proteomics with genetic data. As a contribution to the Human Proteome Organization (HUPO) Human Plasma Proteome Project (HPPP), we present the Human Plasma PeptideAtlas build 2021-07 that comprises 4395 canonical and 1482 additional nonredundant human proteins detected in 240 MS-based experiments. In addition, we report the new Human Extracellular Vesicle PeptideAtlas 2021-06, which comprises five studies and 2757 canonical proteins detected in extracellular vesicles circulating in blood, of which 74% (2047) are in common with the plasma PeptideAtlas. Our overview summarizes the recent advances, impactful applications, and ongoing challenges for translating plasma proteomics into utility for precision medicine.
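
The reported overlap between the two builds can be reproduced from the stated counts. The sketch below checks that arithmetic and shows how such a share would be computed from two sets of protein identifiers; the function name and the toy identifiers are hypothetical and do not come from PeptideAtlas.

```python
def overlap_share(reference, other):
    """Fraction of items in `reference` that also occur in `other`."""
    reference, other = set(reference), set(other)
    return len(reference & other) / len(reference)


# Counts as stated in the abstract: 2047 of 2757 canonical proteins are shared.
reported_share = 2047 / 2757

# Toy identifier sets (hypothetical), standing in for the two builds.
ev_proteins = {"P1", "P2", "P3", "P4"}
plasma_proteins = {"P1", "P2", "P3", "P9"}
toy_share = overlap_share(ev_proteins, plasma_proteins)

print(f"reported: {reported_share:.1%}, toy: {toy_share:.0%}")
```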


Subject(s)
Proteome , Proteomics/trends , Aging/genetics , COVID-19/genetics , Databases, Protein , Hemostasis/genetics , Humans , Mass Spectrometry , Proteome/genetics