Results 1 - 20 of 271
1.
Cell ; 184(19): 5031-5052.e26, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34534465

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.


Subjects
Adenocarcinoma/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Proteogenomics , Adenocarcinoma/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Pancreatic Ductal/diagnosis , Cohort Studies , Endothelial Cells/metabolism , Epigenesis, Genetic , Female , Gene Dosage , Genome, Human , Glycolysis , Glycoproteins/biosynthesis , Humans , Male , Middle Aged , Molecular Targeted Therapy , Pancreatic Neoplasms/diagnosis , Phenotype , Phosphoproteins/metabolism , Phosphorylation , Prognosis , Protein Kinases/metabolism , Proteome/metabolism , Substrate Specificity , Transcriptome/genetics
2.
Cell ; 184(16): 4348-4371.e40, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34358469

ABSTRACT

Lung squamous cell carcinoma (LSCC) remains a leading cause of cancer death with few therapeutic options. We characterized the proteogenomic landscape of LSCC, providing a deeper exposition of LSCC biology with potential therapeutic implications. We identify NSD3 as an alternative driver in FGFR1-amplified tumors and low-p63 tumors overexpressing the therapeutic target survivin. SOX2 is considered undruggable, but our analyses provide rationale for exploring chromatin modifiers such as LSD1 and EZH2 to target SOX2-overexpressing tumors. Our data support complex regulation of metabolic pathways by crosstalk between post-translational modifications including ubiquitylation. Numerous immune-related proteogenomic observations suggest directions for further investigation. Proteogenomic dissection of CDKN2A mutations argues for more nuanced assessment of RB1 protein expression and phosphorylation before declaring CDK4/6 inhibition unsuccessful. Finally, triangulation between LSCC, LUAD, and HNSCC identified both unique and common therapeutic vulnerabilities. These observations and proteogenomics data resources may guide research into the biology and treatment of LSCC.


Subjects
Carcinoma, Squamous Cell/genetics , Lung Neoplasms/genetics , Proteogenomics , Acetylation , Adult , Aged , Aged, 80 and over , Cluster Analysis , Cyclin-Dependent Kinase 4/genetics , Cyclin-Dependent Kinase 6/genetics , Epithelial-Mesenchymal Transition/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Mutation/genetics , Neoplasm Proteins/metabolism , Phosphorylation , Protein Binding , Receptor Tyrosine Kinase-like Orphan Receptors/metabolism , Receptors, Platelet-Derived Growth Factor/metabolism , Signal Transduction , Ubiquitination
3.
Cell ; 182(1): 200-225.e35, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32649874

ABSTRACT

To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.


Subjects
Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/genetics , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Proteogenomics , Adenocarcinoma of Lung/immunology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Carcinogenesis/genetics , Carcinogenesis/pathology , DNA Copy Number Variations/genetics , DNA Methylation/genetics , Female , Humans , Lung Neoplasms/immunology , Male , Middle Aged , Mutation/genetics , Oncogene Proteins, Fusion , Phenotype , Phosphoproteins/metabolism , Proteome/metabolism
4.
Mol Cell Proteomics ; 23(1): 100687, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38029961

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancer types, partly because it is frequently identified at an advanced stage, when surgery is no longer feasible. Therefore, early detection using minimally invasive methods such as blood tests may improve outcomes. However, studies to discover molecular signatures for the early detection of PDAC using blood tests have only been marginally successful. In the current study, a quantitative glycoproteomic approach via data-independent acquisition mass spectrometry was utilized to detect glycoproteins in 29 patient-matched PDAC tissues and sera. A total of 892 N-linked glycopeptides originating from 141 glycoproteins had PDAC-associated changes beyond normal variation. We further evaluated the specificity of these serum-detectable glycoproteins by comparing their abundance in 53 independent PDAC patient sera and 65 cancer-free controls. The PDAC tissue-associated glycoproteins we have identified represent an inventory of serum-detectable PDAC-associated glycoproteins as candidate biomarkers that can be potentially used for the detection of PDAC using blood tests.


Subjects
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Biomarkers, Tumor/metabolism , Pancreatic Neoplasms/metabolism , Carcinoma, Pancreatic Ductal/metabolism , Glycoproteins , Mass Spectrometry
5.
Proc Natl Acad Sci U S A ; 120(4): e2208275120, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36656852

ABSTRACT

De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.


Subjects
Protein Folding , Proteins , Proteins/chemistry , Protein Structure, Secondary , Protein Conformation , Monte Carlo Method
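
The replica-exchange Monte Carlo (REMC) search named in the abstract above can be illustrated with a toy sketch. This is not FoldDesign's implementation: the one-dimensional double-well energy, the temperature ladder, the step size, and every function name below are assumptions chosen for illustration. Only the Metropolis rule and the replica-swap acceptance rule are the standard textbook forms.

```python
import math
import random


def energy(x):
    """Toy double-well energy with minima at x = -1 and x = +1 (an assumption)."""
    return (x * x - 1.0) ** 2


def swap_probability(e_i, e_j, t_i, t_j):
    """Standard replica-swap acceptance: min(1, exp((1/t_i - 1/t_j) * (e_i - e_j)))."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def remc(temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Run a toy REMC search and return the lowest-energy state seen."""
    rng = random.Random(seed)
    states = [rng.uniform(-2.0, 2.0) for _ in temperatures]
    best = min(states, key=energy)
    for _ in range(n_sweeps):
        # Metropolis move inside each replica at its own temperature.
        for k, t in enumerate(temperatures):
            proposal = states[k] + rng.uniform(-step, step)
            d_e = energy(proposal) - energy(states[k])
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                states[k] = proposal
        # Attempt to swap states between neighbouring temperatures.
        for k in range(len(temperatures) - 1):
            p = swap_probability(
                energy(states[k]), energy(states[k + 1]),
                temperatures[k], temperatures[k + 1],
            )
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
        best = min([best] + states, key=energy)
    return best
```

Calling `remc([0.1, 0.5, 1.0, 2.0])` returns a state close to one of the two minima. The high-temperature replicas cross the barrier easily, and swaps pass their states down to the low-temperature replicas, which is the reason for running several temperatures at once.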
6.
J Proteome Res ; 23(2): 532-549, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38232391

ABSTRACT

Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.


Subjects
Antibodies , Proteome , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods
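
The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below only restates figures given in the abstract (neXtProt release 2023-04-18); the variable and function names are illustrative.

```python
# Figures as quoted in the abstract.
PREDICTED = 19_778    # neXtProt predicted proteins
PE1 = 18_397          # credibly detected (PE1)
PE1_MS = 17_453       # PE1 detected by mass spectrometry
PE1_NON_MS = 944      # PE1 detected by non-MS methods
MISSING = 1_381       # PE2 + PE3 + PE4 missing proteins


def summarize():
    """Check that the quoted counts are mutually consistent."""
    return {
        "pe1_sum_consistent": PE1_MS + PE1_NON_MS == PE1,
        "missing_consistent": PREDICTED - PE1 == MISSING,
        "pe1_percent": round(100 * PE1 / PREDICTED, 1),
    }
```

The MS and non-MS counts add up to the PE1 total, the missing-protein count equals the remainder, and the PE1 fraction rounds to the 93% stated in the abstract.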
7.
Clin Proteomics ; 21(1): 7, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291365

ABSTRACT

BACKGROUND: Omics characterization of pancreatic adenocarcinoma tissue is complicated by the highly heterogeneous and mixed populations of cells. We evaluate the feasibility and potential benefit of using a coring method to enrich specific regions from bulk tissue and then perform proteogenomic analyses. METHODS: We used the Biopsy Trifecta Extraction (BioTExt) technique to isolate cores of epithelial-enriched and stroma-enriched tissue from pancreatic tumor and adjacent tissue blocks. Histology was assessed at multiple depths throughout each core. DNA sequencing, RNA sequencing, and proteomics were performed on the cored and bulk tissue samples. Supervised and unsupervised analyses were performed based on integrated molecular and histology data. RESULTS: Tissue cores had mixed cell composition at varying depths throughout. Average cell type percentages assessed by histology throughout the core were better associated with KRAS variant allele frequencies than standard histology assessment of the cut surface. Clustering based on serial histology data separated the cores into three groups with enrichment of neoplastic epithelium, stroma, and acinar cells, respectively. Using this classification, tumor overexpressed proteins identified in bulk tissue analysis were assigned into epithelial- or stroma-specific categories, which revealed novel epithelial-specific tumor overexpressed proteins. CONCLUSIONS: Our study demonstrates the feasibility of multi-omics data generation from tissue cores, the necessity of interval H&E stains in serial histology sections, and the utility of coring to improve analysis over bulk tissue data.

8.
J Proteome Res ; 22(4): 1024-1042, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36318223

ABSTRACT

The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".


Subjects
Proteome , Proteomics , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Open Reading Frames , Proteomics/methods
9.
PLoS Comput Biol ; 18(9): e1010539, 2022 09.
Article in English | MEDLINE | ID: mdl-36112717

ABSTRACT

Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.


Subjects
Computational Biology , Deep Learning , Algorithms , Computational Biology/methods , Databases, Protein , Models, Molecular , Protein Conformation , Protein Folding , Proteins/chemistry , Software
10.
J Biomed Inform ; 139: 104306, 2023 03.
Article in English | MEDLINE | ID: mdl-36738870

ABSTRACT

BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and help researchers identify which sites are better poised to study particular questions.


Subjects
COVID-19 , Electronic Health Records , Humans , Data Collection , Records , Cluster Analysis
11.
Mol Cell Proteomics ; 20: 100062, 2021.
Article in English | MEDLINE | ID: mdl-33640492

ABSTRACT

We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.


Subjects
Proteome , Societies, Scientific/history , Data Accuracy , History, 21st Century , Humans , Information Dissemination
12.
Mol Cell Proteomics ; : 100046, 2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33453411

ABSTRACT

Recent advances in mass spectrometry (MS)-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with impacts on the health, privacy, and wellbeing of individuals. We conducted and here report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics can ensure that eventual healthcare practices and regulations reflect the considered judgment of the community and anticipate opportunities and problems that may arise as the technology matures.

13.
Mol Cell Proteomics ; 2021 Jan 04.
Article in English | MEDLINE | ID: mdl-33397710

ABSTRACT

Recent advances in MS-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with the impact on the health, privacy, and well-being of individuals. Here we conducted and report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics is important to ensure that eventual regulations reflect the considered judgment of the community as well as to anticipate opportunities and problems that may arise as the technology matures further.

14.
Proc Natl Acad Sci U S A ; 117(35): 21813-21820, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817414

ABSTRACT

Transitions from health to disease are characterized by dysregulation of biological networks under the influence of genetic and environmental factors, often over the course of years to decades before clinical symptoms appear. Understanding these dynamics has important implications for preventive medicine. However, progress has been hindered both by the difficulty of identifying individuals who will eventually go on to develop a particular disease and by the inaccessibility of most disease-relevant tissues in living individuals. Here we developed an alternative approach using polygenic risk scores (PRSs) based on genome-wide association studies (GWAS) for 54 diseases and complex traits coupled with multiomic profiling and found that these PRSs were associated with 766 detectable alterations in proteomic, metabolomic, and standard clinical laboratory measurements (clinical labs) from blood plasma across several thousand mostly healthy individuals. We recapitulated a variety of known relationships (e.g., glutamatergic neurotransmission and inflammation with depression, IL-33 with asthma) and found associations directly suggesting therapeutic strategies (e.g., Ω-6 supplementation and IL-13 inhibition for amyotrophic lateral sclerosis) and influences on longevity (leukemia inhibitory factor, ceramides). Analytes altered in high-genetic-risk individuals showed concordant changes in disease cases, supporting the notion that PRS-associated analytes represent presymptomatic disease alterations. Our results provide insights into the molecular pathophysiology of a range of traits and suggest avenues for the prevention of health-to-disease transitions.


Subjects
Biomarkers/blood , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Asymptomatic Diseases/epidemiology , Cohort Studies , Databases, Genetic , Disease Progression , Genetic Testing/methods , Humans , Metabolomics/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Proteomics/methods , Risk Factors
15.
Proc Natl Acad Sci U S A ; 117(24): 13839-13845, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32471946

ABSTRACT

The Pioneer 100 Wellness Project involved quantitatively profiling 108 participants' molecular physiology over time, including genomes, gut microbiomes, blood metabolomes, blood proteomes, clinical chemistries, and data from wearable devices. Here, we present a longitudinal analysis focused specifically around the Pioneer 100 gut microbiomes. We distinguished a subpopulation of individuals with reduced gut diversity, elevated relative abundance of the genus Prevotella, and reduced levels of the genus Bacteroides. We found that the relative abundances of Bacteroides and Prevotella were significantly correlated with certain serum metabolites, including omega-6 fatty acids. Primary dimensions in distance-based redundancy analysis of clinical chemistries explained 18.5% of the variance in bacterial community composition, and revealed a Bacteroides/Prevotella dichotomy aligned with inflammation and dietary markers. Finally, longitudinal analysis of gut microbiome dynamics within individuals showed that direct transitions between Bacteroides-dominated and Prevotella-dominated communities were rare, suggesting the presence of a barrier between these states. One implication is that interventions seeking to transition between Bacteroides- and Prevotella-dominated communities will need to identify permissible paths through ecological state-space that circumvent this apparent barrier.


Subjects
Bacteria/isolation & purification , Gastrointestinal Microbiome , Adult , Aged , Bacteria/classification , Bacteria/genetics , Bacteroides/classification , Bacteroides/genetics , Bacteroides/isolation & purification , Cohort Studies , Feces/microbiology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Phylogeny , Prevotella/classification , Prevotella/genetics , Prevotella/isolation & purification
16.
Bioinformatics ; 37(4): 522-530, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32966552

ABSTRACT

MOTIVATION: High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS: We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION: IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Adaptation, Physiological , Alternative Splicing , Computational Biology , Mutation , Protein Isoforms/genetics , Protein Isoforms/metabolism
17.
J Biomed Inform ; 134: 104176, 2022 10.
Article in English | MEDLINE | ID: mdl-36007785

ABSTRACT

OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.


Subjects
Algorithms , Electronic Health Records , Humans , Privacy , Proportional Hazards Models , Survival Analysis
18.
J Med Internet Res ; 24(5): e37931, 2022 05 18.
Article in English | MEDLINE | ID: mdl-35476727

ABSTRACT

BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity. CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Hospitalization , Humans , Retrospective Studies
19.
J Proteome Res ; 20(2): 1178-1189, 2021 02 05.
Article in English | MEDLINE | ID: mdl-33393786

ABSTRACT

When the JCVI-syn3.0 genome was designed and implemented in 2016 as the minimal genome of a free-living organism, approximately one-third of the 438 protein-coding genes had no known function. Subsequent refinement into JCVI-syn3A led to inclusion of 16 additional protein-coding genes, including several unknown functions, resulting in an improved growth phenotype. Here, we seek to unveil the biological roles and protein-protein interaction (PPI) networks for these poorly characterized proteins using state-of-the-art deep learning contact-assisted structure prediction, followed by structure-based annotation of functions and PPI predictions. Our pipeline is able to confidently assign functions for many previously unannotated proteins such as putative vitamin transporters, which suggest the importance of nutrient uptake even in a minimized genome. Remarkably, despite the artificial selection of genes in the minimal syn3 genome, our reconstructed PPI network still shows a power law distribution of node degrees typical of naturally evolved bacterial PPI networks. Making use of our framework for combined structure/function/interaction modeling, we are able to identify both fundamental aspects of network biology that are retained in a minimal proteome and additional essential functions not yet recognized among the poorly annotated components of the syn3.0 and syn3A proteomes.


Subjects
Genes, Essential , Protein Interaction Maps , Computational Biology , Proteome/genetics
20.
J Proteome Res ; 20(12): 5241-5263, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34672606

ABSTRACT

The study of proteins circulating in blood offers tremendous opportunities to diagnose, stratify, or possibly prevent diseases. With recent technological advances and the urgent need to understand the effects of COVID-19, the proteomic analysis of blood-derived serum and plasma has become even more important for studying human biology and pathophysiology. Here we provide views and perspectives about technological developments and possible clinical applications that use mass-spectrometry(MS)- or affinity-based methods. We discuss examples where plasma proteomics contributed valuable insights into SARS-CoV-2 infections, aging, and hemostasis and the opportunities offered by combining proteomics with genetic data. As a contribution to the Human Proteome Organization (HUPO) Human Plasma Proteome Project (HPPP), we present the Human Plasma PeptideAtlas build 2021-07 that comprises 4395 canonical and 1482 additional nonredundant human proteins detected in 240 MS-based experiments. In addition, we report the new Human Extracellular Vesicle PeptideAtlas 2021-06, which comprises five studies and 2757 canonical proteins detected in extracellular vesicles circulating in blood, of which 74% (2047) are in common with the plasma PeptideAtlas. Our overview summarizes the recent advances, impactful applications, and ongoing challenges for translating plasma proteomics into utility for precision medicine.


Subjects
Proteome , Proteomics/trends , Aging/genetics , COVID-19/genetics , Databases, Protein , Hemostasis/genetics , Humans , Mass Spectrometry , Proteome/genetics