Results 1 - 20 of 264
1.
Cell ; 184(19): 5031-5052.e26, 2021 09 16.
Article in English | MEDLINE | ID: mdl-34534465

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.


Subject(s)
Adenocarcinoma/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Proteogenomics , Adenocarcinoma/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Pancreatic Ductal/diagnosis , Cohort Studies , Endothelial Cells/metabolism , Epigenesis, Genetic , Female , Gene Dosage , Genome, Human , Glycolysis , Glycoproteins/biosynthesis , Humans , Male , Middle Aged , Molecular Targeted Therapy , Pancreatic Neoplasms/diagnosis , Phenotype , Phosphoproteins/metabolism , Phosphorylation , Prognosis , Protein Kinases/metabolism , Proteome/metabolism , Substrate Specificity , Transcriptome/genetics
2.
Cell ; 184(16): 4348-4371.e40, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34358469

ABSTRACT

Lung squamous cell carcinoma (LSCC) remains a leading cause of cancer death with few therapeutic options. We characterized the proteogenomic landscape of LSCC, providing a deeper exposition of LSCC biology with potential therapeutic implications. We identify NSD3 as an alternative driver in FGFR1-amplified tumors and low-p63 tumors overexpressing the therapeutic target survivin. SOX2 is considered undruggable, but our analyses provide rationale for exploring chromatin modifiers such as LSD1 and EZH2 to target SOX2-overexpressing tumors. Our data support complex regulation of metabolic pathways by crosstalk between post-translational modifications including ubiquitylation. Numerous immune-related proteogenomic observations suggest directions for further investigation. Proteogenomic dissection of CDKN2A mutations argues for more nuanced assessment of RB1 protein expression and phosphorylation before declaring CDK4/6 inhibition unsuccessful. Finally, triangulation between LSCC, LUAD, and HNSCC identified both unique and common therapeutic vulnerabilities. These observations and proteogenomics data resources may guide research into the biology and treatment of LSCC.


Subject(s)
Carcinoma, Squamous Cell/genetics , Lung Neoplasms/genetics , Proteogenomics , Acetylation , Adult , Aged , Aged, 80 and over , Cluster Analysis , Cyclin-Dependent Kinase 4/genetics , Cyclin-Dependent Kinase 6/genetics , Epithelial-Mesenchymal Transition/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Mutation/genetics , Neoplasm Proteins/metabolism , Phosphorylation , Protein Binding , Receptor Tyrosine Kinase-like Orphan Receptors/metabolism , Receptors, Platelet-Derived Growth Factor/metabolism , Signal Transduction , Ubiquitination
3.
Cell ; 182(1): 200-225.e35, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32649874

ABSTRACT

To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.


Subject(s)
Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/genetics , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Proteogenomics , Adenocarcinoma of Lung/immunology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Carcinogenesis/genetics , Carcinogenesis/pathology , DNA Copy Number Variations/genetics , DNA Methylation/genetics , Female , Humans , Lung Neoplasms/immunology , Male , Middle Aged , Mutation/genetics , Oncogene Proteins, Fusion , Phenotype , Phosphoproteins/metabolism , Proteome/metabolism
4.
Mol Cell Proteomics ; 23(1): 100687, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38029961

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancer types, partly because it is frequently identified at an advanced stage, when surgery is no longer feasible. Therefore, early detection using minimally invasive methods such as blood tests may improve outcomes. However, studies to discover molecular signatures for the early detection of PDAC using blood tests have only been marginally successful. In the current study, a quantitative glycoproteomic approach via data-independent acquisition mass spectrometry was utilized to detect glycoproteins in 29 patient-matched PDAC tissues and sera. A total of 892 N-linked glycopeptides originating from 141 glycoproteins had PDAC-associated changes beyond normal variation. We further evaluated the specificity of these serum-detectable glycoproteins by comparing their abundance in 53 independent PDAC patient sera and 65 cancer-free controls. The PDAC tissue-associated glycoproteins we have identified represent an inventory of serum-detectable PDAC-associated glycoproteins as candidate biomarkers that can be potentially used for the detection of PDAC using blood tests.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Biomarkers, Tumor/metabolism , Pancreatic Neoplasms/metabolism , Carcinoma, Pancreatic Ductal/metabolism , Glycoproteins , Mass Spectrometry
5.
Proc Natl Acad Sci U S A ; 120(4): e2208275120, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36656852

ABSTRACT

De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.


Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Protein Structure, Secondary , Protein Conformation , Monte Carlo Method
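
The abstract above names replica-exchange Monte Carlo (REMC) as the search engine but does not give its details. As a rough illustration of the general technique only, and not of FoldDesign's own implementation, the sketch below shows the standard Metropolis criterion for swapping conformations between two replicas held at different inverse temperatures. The function names and the numeric values are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    beta_* are inverse temperatures; energy_* are the energies of the
    conformations currently held at those temperatures.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Capping the exponent at 0 keeps the probability at or below 1.
    return math.exp(min(0.0, delta))


def attempt_swaps(betas, energies, rng):
    """Try one exchange between each pair of neighbouring replicas.

    Conformations are represented only by their energies here, so a
    successful swap simply exchanges two entries of the list.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


# Hypothetical two-replica example: the cold replica (beta = 1.0) holds
# the higher-energy conformation, so the swap is always accepted.
swapped = attempt_swaps([1.0, 0.5], [10.0, 2.0], random.Random(0))
```

A swap that moves a lower-energy conformation to the colder replica is always accepted; the reverse move is accepted with exponentially decreasing probability, which lets conformations found at high temperature be refined at low temperature.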
6.
J Proteome Res ; 23(2): 532-549, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38232391

ABSTRACT

Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.


Subject(s)
Antibodies , Proteome , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods
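
The counts quoted in the abstract above can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in the abstract; the variable names are illustrative. It confirms that the MS and non-MS detections add up to the PE1 total, that the remainder matches the reported number of missing proteins, and that the PE1 share rounds to 93%.

```python
# Figures quoted in the abstract (neXtProt release 2023-04-18)
predicted = 19_778   # predicted proteins coded in the human genome
pe1 = 18_397         # credibly detected proteins (PE1)
pe1_ms = 17_453      # PE1 proteins detected by mass spectrometry
pe1_non_ms = 944     # PE1 proteins detected by non-MS methods
missing = 1_381      # PE2 + PE3 + PE4 "missing proteins"

ms_plus_non_ms = pe1_ms + pe1_non_ms   # should equal the PE1 total
not_pe1 = predicted - pe1              # should equal the missing-protein count
pe1_percent = 100 * pe1 / predicted    # share of predicted proteins at PE1

print(ms_plus_non_ms, not_pe1, round(pe1_percent, 1))
```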
7.
Clin Proteomics ; 21(1): 7, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291365

ABSTRACT

BACKGROUND: Omics characterization of pancreatic adenocarcinoma tissue is complicated by the highly heterogeneous and mixed populations of cells. We evaluate the feasibility and potential benefit of using a coring method to enrich specific regions from bulk tissue and then perform proteogenomic analyses. METHODS: We used the Biopsy Trifecta Extraction (BioTExt) technique to isolate cores of epithelial-enriched and stroma-enriched tissue from pancreatic tumor and adjacent tissue blocks. Histology was assessed at multiple depths throughout each core. DNA sequencing, RNA sequencing, and proteomics were performed on the cored and bulk tissue samples. Supervised and unsupervised analyses were performed based on integrated molecular and histology data. RESULTS: Tissue cores had mixed cell composition at varying depths throughout. Average cell type percentages assessed by histology throughout the core were better associated with KRAS variant allele frequencies than standard histology assessment of the cut surface. Clustering based on serial histology data separated the cores into three groups with enrichment of neoplastic epithelium, stroma, and acinar cells, respectively. Using this classification, tumor overexpressed proteins identified in bulk tissue analysis were assigned into epithelial- or stroma-specific categories, which revealed novel epithelial-specific tumor overexpressed proteins. CONCLUSIONS: Our study demonstrates the feasibility of multi-omics data generation from tissue cores, the necessity of interval H&E stains in serial histology sections, and the utility of coring to improve analysis over bulk tissue data.

8.
J Proteome Res ; 22(4): 1024-1042, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36318223

ABSTRACT

The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".


Subject(s)
Proteome , Proteomics , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Open Reading Frames , Proteomics/methods
9.
PLoS Comput Biol ; 18(9): e1010539, 2022 09.
Article in English | MEDLINE | ID: mdl-36112717

ABSTRACT

Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.


Subject(s)
Computational Biology , Deep Learning , Algorithms , Computational Biology/methods , Databases, Protein , Models, Molecular , Protein Conformation , Protein Folding , Proteins/chemistry , Software
10.
J Biomed Inform ; 139: 104306, 2023 03.
Article in English | MEDLINE | ID: mdl-36738870

ABSTRACT

BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and help researchers identify which sites are better poised to study particular questions.


Subject(s)
COVID-19 , Electronic Health Records , Humans , Data Collection , Records , Cluster Analysis
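
The abstract above describes correlating labs by their missingness indicators. A minimal sketch of that idea follows, assuming each lab is a per-patient list in which `None` marks a test that was not ordered. The lab names and values are invented for illustration, and the study's actual pipeline is not described in the abstract.

```python
import math


def missing_indicator(values):
    """Return 1 where a lab value is absent (None) and 0 where it is present."""
    return [1 if v is None else 0 for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-patient results for three labs (None = not measured)
labs = {
    "crp": [5.1, None, 12.0, None, 3.3, None],
    "ferritin": [310, None, 950, None, 120, 400],
    "troponin": [None, 0.02, None, 0.01, None, 0.03],
}
indicators = {name: missing_indicator(vals) for name, vals in labs.items()}

# Labs that tend to be ordered together correlate positively on missingness;
# labs ordered for different patients correlate negatively.
crp_ferritin = pearson(indicators["crp"], indicators["ferritin"])
crp_troponin = pearson(indicators["crp"], indicators["troponin"])
```

The correlation is computed on the 0/1 indicators only, so it reflects ordering behavior rather than the measured values themselves.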
11.
Mol Cell Proteomics ; 20: 100062, 2021.
Article in English | MEDLINE | ID: mdl-33640492

ABSTRACT

We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.


Subject(s)
Proteome , Societies, Scientific/history , Data Accuracy , History, 21st Century , Humans , Information Dissemination
12.
Mol Cell Proteomics ; : 100046, 2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33453411

ABSTRACT

Recent advances in mass spectrometry (MS)-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with impacts on the health, privacy, and wellbeing of individuals. We conducted and here report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics can ensure that eventual healthcare practices and regulations reflect the considered judgment of the community and anticipate opportunities and problems that may arise as the technology matures.

13.
Mol Cell Proteomics ; 2021 Jan 04.
Article in English | MEDLINE | ID: mdl-33397710

ABSTRACT

Recent advances in MS-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with the impact on the health, privacy, and well-being of individuals. Here we conducted and report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics is important to ensure that eventual regulations reflect the considered judgment of the community as well as to anticipate opportunities and problems that may arise as the technology matures further.

14.
Proc Natl Acad Sci U S A ; 117(35): 21813-21820, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817414

ABSTRACT

Transitions from health to disease are characterized by dysregulation of biological networks under the influence of genetic and environmental factors, often over the course of years to decades before clinical symptoms appear. Understanding these dynamics has important implications for preventive medicine. However, progress has been hindered both by the difficulty of identifying individuals who will eventually go on to develop a particular disease and by the inaccessibility of most disease-relevant tissues in living individuals. Here we developed an alternative approach using polygenic risk scores (PRSs) based on genome-wide association studies (GWAS) for 54 diseases and complex traits coupled with multiomic profiling and found that these PRSs were associated with 766 detectable alterations in proteomic, metabolomic, and standard clinical laboratory measurements (clinical labs) from blood plasma across several thousand mostly healthy individuals. We recapitulated a variety of known relationships (e.g., glutamatergic neurotransmission and inflammation with depression, IL-33 with asthma) and found associations directly suggesting therapeutic strategies (e.g., Ω-6 supplementation and IL-13 inhibition for amyotrophic lateral sclerosis) and influences on longevity (leukemia inhibitory factor, ceramides). Analytes altered in high-genetic-risk individuals showed concordant changes in disease cases, supporting the notion that PRS-associated analytes represent presymptomatic disease alterations. Our results provide insights into the molecular pathophysiology of a range of traits and suggest avenues for the prevention of health-to-disease transitions.


Subject(s)
Biomarkers/blood , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Asymptomatic Diseases/epidemiology , Cohort Studies , Databases, Genetic , Disease Progression , Genetic Testing/methods , Humans , Metabolomics/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Proteomics/methods , Risk Factors
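
For readers unfamiliar with polygenic risk scores, the sketch below shows the usual additive form: a sum of GWAS effect sizes weighted by the number of effect alleles an individual carries. This is a generic illustration, not the scoring procedure used in the study above. The variant IDs, effect sizes, and genotype are all hypothetical.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Sum per-variant effect sizes weighted by effect-allele dosage (0, 1 or 2).

    Variants absent from `dosages` are treated as dosage 0.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in effect_sizes.items())


# Hypothetical GWAS effect sizes (e.g. log odds ratios) keyed by variant ID
effect_sizes = {"rs0001": 0.25, "rs0002": -0.5, "rs0003": 0.125}

# Hypothetical genotype of one individual: copies of the effect allele
dosages = {"rs0001": 2, "rs0002": 0, "rs0003": 1}

score = polygenic_risk_score(effect_sizes, dosages)
```

In practice the raw score is usually standardized across a cohort before it is tested for association with measured analytes.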
15.
Proc Natl Acad Sci U S A ; 117(24): 13839-13845, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32471946

ABSTRACT

The Pioneer 100 Wellness Project involved quantitatively profiling 108 participants' molecular physiology over time, including genomes, gut microbiomes, blood metabolomes, blood proteomes, clinical chemistries, and data from wearable devices. Here, we present a longitudinal analysis focused specifically around the Pioneer 100 gut microbiomes. We distinguished a subpopulation of individuals with reduced gut diversity, elevated relative abundance of the genus Prevotella, and reduced levels of the genus Bacteroides. We found that the relative abundances of Bacteroides and Prevotella were significantly correlated with certain serum metabolites, including omega-6 fatty acids. Primary dimensions in distance-based redundancy analysis of clinical chemistries explained 18.5% of the variance in bacterial community composition, and revealed a Bacteroides/Prevotella dichotomy aligned with inflammation and dietary markers. Finally, longitudinal analysis of gut microbiome dynamics within individuals showed that direct transitions between Bacteroides-dominated and Prevotella-dominated communities were rare, suggesting the presence of a barrier between these states. One implication is that interventions seeking to transition between Bacteroides- and Prevotella-dominated communities will need to identify permissible paths through ecological state-space that circumvent this apparent barrier.


Subject(s)
Bacteria/isolation & purification , Gastrointestinal Microbiome , Adult , Aged , Bacteria/classification , Bacteria/genetics , Bacteroides/classification , Bacteroides/genetics , Bacteroides/isolation & purification , Cohort Studies , Feces/microbiology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Phylogeny , Prevotella/classification , Prevotella/genetics , Prevotella/isolation & purification
16.
Bioinformatics ; 37(4): 522-530, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32966552

ABSTRACT

MOTIVATION: High resolution annotation of gene functions is a central goal in functional genomics. A single gene may produce multiple isoforms with different functions through alternative splicing. Conventional approaches, however, consider a gene as a single entity without differentiating these functionally different isoforms. Towards understanding gene functions at higher resolution, recent efforts have focused on predicting the functions of isoforms. However, the performance of existing methods is far from satisfactory mainly because of the lack of isoform-level functional annotation. RESULTS: We present IsoResolve, a novel approach for isoform function prediction, which leverages the information from gene function prediction models with domain adaptation (DA). IsoResolve treats gene-level and isoform-level features as source and target domains, respectively. It uses DA to project the two domains into a latent variable space in such a way that the latent variables from the two domains have similar distribution, which enables the gene domain information to be leveraged for isoform function prediction. We systematically evaluated the performance of IsoResolve in predicting functions. Compared with five state-of-the-art methods, IsoResolve achieved significantly better performance. IsoResolve was further validated by case studies of genes with isoform-level functional annotation. AVAILABILITY AND IMPLEMENTATION: IsoResolve is freely available at https://github.com/genemine/IsoResolve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Adaptation, Physiological , Alternative Splicing , Computational Biology , Mutation , Protein Isoforms/genetics , Protein Isoforms/metabolism
17.
J Biomed Inform ; 134: 104176, 2022 10.
Article in English | MEDLINE | ID: mdl-36007785

ABSTRACT

OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.


Subject(s)
Algorithms , Electronic Health Records , Humans , Privacy , Proportional Hazards Models , Survival Analysis
18.
J Med Internet Res ; 24(5): e37931, 2022 05 18.
Article in English | MEDLINE | ID: mdl-35476727

ABSTRACT

BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity. CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Hospitalization , Humans , Retrospective Studies
19.
J Proteome Res ; 20(2): 1178-1189, 2021 02 05.
Article in English | MEDLINE | ID: mdl-33393786

ABSTRACT

When the JCVI-syn3.0 genome was designed and implemented in 2016 as the minimal genome of a free-living organism, approximately one-third of the 438 protein-coding genes had no known function. Subsequent refinement into JCVI-syn3A led to inclusion of 16 additional protein-coding genes, including several of unknown function, resulting in an improved growth phenotype. Here, we seek to unveil the biological roles and protein-protein interaction (PPI) networks for these poorly characterized proteins using state-of-the-art deep learning contact-assisted structure prediction, followed by structure-based annotation of functions and PPI predictions. Our pipeline is able to confidently assign functions for many previously unannotated proteins such as putative vitamin transporters, which suggest the importance of nutrient uptake even in a minimized genome. Remarkably, despite the artificial selection of genes in the minimal syn3 genome, our reconstructed PPI network still shows a power law distribution of node degrees typical of naturally evolved bacterial PPI networks. Making use of our framework for combined structure/function/interaction modeling, we are able to identify both fundamental aspects of network biology that are retained in a minimal proteome and additional essential functions not yet recognized among the poorly annotated components of the syn3.0 and syn3A proteomes.


Subject(s)
Essential Genes , Protein Interaction Maps , Computational Biology , Proteome/genetics
20.
J Proteome Res ; 20(12): 5227-5240, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34670092

ABSTRACT

The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 357 (92.8%) of the 19 778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.


Subject(s)
Benchmarking , Proteome , Protein Databases , Humans , Mass Spectrometry/methods , Proteome/analysis , Proteome/genetics , Proteomics/methods