Results 1 - 12 of 12
1.
J Med Internet Res ; 23(11): e32900, 2021 11 26.
Article in English | MEDLINE | ID: mdl-34842542

ABSTRACT

BACKGROUND: Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often for recommendation to care management programs. However, these scores are limited by several issues: existing multimorbidity scores (1) are generally limited to one data group (eg, diagnoses, labs) and may be missing vital information, (2) are usually limited to specific demographic groups (eg, age), and (3) do not formally provide any granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention. OBJECTIVE: Using diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs), we developed a physiologically diverse and generalizable set of multimorbidity risk scores. METHODS: Using EHR data from a nationwide cohort of patients, we developed the total health profile, a set of six integrated risk scores reflecting five distinct organ systems and overall health. We selected the occurrence of an inpatient hospital visitation over a 2-year follow-up window, attributable to specific organ systems, as our risk endpoint. Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of the aforementioned endpoint, producing risk scores for heart, lung, neuro, kidney, and digestive functions and a sixth score for combined risk. We evaluated the scores using a held-out test cohort of 198,574 patients. RESULTS: Study patients closely matched national census averages, with a median age of 41 years, a median income of $66,829, and racial averages by zip code of 73.8% White, 5.9% Asian, and 11.9% African American. All models were well calibrated and demonstrated strong performance with areas under the receiver operating characteristic curve (AUROCs) of 0.83 for the total health score (THS), 0.89 for heart, 0.86 for lung, 0.84 for neuro, 0.90 for kidney, and 0.83 for digestive functions. There was consistent performance of this scoring system across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate clinically relevant patient features, such as heart-related hospitalizations and chronic hypertension diagnosis for the heart model. The THS outperformed the other commonly used multimorbidity scoring systems, specifically the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) overall (AUROCs: THS=0.823, CCI=0.735, ECI=0.649) as well as for every age, sex, and income bracket. Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of feature groups has the best predictive performance, though only marginally better than the diagnosis-only model on at-risk groups. CONCLUSIONS: Massive retrospective EHR data sets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalizable across diverse patient populations.
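
As a hedged illustration of the AUROC metric reported in this abstract (not the authors' evaluation code), the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """Return the AUROC of `scores` against binary `labels` (1 = event, 0 = no event).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

The pairwise form is quadratic in cohort size, so it suits small examples; for cohorts of the size described above, a rank-based implementation would be the more practical choice.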


Subject(s)
Machine Learning , Multimorbidity , Adult , Cohort Studies , Electronic Health Records , Humans , Middle Aged , Retrospective Studies , Risk Factors
2.
Proc Natl Acad Sci U S A ; 115(14): 3686-3691, 2018 04 03.
Article in English | MEDLINE | ID: mdl-29555771

ABSTRACT

Reducing premature mortality associated with age-related chronic diseases, such as cancer and cardiovascular disease, is an urgent priority. We report early results using genomics in combination with advanced imaging and other clinical testing to proactively screen for age-related chronic disease risk among adults. We enrolled active, symptom-free adults in a study of screening for age-related chronic diseases associated with premature mortality. In addition to personal and family medical history and other clinical testing, we obtained whole-genome sequencing (WGS), noncontrast whole-body MRI, dual-energy X-ray absorptiometry (DXA), global metabolomics, a new blood test for prediabetes (Quantose IR), echocardiography (ECHO), ECG, and cardiac rhythm monitoring to identify age-related chronic disease risks. Precision medicine screening using WGS and advanced imaging along with other testing among active, symptom-free adults identified a broad set of complementary age-related chronic disease risks associated with premature mortality and strengthened WGS variant interpretation. This and other similarly designed screening approaches anchored by WGS and advanced imaging may have the potential to extend healthy life among active adults through improved prevention and early detection of age-related chronic diseases (and their risk factors) associated with premature mortality.


Subject(s)
Disease/genetics , Genetic Predisposition to Disease , Image Processing, Computer-Assisted/methods , Mutation , Precision Medicine/methods , Whole Genome Sequencing/methods , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/diagnostic imaging , Cardiovascular Diseases/genetics , Cardiovascular Diseases/pathology , Disease/classification , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Neoplasms/diagnostic imaging , Neoplasms/genetics , Neoplasms/pathology , Nervous System Diseases/diagnostic imaging , Nervous System Diseases/genetics , Nervous System Diseases/pathology , Risk Assessment , Sequence Analysis, RNA , Young Adult
3.
Proc Natl Acad Sci U S A ; 114(38): 10166-10171, 2017 09 19.
Article in English | MEDLINE | ID: mdl-28874526

ABSTRACT

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.


Subject(s)
Confidentiality , DNA Fingerprinting , Models, Genetic , Phenotype , Whole Genome Sequencing , Adult , Age Factors , Algorithms , Body Size , Cohort Studies , Data Anonymization , Female , Humans , Male , Middle Aged , Pigmentation/genetics , Young Adult
4.
Elife ; 4: e06974, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-26175406

ABSTRACT

The eukaryotic phylum Apicomplexa encompasses thousands of obligate intracellular parasites of humans and animals with immense socio-economic and health impacts. We sequenced nuclear genomes of Chromera velia and Vitrella brassicaformis, free-living non-parasitic photosynthetic algae closely related to apicomplexans. Proteins from key metabolic pathways and from the endomembrane trafficking systems associated with a free-living lifestyle have been progressively and non-randomly lost during adaptation to parasitism. The free-living ancestor contained a broad repertoire of genes many of which were repurposed for parasitic processes, such as extracellular proteins, components of a motility apparatus, and DNA- and RNA-binding protein families. Based on transcriptome analyses across 36 environmental conditions, Chromera orthologs of apicomplexan invasion-related motility genes were co-regulated with genes encoding the flagellar apparatus, supporting the functional contribution of flagella to the evolution of invasion machinery. This study provides insights into how obligate parasites with diverse life strategies arose from a once free-living phototrophic marine alga.


Subject(s)
Alveolata/genetics , DNA, Algal/chemistry , DNA, Algal/genetics , Evolution, Molecular , Sequence Analysis, DNA , Gene Expression Profiling , Molecular Sequence Data
5.
Proteomics ; 15(15): 2618-28, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25867681

ABSTRACT

Proteomics data can supplement genome annotation efforts, for example being used to confirm gene models or correct gene annotation errors. Here, we present a large-scale proteogenomics study of two important apicomplexan pathogens: Toxoplasma gondii and Neospora caninum. We queried proteomics data against a panel of official and alternate gene models generated directly from RNA-Seq data, using several newly generated and some previously published MS datasets for this meta-analysis. We identified a total of 201 996 and 39 953 peptide-spectrum matches for T. gondii and N. caninum, respectively, at a 1% peptide FDR threshold. This equated to the identification of 30 494 distinct peptide sequences and 2921 proteins (matches to official gene models) for T. gondii, and 8911 peptides/1273 proteins for N. caninum following stringent protein-level thresholding. We have also identified 289 and 140 loci for T. gondii and N. caninum, respectively, which mapped to RNA-Seq-derived gene models used in our analysis and apparently absent from the official annotation (release 10 from EuPathDB) of these species. We present several examples in our study where the RNA-Seq evidence can help in correction of the current gene model and can help in discovery of potential new genes. The findings of this study have been integrated into the EuPathDB. The data have been deposited to the ProteomeXchange with identifiers PXD000297 and PXD000298.
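
The abstract reports matches at a 1% FDR threshold but does not say how the FDR was estimated. As a minimal sketch, assuming a target-decoy strategy (an assumption, not something the abstract states), the example below keeps the largest score-ranked set of matches whose decoy-to-target ratio stays at or below the threshold. The function name `accepted_at_fdr`, the `(score, is_decoy)` tuple layout, and the toy scores are all hypothetical.

```python
def accepted_at_fdr(psms, max_fdr=0.01):
    """Return target scores accepted at the given estimated FDR.

    psms: iterable of (score, is_decoy) tuples, higher score = better match.
    FDR at each rank is estimated as decoys / targets seen so far
    (assumed target-decoy estimate; not taken from the abstract).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked matches to keep
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = rank
    return [score for score, is_decoy in ranked[:best_cut] if not is_decoy]


# Toy matches, invented for illustration only.
toy_psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False)]
strict = accepted_at_fdr(toy_psms, max_fdr=0.01)
loose = accepted_at_fdr(toy_psms, max_fdr=0.5)
```

With the strict threshold only the two matches ranked above the first decoy survive; relaxing the threshold admits the lower-scoring target as well.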


Subject(s)
Genomics/methods , Neospora/genetics , Neospora/metabolism , Proteomics/methods , Toxoplasma/genetics , Toxoplasma/metabolism , Amino Acid Sequence , Apicomplexa/genetics , Apicomplexa/metabolism , Databases, Genetic , Genes, Protozoan/genetics , Molecular Sequence Annotation/methods , Molecular Sequence Data , Peptides/genetics , Peptides/metabolism , Proteome/genetics , Proteome/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Sequence Analysis, RNA/methods , Sequence Homology, Amino Acid , Tandem Mass Spectrometry/methods
6.
Bioinformatics ; 28(12): 1571-8, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22513996

ABSTRACT

MOTIVATION: Gene-model curation creates consensus gene models by combining multiple sources of protein-coding evidence that may be incomplete or inconsistent. To date, manual curation still produces the highest quality models. However, manual curation is too slow and costly to be completed even for the most important organisms. In recent years, machine-learned ensemble gene predictors have become a viable alternative to manual curation. Current approaches make use of signal and genomic region consistency among sources and some voting scheme to resolve conflicts in the evidence. As a further step in that direction, we have developed eCRAIG (ensemble CRAIG), an automated curation tool that combines multiple sources of evidence using global discriminative training. This allows efficient integration of different types of genomic evidence with complex statistical dependencies to directly maximize annotation accuracy. Our method goes beyond previous work in integrating novel non-linear annotation agreement features, as well as combinations of intrinsic features of the target sequence and extrinsic annotation features. RESULTS: We achieved significant improvements over the best ensemble predictors available for Homo sapiens, Caenorhabditis elegans and Arabidopsis thaliana. In particular, eCRAIG achieved a relative mean improvement of 5.1% over Jigsaw, the best published ensemble predictor in all our experiments. AVAILABILITY: The source code and datasets are both available at http://www.seas.upenn.edu/abernal/ecraig.tgz.
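
To make the "voting scheme" baseline mentioned in this abstract concrete, here is a minimal sketch of a per-position majority vote over several annotation sources. It is not eCRAIG, which the abstract describes as using global discriminative training, and it is not any published predictor's algorithm. The label alphabet (`E` exon, `I` intron, `N` intergenic), the function name `majority_vote`, and the toy tracks are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(annotations):
    """Combine equal-length label strings, one per evidence source.

    At each position the most frequent label wins; ties are broken in
    favour of the earliest-listed source that carries a tied label.
    """
    if not annotations:
        raise ValueError("need at least one annotation source")
    length = len(annotations[0])
    if any(len(a) != length for a in annotations):
        raise ValueError("all annotation strings must have the same length")
    consensus = []
    for column in zip(*annotations):
        counts = Counter(column)
        top = max(counts.values())
        winner = next(label for label in column if counts[label] == top)
        consensus.append(winner)
    return "".join(consensus)


# Toy annotation tracks, invented for illustration only.
toy_sources = ["EEIIE", "EIIIE", "EEINE"]
toy_consensus = majority_vote(toy_sources)
```

A vote like this treats every position and every source independently, which is the kind of limitation that motivates learning the combination jointly instead.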


Subject(s)
Artificial Intelligence , Computational Biology/methods , Models, Genetic , Algorithms , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Genomics , Humans
7.
PLoS Comput Biol ; 3(3): e54, 2007 Mar 16.
Article in English | MEDLINE | ID: mdl-17367206

ABSTRACT

Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.


Subject(s)
Artificial Intelligence , Open Reading Frames/genetics , Pattern Recognition, Automated/methods , Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Discriminant Analysis , Exons , Sensitivity and Specificity
8.
Genome Res ; 12(10): 1556-63, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368248

ABSTRACT

Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.


Subject(s)
Genome, Bacterial , Genomics/methods , Proteobacteria/genetics , Sequence Analysis, DNA/methods , Attachment Sites, Microbiological/genetics , Bacteriophages/genetics , Base Composition/genetics , Culture Media/chemistry , Culture Media/metabolism , DNA Repair/genetics , DNA Replication/genetics , DNA, Bacterial/genetics , Genes, Bacterial/genetics , Genes, Bacterial/physiology , Molecular Sequence Data , Open Reading Frames/genetics , Open Reading Frames/physiology , Plasmids/genetics , Protein Biosynthesis/genetics , Proteobacteria/growth & development , Proteobacteria/pathogenicity , Proteobacteria/physiology , Recombination, Genetic/genetics , Species Specificity
9.
Proc Natl Acad Sci U S A ; 99(19): 12403-8, 2002 Sep 17.
Article in English | MEDLINE | ID: mdl-12205291

ABSTRACT

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.


Subject(s)
Gammaproteobacteria/genetics , Gammaproteobacteria/pathogenicity , Genome, Bacterial , Plant Diseases/microbiology , Bacterial Proteins/genetics , Carbohydrate Metabolism , Citrus/microbiology , Conjugation, Genetic , Evolution, Molecular , Gammaproteobacteria/metabolism , Molecular Sequence Data , Multigene Family , Nerium/microbiology , Open Reading Frames , Prunus/microbiology , Species Specificity , Virulence/genetics
10.
J Bacteriol ; 184(16): 4555-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12142426

ABSTRACT

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.


Subject(s)
Coenzyme A/biosynthesis , Escherichia coli/metabolism , Flavin-Adenine Dinucleotide/biosynthesis , NADP/biosynthesis , Anti-Bacterial Agents , DNA Footprinting , DNA Transposable Elements , Drug Design , Drug Resistance, Bacterial , Escherichia coli/drug effects , Escherichia coli/genetics , Flavin Mononucleotide/biosynthesis , Genome, Bacterial , Mutagenesis, Insertional , Nicotinamide-Nucleotide Adenylyltransferase/metabolism , Phosphotransferases (Alcohol Group Acceptor)/genetics , Substrate Specificity
11.
J Bacteriol ; 184(7): 2005-18, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11889109

ABSTRACT

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to those of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H2S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.
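
For readers unfamiliar with the GC content figure quoted in this abstract, the sketch below shows the usual definition: the fraction of called bases that are G or C. It is an illustration only, not the ERGO suite's method; the function name `gc_content` and the toy sequence are invented, and the choice to skip ambiguous bases such as `N` is an assumption.

```python
def gc_content(sequence):
    """Return the fraction of A/C/G/T bases in `sequence` that are G or C.

    Case-insensitive. Characters other than A, C, G, T (for example N)
    are ignored, which is an assumption of this sketch.
    """
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in called if base in "GC")
    return gc / len(called)


# Toy sequence, invented for illustration only.
toy_gc = gc_content("ATATGCAT")
```

A genome with 27% GC content would return roughly 0.27 from this function when applied to its full chromosome sequence.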


Subject(s)
Fusobacterium nucleatum/genetics , Genome, Bacterial , Protein Biosynthesis , Transcription, Genetic , Amino Acids/metabolism , Bacterial Outer Membrane Proteins/metabolism , Biological Transport , Cell Division , Coenzymes/metabolism , DNA Repair , DNA Replication , DNA Transposable Elements , DNA, Bacterial/analysis , Drug Resistance, Bacterial , Fusobacterium nucleatum/metabolism , Lipid Metabolism , Lipopolysaccharides/metabolism , Mutagenesis, Insertional , Nucleotides/metabolism , Protons , Signal Transduction/physiology , Virulence
12.
Proc Natl Acad Sci U S A ; 99(1): 443-8, 2002 Jan 08.
Article in English | MEDLINE | ID: mdl-11756688

ABSTRACT

Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other alpha-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.


Subject(s)
Brucella melitensis/genetics , Genome, Bacterial , Chromosomes , Fatty Acids/metabolism , Models, Biological , Models, Genetic , Molecular Sequence Data , Open Reading Frames , Protein Biosynthesis , Replication Origin , Sequence Analysis, DNA , Signal Transduction