Results 1 - 20 of 83
1.
Nature ; 619(7971): 828-836, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37438524

ABSTRACT

Splice-switching antisense oligonucleotides (ASOs) could be used to treat a subset of individuals with genetic diseases [1], but the systematic identification of such individuals remains a challenge. Here we performed whole-genome sequencing analyses to characterize genetic variation in 235 individuals (from 209 families) with ataxia-telangiectasia, a severely debilitating and life-threatening recessive genetic disorder [2,3], yielding a complete molecular diagnosis in almost all individuals. We developed a predictive taxonomy to assess the amenability of each individual to splice-switching ASO intervention; 9% and 6% of the individuals had variants that were 'probably' or 'possibly' amenable to ASO splice modulation, respectively. Most amenable variants were in deep intronic regions that are inaccessible to exon-targeted sequencing. We developed ASOs that successfully rescued mis-splicing and ATM cellular signalling in patient fibroblasts for two recurrent variants. In a pilot clinical study, one of these ASOs was used to treat a child who had been diagnosed with ataxia-telangiectasia soon after birth, and showed good tolerability without serious adverse events for three years. Our study provides a framework for the prospective identification of individuals with genetic diseases who might benefit from a therapeutic approach involving splice-switching ASOs.


Subject(s)
Ataxia Telangiectasia, RNA Splicing, Child, Humans, Ataxia Telangiectasia/drug therapy, Ataxia Telangiectasia/genetics, Antisense Oligonucleotides/genetics, Antisense Oligonucleotides/pharmacology, Antisense Oligonucleotides/therapeutic use, Prospective Studies, RNA Splicing/drug effects, RNA Splicing/genetics, Whole Genome Sequencing, Introns, Exons, Precision Medicine, Pilot Projects
2.
Nat Methods ; 20(9): 1323-1335, 2023 09.
Article in English | MEDLINE | ID: mdl-37550580

ABSTRACT

Droplet-based single-cell assays, including single-cell RNA sequencing (scRNA-seq), single-nucleus RNA sequencing (snRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), generate considerable background noise counts, the hallmark of which is nonzero counts in cell-free droplets and off-target gene expression in unexpected cell types. Such systematic background noise can lead to batch effects and spurious differential gene expression results. Here we develop a deep generative model based on the phenomenology of noise generation in droplet-based assays. The proposed model accurately distinguishes cell-containing droplets from cell-free droplets, learns the background noise profile and provides noise-free quantification in an end-to-end fashion. We implement this approach in the scalable and robust open-source software package CellBender. Analysis of simulated data demonstrates that CellBender operates near the theoretically optimal denoising limit. Extensive evaluations using real datasets and experimental benchmarks highlight enhanced concordance between droplet-based single-cell data and established gene expression patterns, while the learned background noise profile provides evidence of degraded or uncaptured cell types.


Subject(s)
Small Nuclear RNA, Software, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Gene Expression Profiling/methods
3.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics, Genetic Variation, Medical Genetics/standards, Population Genetics/standards, Human Genome/genetics, Female, Genetic Testing, Genotyping Techniques, Humans, Male, Middle Aged, Mutation, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics, Reference Standards, Genetic Selection, Whole Genome Sequencing
4.
Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350676

ABSTRACT

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.


Subject(s)
Human Genome, Software, Humans, Molecular Sequence Annotation, Genomics, Genotype, Genetic Variation
5.
Circulation ; 145(2): 122-133, 2022 01 11.
Article in English | MEDLINE | ID: mdl-34743566

ABSTRACT

BACKGROUND: Artificial intelligence (AI)-enabled analysis of 12-lead ECGs may facilitate efficient estimation of incident atrial fibrillation (AF) risk. However, it remains unclear whether AI provides meaningful and generalizable improvement in predictive accuracy beyond clinical risk factors for AF. METHODS: We trained a convolutional neural network (ECG-AI) to infer 5-year incident AF risk using 12-lead ECGs in patients receiving longitudinal primary care at Massachusetts General Hospital (MGH). We then fit 3 Cox proportional hazards models, composed of ECG-AI 5-year AF probability, CHARGE-AF clinical risk score (Cohorts for Heart and Aging in Genomic Epidemiology-Atrial Fibrillation), and terms for both ECG-AI and CHARGE-AF (CH-AI), respectively. We assessed model performance by calculating discrimination (area under the receiver operating characteristic curve) and calibration in an internal test set and 2 external test sets (Brigham and Women's Hospital [BWH] and UK Biobank). Models were recalibrated to estimate 2-year AF risk in the UK Biobank given limited available follow-up. We used saliency mapping to identify ECG features most influential on ECG-AI risk predictions and assessed correlation between ECG-AI and CHARGE-AF linear predictors. RESULTS: The training set comprised 45 770 individuals (age 55±17 years, 53% women, 2171 AF events) and the test sets comprised 83 162 individuals (age 59±13 years, 56% women, 2424 AF events). Area under the receiver operating characteristic curve was comparable using CHARGE-AF (MGH, 0.802 [95% CI, 0.767-0.836]; BWH, 0.752 [95% CI, 0.741-0.763]; UK Biobank, 0.732 [95% CI, 0.704-0.759]) and ECG-AI (MGH, 0.823 [95% CI, 0.790-0.856]; BWH, 0.747 [95% CI, 0.736-0.759]; UK Biobank, 0.705 [95% CI, 0.673-0.737]). Area under the receiver operating characteristic curve was highest using CH-AI (MGH, 0.838 [95% CI, 0.807 to 0.869]; BWH, 0.777 [95% CI, 0.766 to 0.788]; UK Biobank, 0.746 [95% CI, 0.716 to 0.776]). Calibration error was low using ECG-AI (MGH, 0.0212; BWH, 0.0129; UK Biobank, 0.0035) and CH-AI (MGH, 0.012; BWH, 0.0108; UK Biobank, 0.0001). In saliency analyses, the ECG P-wave had the greatest influence on AI model predictions. ECG-AI and CHARGE-AF linear predictors were correlated (Pearson r: MGH, 0.61; BWH, 0.66; UK Biobank, 0.41). CONCLUSIONS: AI-based analysis of 12-lead ECGs has similar predictive usefulness to a clinical risk factor model for incident AF and the approaches are complementary. ECG-AI may enable efficient quantification of future AF risk.


Subject(s)
Atrial Fibrillation/diagnosis, Deep Learning/standards, Electrocardiography/methods, Atrial Fibrillation/pathology, Female, Humans, Male, Middle Aged, Risk Factors
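
Record 5 above reports discrimination as the area under the receiver operating characteristic curve (AUROC). As a reminder of what that number measures, the sketch below computes AUROC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as half. The scores and labels are invented toy values, and the function name `auroc` is ours. The study's time-to-event setting (censoring, recalibration) is deliberately ignored here.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    # A pair counts 1 if the case scores higher, 0.5 if tied, 0 otherwise.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy risk scores (hypothetical) and observed outcomes (1 = event).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which is the scale on which the reported values of roughly 0.70 to 0.84 should be read.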
6.
Bioinformatics ; 38(11): 3116-3117, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35441669

ABSTRACT

SUMMARY: We developed the variant-Set Test for Association using Annotation infoRmation (STAAR) workflow description language (WDL) workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow written in the WDL allows a user to perform rare variant testing for both gene-centric and genetic region approaches, enabling genome-wide, candidate and conditional analyses. It incorporates functional annotations into the workflow as introduced in the STAAR method in order to boost the rare variant analysis power. This tool was specifically developed and optimized to be implemented on cloud-based platforms such as BioData Catalyst Powered by Terra. It provides easy-to-use functionality for rare variant analysis that can be incorporated into an exhaustive whole genome sequencing analysis pipeline. AVAILABILITY AND IMPLEMENTATION: The workflow is freely available from https://dockstore.org/workflows/github.com/sheilagaynor/STAAR_workflow. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Cloud Computing, Software, Workflow, Genome, Genome-Wide Association Study
7.
Cell ; 133(7): 1266-76, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18585359

ABSTRACT

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.


Subject(s)
DNA/chemistry, Homeodomain Proteins/chemistry, Animals, Base Sequence, Computational Biology, Conserved Sequence, DNA/metabolism, Molecular Evolution, Homeodomain Proteins/metabolism, Mice, Molecular Models, Protein Binding, Transcription Factors/chemistry, Transcription Factors/metabolism
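
Record 7 above measures binding to "all possible 8-base sequences". To give a sense of the size of that space, the sketch below enumerates every DNA 8-mer and merges each one with its reverse complement. The merging step is our assumption (a protein binding double-stranded DNA sees both strands), not something the abstract states, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the sequence read on the opposite strand."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> set[str]:
    """Enumerate all k-mers, keeping one representative per strand pair."""
    canon = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        # The lexicographically smaller of the two strands is the key.
        canon.add(min(kmer, reverse_complement(kmer)))
    return canon


print(4 ** 8)                   # 65536 ordered 8-mers
print(len(canonical_kmers(8)))  # 32896 after merging strand pairs
```

The second count follows from (4^8 + 4^4) / 2, because the 256 palindromic 8-mers are their own reverse complement.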
8.
N Engl J Med ; 381(7): 668-676, 2019 08 15.
Article in English | MEDLINE | ID: mdl-31412182

ABSTRACT

Knowledge gained from observational cohort studies has dramatically advanced the prevention and treatment of diseases. Many of these cohorts, however, are small, lack diversity, or do not provide comprehensive phenotype data. The All of Us Research Program plans to enroll a diverse group of at least 1 million persons in the United States in order to accelerate biomedical research and improve health. The program aims to make the research results accessible to participants, and it is developing new approaches to generate, access, and make data broadly available to approved researchers. All of Us opened for enrollment in May 2018 and currently enrolls participants 18 years of age or older from a network of more than 340 recruitment sites. Elements of the program protocol include health questionnaires, electronic health records (EHRs), physical measurements, the use of digital health technology, and the collection and analysis of biospecimens. As of July 2019, more than 175,000 participants had contributed biospecimens. More than 80% of these participants are from groups that have been historically underrepresented in biomedical research. EHR data on more than 112,000 participants from 34 sites have been collected. The All of Us data repository should permit researchers to take into account individual differences in lifestyle, socioeconomic factors, environment, and biologic characteristics in order to advance precision diagnosis, prevention, and treatment.


Subject(s)
Biological Specimen Banks, Biomedical Research, Cohort Studies, Datasets as Topic, Electronic Health Records, Health Surveys, Humans, Observational Studies as Topic, Precision Medicine, Research Design, United States
9.
Circ Res ; 127(1): 155-169, 2020 06 19.
Article in English | MEDLINE | ID: mdl-32833571

ABSTRACT

Machine learning applications in cardiology have rapidly evolved in the past decade. With the availability of machine learning tools coupled with vast data sources, the management of atrial fibrillation (AF), a common chronic disease with significant associated morbidity and socioeconomic impact, is undergoing a knowledge and practice transformation in the increasingly complex healthcare environment. Among other advances, deep-learning machine learning methods, including convolutional neural networks, have enabled the development of AF screening pathways using the ubiquitous 12-lead ECG to detect asymptomatic paroxysmal AF in at-risk populations (such as those with cryptogenic stroke), the refinement of AF and stroke prediction schemes through comprehensive digital phenotyping using structured and unstructured data abstraction from the electronic health record or wearable monitoring technologies, and the optimization of treatment strategies, ranging from stroke prophylaxis to monitoring of antiarrhythmic drug (AAD) therapy. Although the clinical and population-wide impact of these tools continues to be elucidated, such transformative progress does not come without challenges, such as the concerns about adopting black box technologies, assessing input data quality for training such models, and the risk of perpetuating rather than alleviating health disparities. This review critically appraises the advances of machine learning related to the care of AF thus far, their potential future directions, and their potential limitations and challenges.


Subject(s)
Atrial Fibrillation/diagnosis, Electrocardiography/methods, Machine Learning, Humans
11.
Am J Hum Genet ; 100(5): 695-705, 2017 May 04.
Article in English | MEDLINE | ID: mdl-28475856

ABSTRACT

Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their "diagnostic odyssey," improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases. Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. Here, we review the current and future bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.


Subject(s)
International Cooperation, Rare Diseases/diagnosis, Rare Diseases/genetics, Factual Databases, Exome, Human Genome, Humans
12.
Genet Med ; 21(5): 1173-1180, 2019 05.
Article in English | MEDLINE | ID: mdl-30270359

ABSTRACT

PURPOSE: Large-scale, population-based biobanks integrating health records and genomic profiles may provide a platform to identify individuals with disease-predisposing genetic variants. Here, we recall probands carrying familial hypercholesterolemia (FH)-associated variants, perform cascade screening of family members, and describe health outcomes affected by such a strategy. METHODS: The Estonian Biobank of Estonian Genome Center, University of Tartu, comprises 52,274 individuals. Among 4776 participants with exome or genome sequences, we identified 27 individuals who carried FH-associated variants in the LDLR, APOB, or PCSK9 genes. Cascade screening of 64 family members identified an additional 20 carriers of FH-associated variants. RESULTS: Via genetic counseling and clinical management of carriers, we were able to reclassify 51% of the study participants from having previously established nonspecific hypercholesterolemia to having FH and identify 32% who were completely unaware of harboring a high-risk disease-associated genetic variant. Imaging-based risk stratification targeted 86% of the variant carriers for statin treatment recommendations. CONCLUSION: Genotype-guided recall of probands and subsequent cascade screening for familial hypercholesterolemia is feasible within a population-based biobank and may facilitate more appropriate clinical management.


Subject(s)
Hyperlipoproteinemia Type II/diagnosis, Hyperlipoproteinemia Type II/epidemiology, Hyperlipoproteinemia Type II/genetics, Mass Screening/methods, Apolipoprotein B-100/genetics, Biological Specimen Banks, Estonia/epidemiology, Female, Genotype, Humans, Male, Mutation, Proprotein Convertase 9/genetics, LDL Receptors/genetics, DNA Sequence Analysis
13.
PLoS Genet ; 12(1): e1005772, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26796797

ABSTRACT

A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).


Subject(s)
Nucleic Acid Databases, Genome, Genomic Library, Health Services Research
14.
Hum Mutat ; 39(12): 1827-1834, 2018 12.
Article in English | MEDLINE | ID: mdl-30240502

ABSTRACT

Rare disease investigators constantly face challenges in identifying additional cases to build evidence for gene-disease causality. The Matchmaker Exchange (MME) addresses this limitation by providing a mechanism for matching patients across genomic centers via a federated network. The MME has revolutionized searching for additional cases by making it possible to query across institutional boundaries, so that what was once a laborious and manual process of contacting researchers is now automated and computable. However, while the MME network is beginning to scale, the growth of additional nodes is limited by the lack of easy-to-use solutions that can be implemented by any rare disease database owner, even one without significant software engineering resources. Here, we describe matchbox, which is an open-source, platform-independent, portable bridge between any given rare disease genomic center and the MME network, which has already led to novel gene discoveries. We also describe how matchbox greatly reduces the barrier to participation by overcoming challenges for new databases to join the MME.


Subject(s)
Information Storage and Retrieval/methods, Patient Selection, Rare Diseases/genetics, Access to Information, Genetic Association Studies, Genetic Predisposition to Disease, Humans, Information Dissemination/methods, Phenotype, Software, Web Browser
15.
Hum Mutat ; 38(10): 1281-1285, 2017 10.
Article in English | MEDLINE | ID: mdl-28699299

ABSTRACT

The Matchmaker Exchange (MME) connects rare disease clinicians and researchers to facilitate the sharing of data from undiagnosed patients for the purpose of novel gene discovery. Such sharing raises the odds that two or more similar patients with candidate genes in common may be found, thereby allowing their condition to be more readily studied and understood. Consent considerations for data sharing in MME included both the ethical and legal differences between clinical and research settings and the level of privacy risk involved in sharing varying amounts of rare disease patient data to enable patient matches. In this commentary, we discuss these consent considerations and the resulting MME Consent Policy as they may be relevant to other international data sharing initiatives.


Subject(s)
Genetic Association Studies, Inborn Genetic Diseases, Information Dissemination, Rare Diseases/genetics, Genetic Databases, Genomics, Humans, Patient Selection, Physicians, Research Personnel, Translational Biomedical Research
16.
Proc Natl Acad Sci U S A ; 110(12): 4667-72, 2013 Mar 19.
Article in English | MEDLINE | ID: mdl-23487782

ABSTRACT

Mechanotransduction, the pathway by which mechanical forces are translated to biological signals, plays important but poorly characterized roles in physiology. PIEZOs are recently identified, widely expressed, mechanically activated ion channels that are hypothesized to play a role in mechanotransduction in mammals. Here, we describe two distinct PIEZO2 mutations in patients with a subtype of Distal Arthrogryposis Type 5 characterized by generalized autosomal dominant contractures with limited eye movements, restrictive lung disease, and variable absence of cruciate knee ligaments. Electrophysiological studies reveal that the two PIEZO2 mutations affect biophysical properties related to channel inactivation: both E2727del and I802F mutations cause the PIEZO2-dependent, mechanically activated currents to recover faster from inactivation, while E2727del also causes a slowing of inactivation. Both types of changes in kinetics result in increased channel activity in response to a given mechanical stimulus, suggesting that Distal Arthrogryposis Type 5 can be caused by gain-of-function mutations in PIEZO2. We further show that overexpression of mutated PIEZO2 cDNAs does not cause constitutive activity or toxicity to cells, indicating that the observed phenotype is likely due to a mechanotransduction defect. Our studies identify a type of channelopathy and link the dysfunction of mechanically activated ion channels to developmental malformations and joint contractures.


Subject(s)
Arthrogryposis, Inborn Genetic Diseases, Ion Channels/genetics, Ion Channels/metabolism, Cellular Mechanotransduction/genetics, Mutation, Adult, Arthrogryposis/genetics, Arthrogryposis/metabolism, Arthrogryposis/pathology, Arthrogryposis/physiopathology, Cell Line, Female, Inborn Genetic Diseases/genetics, Inborn Genetic Diseases/metabolism, Inborn Genetic Diseases/pathology, Inborn Genetic Diseases/physiopathology, Humans, Infant, Newborn Infant, Male
17.
Hum Mutat ; 36(10): 915-21, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26295439

ABSTRACT

There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.


Subject(s)
Genetic Predisposition to Disease/genetics, Information Dissemination/methods, Rare Diseases/genetics, Database Management Systems, Genetic Databases, Genetic Association Studies, Humans, Software
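
Record 17 above describes databases exchanging cases through a common API. The sketch below only assembles the kind of JSON body such a match request might carry. The field names (`patient`, `contact`, `features`, `genomicFeatures`) and the media type reflect our recollection of the public Matchmaker Exchange API specification and should be treated as assumptions to check against that specification. The patient identifier, contact details, phenotype term and gene are invented, and `build_match_request` is a hypothetical helper. No network call is made.

```python
import json

# Assumed media type for MME v1.0 requests; verify against the specification.
MME_MEDIA_TYPE = "application/vnd.ga4gh.matchmaker.v1.0+json"


def build_match_request(patient_id, contact_name, contact_href, hpo_ids, gene_ids):
    """Assemble a match-request body for one undiagnosed case."""
    patient = {
        "id": patient_id,
        "contact": {"name": contact_name, "href": contact_href},
        # Phenotypes as ontology term identifiers.
        "features": [{"id": hpo, "observed": "yes"} for hpo in hpo_ids],
        # Candidate genes for the case.
        "genomicFeatures": [{"gene": {"id": gene}} for gene in gene_ids],
    }
    return {"patient": patient}


request_body = build_match_request(
    patient_id="case-0001",                 # hypothetical identifier
    contact_name="Example Clinician",       # hypothetical contact
    contact_href="mailto:clinician@example.org",
    hpo_ids=["HP:0001250"],
    gene_ids=["GENE1"],                     # placeholder gene symbol
)
print(json.dumps(request_body, indent=2))
```

Sharing only phenotype terms and candidate genes, rather than full records, is what keeps a federated query of this kind lightweight.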
18.
Pac Symp Biocomput ; 29: 261-275, 2024.
Article in English | MEDLINE | ID: mdl-38160285

ABSTRACT

The drug development pipeline for a new compound can last 10-20 years and cost over $10 billion. Drug repurposing offers a more time- and cost-effective alternative. Computational approaches based on network graph representations, comprising a mixture of disease nodes and their interactions, have recently yielded new drug repurposing hypotheses, including suitable candidates for COVID-19. However, these interactomes remain aggregate by design and often lack disease specificity. This dilution of information may affect the relevance of drug node embeddings to a particular disease, the resulting drug-disease and drug-drug similarity scores, and therefore our ability to identify new targets or drug synergies. To address this problem, we propose constructing and learning disease-specific hypergraphs in which hyperedges encode biological pathways of various lengths. We use a modified node2vec algorithm to generate pathway embeddings. We evaluate our hypergraph's ability to find repurposing targets for an incurable but prevalent disease, Alzheimer's disease (AD), and compare our rank-ordered recommendations to those derived from a state-of-the-art knowledge graph, the multiscale interactome. Using our method, we successfully identified 7 promising repurposing candidates for AD that were ranked as unlikely repurposing targets by the multiscale interactome but for which the existing literature provides supporting evidence. Additionally, our drug repositioning suggestions are accompanied by explanations, eliciting plausible biological pathways. In the future, we plan on scaling our proposed method to 800+ diseases, combining single-disease hypergraphs into multi-disease hypergraphs to account for subpopulations with risk factors or encode a given patient's comorbidities to formulate personalized repurposing recommendations. Supplementary materials and code: https://github.com/ayujain04/psb_supplement.


Subject(s)
Computational Biology, Drug Repositioning, Humans, Drug Repositioning/methods, Computational Biology/methods, Algorithms
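
Record 18 above builds hypergraphs in which each hyperedge encodes a biological pathway. The sketch below shows one minimal way to hold such a structure and to sample a walk over hyperedges that share nodes. The pathways and node names are invented, the function names are ours, and the uniform random walk is a plain stand-in, not the modified node2vec the authors describe.

```python
import random
from collections import defaultdict

# Hypothetical pathways: each hyperedge joins every node on one pathway.
hyperedges = {
    "pathway_a": {"drug_1", "gene_x", "gene_y"},
    "pathway_b": {"gene_y", "gene_z"},
    "pathway_c": {"gene_z", "drug_2", "gene_w", "disease"},
}


def node_to_edges(edges):
    """Invert the hypergraph: for each node, the hyperedges containing it."""
    index = defaultdict(set)
    for name, nodes in edges.items():
        for node in nodes:
            index[node].add(name)
    return index


def hyperedge_walk(edges, start, length, rng):
    """Walk between hyperedges by hopping through a shared node."""
    index = node_to_edges(edges)
    walk = [start]
    current = start
    for _ in range(length - 1):
        node = rng.choice(sorted(edges[current]))  # pick a node on this pathway
        current = rng.choice(sorted(index[node]))  # move to a pathway containing it
        walk.append(current)
    return walk


walk = hyperedge_walk(hyperedges, "pathway_a", 6, random.Random(0))
print(walk)
```

Sequences like `walk` are the kind of input that skip-gram style embedding methods consume. How the paper biases or weights its walks is not specified in the abstract.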
19.
JAMA Cardiol ; 9(2): 174-181, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37950744

ABSTRACT

Importance: The gold standard for outcome adjudication in clinical trials is medical record review by a physician clinical events committee (CEC), which requires substantial time and expertise. Automated adjudication of medical records by natural language processing (NLP) may offer a more resource-efficient alternative but this approach has not been validated in a multicenter setting. Objective: To externally validate the Community Care Cohort Project (C3PO) NLP model for heart failure (HF) hospitalization adjudication, which was previously developed and tested within one health care system, compared to gold-standard CEC adjudication in a multicenter clinical trial. Design, Setting, and Participants: This was a retrospective analysis of the Influenza Vaccine to Effectively Stop Cardio Thoracic Events and Decompensated Heart Failure (INVESTED) trial, which compared 2 influenza vaccines in 5260 participants with cardiovascular disease at 157 sites in the US and Canada between September 2016 and January 2019. Analysis was performed from November 2022 to October 2023. Exposures: Individual sites submitted medical records for each hospitalization. The central INVESTED CEC and the C3PO NLP model independently adjudicated whether the cause of hospitalization was HF using the prepared hospitalization dossier. The C3PO NLP model was fine-tuned (C3PO + INVESTED) and a de novo NLP model was trained using half the INVESTED hospitalizations. Main Outcomes and Measures: Concordance between the C3PO NLP model HF adjudication and the gold-standard INVESTED CEC adjudication was measured by raw agreement, κ, sensitivity, and specificity. The fine-tuned and de novo INVESTED NLP models were evaluated in an internal validation cohort not used for training. Results: Among 4060 hospitalizations in 1973 patients (mean [SD] age, 66.4 [13.2] years; 541 [27.4%] female and 1432 [72.6%] male), 1074 hospitalizations (26%) were adjudicated as HF by the CEC. There was good agreement between the C3PO NLP and CEC HF adjudications (raw agreement, 87% [95% CI, 86-88]; κ, 0.69 [95% CI, 0.66-0.72]). C3PO NLP model sensitivity was 94% (95% CI, 92-95) and specificity was 84% (95% CI, 83-85). The fine-tuned C3PO and de novo NLP models demonstrated agreement of 93% (95% CI, 92-94) and κ of 0.82 (95% CI, 0.77-0.86) and 0.83 (95% CI, 0.79-0.87), respectively, vs the CEC. CEC reviewer interrater reproducibility was 94% (95% CI, 93-95; κ, 0.85 [95% CI, 0.80-0.89]). Conclusions and Relevance: The C3PO NLP model developed within 1 health care system identified HF events with good agreement relative to the gold-standard CEC in an external multicenter clinical trial. Fine-tuning the model improved agreement and approximated human reproducibility. Further study is needed to determine whether NLP will improve the efficiency of future multicenter clinical trials by identifying clinical events at scale.
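
Record 19 above summarises concordance as raw agreement, κ, sensitivity and specificity. The sketch below shows how those four quantities come out of a single 2x2 table. The abstract does not give the cell counts. The ones used here are our own back-calculation from its rounded figures (4060 hospitalizations, 1074 adjudicated as HF, 94% sensitivity, 84% specificity) and are illustrative only. The function name is hypothetical.

```python
def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Agreement statistics for a model versus a reference adjudication."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "raw_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts reconstructed from rounded percentages, not trial data.
metrics = agreement_metrics(tp=1010, fp=478, fn=64, tn=2508)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# raw_agreement: 0.87
# kappa: 0.69
# sensitivity: 0.94
# specificity: 0.84
```

That the reconstructed table reproduces the reported 87% agreement and κ of 0.69 is a consistency check on the arithmetic, not a confirmation of the actual counts.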

20.
Nat Biotechnol ; 42(4): 582-586, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37291427

ABSTRACT

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


Subject(s)
High-Throughput Nucleotide Sequencing, RNA Isoforms, Complementary DNA/genetics, RNA Isoforms/genetics, High-Throughput Nucleotide Sequencing/methods, Protein Isoforms/genetics, RNA Sequence Analysis/methods, Transcriptome, Gene Expression Profiling/methods, RNA/genetics