Results 1-20 of 1,451
1.
Immunity ; 56(11): 2650-2663.e6, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37816353

ABSTRACT

The accurate selection of neoantigens that bind to class I human leukocyte antigen (HLA) and are recognized by autologous T cells is a crucial step in many cancer immunotherapy pipelines. We reprocessed whole-exome sequencing and RNA sequencing (RNA-seq) data from 120 cancer patients from two external large-scale neoantigen immunogenicity screening assays combined with an in-house dataset of 11 patients and identified 46,017 somatic single-nucleotide variant mutations and 1,781,445 neo-peptides, of which 212 mutations and 178 neo-peptides were immunogenic. Beyond features commonly used for neoantigen prioritization, factors such as the location of neo-peptides within protein HLA presentation hotspots, binding promiscuity, and the role of the mutated gene in oncogenicity were predictive for immunogenicity. The classifiers accurately predicted neoantigen immunogenicity across datasets and improved their ranking by up to 30%. Besides insights into machine learning methods for neoantigen ranking, we have provided homogenized datasets valuable for developing and benchmarking companion algorithms for neoantigen-based immunotherapies.


Subject(s)
Antigens, Neoplasm , Neoplasms , Humans , Antigens, Neoplasm/genetics , Neoplasms/genetics , Neoplasms/therapy , Histocompatibility Antigens Class I , Machine Learning , Peptides , Immunotherapy/methods
2.
Cell ; 167(5): 1369-1384.e19, 2016 Nov 17.
Article in English | MEDLINE | ID: mdl-27863249

ABSTRACT

Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.


Subject(s)
Blood Cells/cytology , Disease/genetics , Promoter Regions, Genetic , Cell Lineage , Cell Separation , Chromatin , Enhancer Elements, Genetic , Epigenomics , Genetic Predisposition to Disease , Genome-Wide Association Study , Hematopoiesis , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
3.
Trends Genet ; 40(8): 642-667, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38734482

ABSTRACT

Genome-wide association studies (GWASs) have identified numerous genetic loci associated with human traits and diseases. However, pinpointing the causal genes remains a challenge, which impedes the translation of GWAS findings into biological insights and medical applications. In this review, we provide an in-depth overview of the methods and technologies used for prioritizing genes from GWAS loci, including gene-based association tests, integrative analysis of GWAS and molecular quantitative trait loci (xQTL) data, linking GWAS variants to target genes through enhancer-gene connection maps, and network-based prioritization. We also outline strategies for generating context-dependent xQTL data and their applications in gene prioritization. We further highlight the potential of gene prioritization in drug repurposing. Lastly, we discuss future challenges and opportunities in this field.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Gene Regulatory Networks/genetics
4.
Am J Hum Genet ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39255797

ABSTRACT

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, two from the generative pre-trained transformer (GPT) series and three from the Llama2 series, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve the accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
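The top-50 accuracy metric reported above can be illustrated with a short sketch. This is a minimal Python example, assuming each case supplies one ranked list of predicted genes and a single diagnosed gene; the function name `top_k_accuracy` and the toy gene lists are hypothetical and are not taken from the study's code or data.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   diagnosed_genes: Sequence[str],
                   k: int = 50) -> float:
    """Fraction of cases whose diagnosed gene is among the first k predictions."""
    if len(ranked_predictions) != len(diagnosed_genes):
        raise ValueError("expected one ranked gene list per case")
    if not diagnosed_genes:
        return 0.0
    hits = sum(1 for preds, gene in zip(ranked_predictions, diagnosed_genes)
               if gene in list(preds)[:k])
    return hits / len(diagnosed_genes)


# Toy ranked outputs for three hypothetical cases (not data from the study).
cases: List[List[str]] = [["TP53", "BRCA1", "PTEN"], ["MECP2", "SCN1A"], ["CFTR"]]
truth = ["PTEN", "FOXG1", "CFTR"]
print(top_k_accuracy(cases, truth, k=3))  # PTEN and CFTR are hits, FOXG1 is missed
```

Averaging this value over repeated runs or prompt variants would give an "average accuracy" in the sense used above; how the study aggregated runs is not specified here.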

5.
Am J Hum Genet ; 111(6): 1035-1046, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38754426

ABSTRACT

Obesity is a major risk factor for a myriad of diseases, affecting >600 million people worldwide. Genome-wide association studies (GWASs) have identified hundreds of genetic variants that influence body mass index (BMI), a commonly used metric to assess obesity risk. Most variants are non-coding and likely act through regulating genes nearby. Here, we apply multiple computational methods to prioritize the likely causal gene(s) within each of the 536 previously reported GWAS-identified BMI-associated loci. We performed summary-data-based Mendelian randomization (SMR), FINEMAP, DEPICT, MAGMA, transcriptome-wide association studies (TWASs), mutation significance cutoff (MSC), polygenic priority score (PoPS), and the nearest gene strategy. Results of each method were weighted based on their success in identifying genes known to be implicated in obesity, ranking all prioritized genes according to a confidence score (minimum: 0; maximum: 28). We identified 292 high-scoring genes (≥11) in 264 loci, including genes known to play a role in body weight regulation (e.g., DGKI, ANKRD26, MC4R, LEPR, BDNF, GIPR, AKT3, KAT8, MTOR) and genes related to comorbidities (e.g., FGFR1, ISL1, TFAP2B, PARK2, TCF7L2, GSK3B). For most of the high-scoring genes, however, we found limited or no evidence for a role in obesity, including the top-scoring gene BPTF. Many of the top-scoring genes seem to act through a neuronal regulation of body weight, whereas others affect peripheral pathways, including circadian rhythm, insulin secretion, and glucose and carbohydrate homeostasis. The characterization of these likely causal genes can increase our understanding of the underlying biology and offer avenues to develop therapeutics for weight loss.


Subject(s)
Body Mass Index , Genome-Wide Association Study , Obesity , Humans , Obesity/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Multifactorial Inheritance/genetics , Genetic Loci , Mendelian Randomization Analysis
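The weighted confidence score described in this abstract can be sketched as a sum of per-method weights. This is a minimal Python illustration under stated assumptions: the method names come from the abstract, but the numeric weights in `METHOD_WEIGHTS` are hypothetical placeholders (they do not reproduce the study's weights or its 0-28 range), and the helper names are invented for illustration. Only the ≥11 cutoff is taken from the source.

```python
from typing import Dict, List, Set

# Hypothetical method weights for illustration only; the study derived its own
# weights from each method's success at recovering known obesity genes.
METHOD_WEIGHTS: Dict[str, int] = {
    "SMR": 4, "FINEMAP": 3, "DEPICT": 3, "MAGMA": 2,
    "TWAS": 3, "MSC": 1, "PoPS": 4, "nearest_gene": 2,
}


def confidence_scores(prioritized: Dict[str, Set[str]]) -> Dict[str, int]:
    """Add up the weight of every method that prioritized each gene."""
    scores: Dict[str, int] = {}
    for method, genes in prioritized.items():
        weight = METHOD_WEIGHTS[method]
        for gene in genes:
            scores[gene] = scores.get(gene, 0) + weight
    return scores


def high_scoring(scores: Dict[str, int], threshold: int = 11) -> List[str]:
    """Genes scoring at or above the threshold, highest score first."""
    return sorted((g for g, s in scores.items() if s >= threshold),
                  key=lambda g: (-scores[g], g))


# Toy per-method calls at a single locus (hypothetical).
calls = {
    "SMR": {"MC4R", "BDNF"},
    "FINEMAP": {"MC4R"},
    "PoPS": {"MC4R", "LEPR"},
    "DEPICT": {"BDNF"},
    "nearest_gene": {"MC4R", "BDNF"},
}
scores = confidence_scores(calls)
print(scores["MC4R"], scores["BDNF"], high_scoring(scores))  # 13 9 ['MC4R']
```

A weighted sum keeps each method's contribution transparent, so a gene's score can be traced back to the methods that supported it.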
6.
Proc Natl Acad Sci U S A ; 121(34): e2402970121, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39133856

ABSTRACT

Ecosystem restoration is inherently a complex activity with inevitable tradeoffs in environmental and societal outcomes. These tradeoffs can potentially be large when policies and practices are focused on single outcomes versus joint achievement of multiple outcomes. Few studies have assessed the tradeoffs in Nature's Contributions to People (NCP) and the distributional equity of NCP from forest restoration strategies. Here, we optimized a defined forest restoration area across India with systematic conservation planning to assess the tradeoffs between three NCP: i) climate change mitigation NCP, ii) biodiversity value NCP (habitat created for forest-dependent mammals), and iii) societal NCP (human direct use of restored forests for livelihoods, housing construction material, and energy). We show that restoration plans aimed at a single-NCP tend not to deliver other NCP outcomes efficiently. In contrast, integrated spatial forest restoration plans aimed at achievement of multiple outcomes deliver on average 83.3% (43.2 to 100%) of climate change mitigation NCP, 89.9% (63.8 to 100%) of biodiversity value NCP, and 93.9% (64.5 to 100%) of societal NCP delivered by single-objective plans. Integrated plans deliver NCP more evenly across the restoration area when compared to other plans that identify certain regions such as the Western Ghats and north-eastern India. Last, 38 to 41% of the people impacted by integrated spatial plans belong to socioeconomically disadvantaged groups, greater than their overall representation in India's population. Moving ahead, effective policy design and evaluation integrating ecosystem protection and restoration strategies can benefit from the blueprint we provide in this study for India.


Subject(s)
Biodiversity , Climate Change , Conservation of Natural Resources , Forests , Conservation of Natural Resources/methods , Humans , India , Ecosystem , Environmental Restoration and Remediation/methods
7.
Proc Natl Acad Sci U S A ; 121(17): e2307214121, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38621123

ABSTRACT

Environmental DNA (eDNA) metabarcoding has the potential to revolutionize conservation planning by providing spatially and taxonomically comprehensive data on biodiversity and ecosystem conditions, but its utility to inform the design of protected areas remains untested. Here, we quantify whether and how identifying conservation priority areas within coral reef ecosystems differs when biodiversity information is collected via eDNA analyses or traditional visual census records. We focus on 147 coral reefs in Indonesia's hyper-diverse Wallacea region and show large discrepancies in the allocation and spatial design of conservation priority areas when coral reef species were surveyed with underwater visual techniques (fishes, corals, and algae) or eDNA metabarcoding (eukaryotes and metazoans). Specifically, incidental protection occurred for 55% of eDNA species when targets were set for species detected by visual surveys and 71% vice versa. This finding is supported by generally low overlap in detection between visual census and eDNA methods at species level, with more overlap at higher taxonomic ranks. Incomplete taxonomic reference databases for the highly diverse Wallacea reefs, and the complementary detection of species by the two methods, underscore the current need to combine different biodiversity data sources to maximize species representation in conservation planning.


Subject(s)
Anthozoa , DNA, Environmental , Animals , Coral Reefs , Ecosystem , DNA, Environmental/genetics , Biodiversity , Anthozoa/genetics , Fishes , DNA Barcoding, Taxonomic
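The "incidental protection" percentages above compare the reefs selected for one survey method with the species recorded by the other. A simplified Python sketch follows, assuming a presence-only target (a species counts as protected if it occurs at any selected reef); real conservation-planning targets are usually more demanding, and the function name and toy reef and species IDs are hypothetical.

```python
from typing import Dict, Set


def incidental_protection(selected_sites: Set[str],
                          occurrences: Dict[str, Set[str]]) -> float:
    """Share of species recorded in at least one selected site.

    occurrences maps each species to the set of sites where it was detected.
    A species counts as incidentally protected if any of its sites is selected
    (a simplified, presence-only target).
    """
    if not occurrences:
        return 0.0
    covered = sum(1 for sites in occurrences.values() if sites & selected_sites)
    return covered / len(occurrences)


# Toy data: reefs chosen by a visual-survey plan, species detected by eDNA.
visual_plan = {"reef_1", "reef_4"}
edna_species = {
    "sp_A": {"reef_1", "reef_2"},
    "sp_B": {"reef_3"},
    "sp_C": {"reef_4"},
    "sp_D": {"reef_2", "reef_5"},
}
print(incidental_protection(visual_plan, edna_species))  # 0.5
```

Swapping the roles (an eDNA-based plan scored against visually surveyed species) gives the "vice versa" figure.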
8.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279653

ABSTRACT

Cluster analysis is one of the most widely used exploratory methods for visualization and grouping of gene expression patterns across multiple samples or treatment groups. Although several existing online tools can annotate clusters with functional terms, there is no all-in-one webserver to effectively prioritize genes/clusters using gene essentiality as well as congruency of mRNA-protein expression. Hence, we developed CAP-RNAseq, which enables (1) upload and clustering of bulk RNA-seq data followed by identification, annotation and network visualization of all or selected clusters; and (2) prioritization using DepMap gene essentiality and/or dependency scores as well as the degree of correlation between mRNA and protein levels of genes within an expression cluster. In addition, CAP-RNAseq has an integrated primer design tool for the prioritized genes. Herein, using comparisons with existing tools and multiple case studies, we showed that CAP-RNAseq can uniquely aid in the discovery of co-expression clusters enriched with essential genes and the prioritization of novel biomarker genes that exhibit high correlations between their mRNA and protein expression levels. CAP-RNAseq is applicable to RNA-seq data from different contexts including cancer and is available at http://konulabapps.bilkent.edu.tr:3838/CAPRNAseq/, and the docker image is downloadable from https://hub.docker.com/r/konulab/caprnaseq.


Subject(s)
Proteomics , Sequence Analysis, RNA/methods , RNA-Seq , RNA, Messenger/genetics
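One of CAP-RNAseq's prioritization criteria is the correlation between mRNA and protein levels of genes within a cluster. The sketch below shows that idea generically in Python: a per-gene Pearson correlation across matched samples, then a filter and ranking. It is not CAP-RNAseq's code; the `min_r` cutoff, the function names, and the toy values are hypothetical, and the DepMap essentiality criterion is not modeled.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant profile
    return cov / (sx * sy)


def rank_by_mrna_protein_agreement(mrna: Dict[str, List[float]],
                                   protein: Dict[str, List[float]],
                                   min_r: float = 0.5) -> List[str]:
    """Genes measured at both levels with r >= min_r, highest r first."""
    shared = mrna.keys() & protein.keys()
    scored = {g: pearson(mrna[g], protein[g]) for g in shared}
    keep = [g for g, r in scored.items() if r >= min_r]  # NaN never passes
    return sorted(keep, key=lambda g: -scored[g])


# Toy cluster: values across four matched samples (hypothetical).
mrna = {"G1": [1, 2, 3, 4], "G2": [1, 2, 3, 4], "G3": [4, 3, 2, 1]}
protein = {"G1": [2, 4, 6, 8], "G2": [1, 3, 2, 4], "G3": [1, 2, 3, 4]}
print(rank_by_mrna_protein_agreement(mrna, protein))  # ['G1', 'G2']
```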
9.
Genet Epidemiol ; 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39318036

ABSTRACT

The introduction of Next-Generation Sequencing technologies in the clinic has improved rare disease diagnosis. Nonetheless, for very heterogeneous or very rare diseases, more than half of cases still lack a molecular diagnosis. Novel strategies are needed to prioritize variants within a single individual. The Population Sampling Probability (PSAP) method was developed to meet this aim but only for coding variants in exome data. Here, we propose an extension of the PSAP method to the non-coding genome called PSAP-genomic-regions. In this extension, instead of considering genes as testing units (PSAP-genes strategy), we use genomic regions defined over the whole genome that pinpoint potential functional constraints. We conceived an evaluation protocol for our method using artificially generated disease exomes and genomes, by inserting coding and non-coding pathogenic ClinVar variants in large data sets of exomes and genomes from the general population. PSAP-genomic-regions significantly improves the ranking of these variants compared to using a pathogenicity score alone. Using PSAP-genomic-regions, more than 50% of non-coding ClinVar variants were among the top 10 variants of the genome. On real sequencing data from six patients with Cerebral Small Vessel Disease and nine patients with male infertility, all causal variants were ranked in the top 100 variants with PSAP-genomic-regions. By revisiting the testing units used in the PSAP method to include non-coding variants, we have developed PSAP-genomic-regions, an efficient whole-genome prioritization tool which offers promising results for the diagnosis of unresolved rare diseases.

10.
Genet Epidemiol ; 48(7): 324-343, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38940260

ABSTRACT

Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants (RVs). Since different families can harbor different causal variants and each family harbors many RVs, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, for example, pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent-child trios. Extending this idea to families, we propose methods to prioritize RVs shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known probability of carrying a causal variant. In contrast, local approaches condition on a variant being observed in specific families to eliminate the carrier probability. Our simulation results indicate that global approaches are robust to misspecification of the carrier probability and prioritize more effectively than local approaches even when the carrier probability is misspecified.


Subject(s)
Genetic Variation , Humans , Models, Genetic , Genetic Predisposition to Disease , Computer Simulation , Pedigree , Family , Exome/genetics , Models, Statistical , Linkage Disequilibrium , Sequence Analysis, DNA/methods
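The transmission-disequilibrium test mentioned above compares how often heterozygous parents transmit a variant allele to affected children versus how often they transmit the other allele; under Mendelian transmission both outcomes are equally likely. The sketch below is the classic trio TDT statistic in Python (a McNemar-type chi-square with 1 degree of freedom). It does not implement the family-based global or local prioritization methods proposed in this paper, and the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Classic trio TDT from heterozygous parents.

    transmitted   -- times the variant allele was passed to an affected child
    untransmitted -- times the other allele was passed instead
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = tdt(30, 12)  # hypothetical counts: 30 transmissions vs 12 non-transmissions
print(round(stat, 3), round(p, 4))
```

A large excess of transmissions yields a small p-value, which is the "departure from Mendelian transmission" that the abstract refers to.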
11.
Trends Genet ; 38(12): 1271-1283, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35934592

ABSTRACT

A molecular diagnosis from the analysis of sequencing data in rare Mendelian diseases has a huge impact on the management of patients and their families. Numerous patient phenotype-aware variant prioritisation (VP) tools have been developed to help automate this process and shorten the diagnostic odyssey, but performance statistics on real patient data are limited. Here we identify, assess, and compare the performance of all up-to-date, freely available, and programmatically accessible tools using a whole-exome, retinal disease dataset from 134 individuals with a molecular diagnosis. All tools were able to identify around two-thirds of the genetic diagnoses as the top-ranked candidate, with LIRICAL performing best overall. Finally, we discuss the challenges that must be overcome for the majority of cases that remain undiagnosed after current state-of-the-art practices.


Subject(s)
Exome , Rare Diseases , Humans , Phenotype , Exome Sequencing , Rare Diseases/diagnosis , Rare Diseases/genetics
12.
Am J Hum Genet ; 109(8): 1366-1387, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35931049

ABSTRACT

A major challenge of genome-wide association studies (GWASs) is to translate phenotypic associations into biological insights. Here, we integrate a large GWAS on blood lipids involving 1.6 million individuals from five ancestries with a wide array of functional genomic datasets to discover regulatory mechanisms underlying lipid associations. We first prioritize lipid-associated genes with expression quantitative trait locus (eQTL) colocalizations and then add chromatin interaction data to narrow the search for functional genes. Polygenic enrichment analysis across 697 annotations from a host of tissues and cell types confirms the central role of the liver in lipid levels and highlights the selective enrichment of adipose-specific chromatin marks in high-density lipoprotein cholesterol and triglycerides. Overlapping transcription factor (TF) binding sites with lipid-associated loci identifies TFs relevant in lipid biology. In addition, we present an integrative framework to prioritize causal variants at GWAS loci, producing a comprehensive list of candidate causal genes and variants with multiple layers of functional evidence. We highlight two of the prioritized genes, CREBRF and RRBP1, which show convergent evidence across functional datasets supporting their roles in lipid biology.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Chromatin/genetics , Genomics , Humans , Lipids/genetics , Polymorphism, Single Nucleotide/genetics
13.
Am J Hum Genet ; 109(2): 270-281, 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35063063

ABSTRACT

In recent years, exome sequencing (ES) has shown great utility in the diagnoses of Mendelian disorders. However, after rigorous filtering, a typical ES analysis still involves the interpretation of hundreds of variants, which greatly hinders the rapid identification of causative genes. Since the interpretations of ES data require comprehensive clinical analyses, taking clinical expertise into consideration can speed the molecular diagnoses of Mendelian disorders. To leverage clinical expertise to prioritize candidate genes, we developed PhenoApt, a phenotype-driven gene prioritization tool that allows users to assign a customized weight to each phenotype, via a machine-learning algorithm. Using the ability to rank causative genes in top-10 lists as an evaluation metric, baseline analysis demonstrated that PhenoApt outperformed previous phenotype-driven gene prioritization tools by a relative increase of 22.7%-140.0% in three independent, real-world, multi-center cohorts (cohort 1, n = 185; cohort 2, n = 784; and cohort 3, n = 208). Additional trials showed that, by adding weights to clinical indications, which should be explained by the causative gene, PhenoApt performance was improved by a relative increase of 37.3% in cohort 2 (n = 471) and 21.4% in cohort 3 (n = 208). Moreover, PhenoApt could assign an intrinsic weight to each phenotype based on the likelihood of its being a Mendelian trait using term frequency-inverse document frequency techniques. When clinical indications were assigned with intrinsic weights, PhenoApt performance was improved by a relative increase of 23.7% in cohort 2 and 15.5% in cohort 3. For the integration of PhenoApt into clinical practice, we developed a user-friendly website and a command-line tool.


Subject(s)
Genetic Diseases, Inborn/genetics , Hearing Loss, Sensorineural/genetics , Intellectual Disability/genetics , Machine Learning , Microcephaly/genetics , Nystagmus, Congenital/genetics , Scoliosis/genetics , Cohort Studies , Computational Biology , Databases, Genetic , Exome , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , Genetic Testing , Genotype , Hearing Loss, Sensorineural/diagnosis , Hearing Loss, Sensorineural/pathology , Humans , Intellectual Disability/diagnosis , Intellectual Disability/pathology , Microcephaly/diagnosis , Microcephaly/pathology , Nystagmus, Congenital/diagnosis , Nystagmus, Congenital/pathology , Phenotype , Scoliosis/diagnosis , Scoliosis/pathology , Software , Exome Sequencing
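PhenoApt's intrinsic phenotype weights are described as coming from term frequency-inverse document frequency techniques. The Python sketch below shows only the generic IDF idea: each gene's annotated phenotype set is treated as a "document", so terms annotated to many genes (uninformative) get low weights and rare terms get high weights, and a candidate gene is scored by the summed weights of shared patient terms. This is not PhenoApt's machine-learning model; the function names, gene names, and annotation table are hypothetical.

```python
import math
from typing import Dict, Set


def idf_weights(gene_annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Inverse document frequency of each phenotype term across genes.

    Each gene's phenotype set is one 'document'; terms annotated to fewer
    genes receive larger weights.
    """
    n_genes = len(gene_annotations)
    doc_freq: Dict[str, int] = {}
    for terms in gene_annotations.values():
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t: math.log(n_genes / df) for t, df in doc_freq.items()}


def weighted_match(patient_terms: Set[str], gene: str,
                   gene_annotations: Dict[str, Set[str]],
                   weights: Dict[str, float]) -> float:
    """Sum of the weights of patient terms also annotated to the gene."""
    shared = patient_terms & gene_annotations[gene]
    return sum(weights.get(t, 0.0) for t in shared)


# Hypothetical gene-phenotype annotations (not a real table).
annotations = {
    "GENE_A": {"Seizure", "Intellectual disability", "Microcephaly"},
    "GENE_B": {"Seizure", "Intellectual disability"},
    "GENE_C": {"Seizure", "Scoliosis"},
    "GENE_D": {"Seizure", "Nystagmus"},
}
w = idf_weights(annotations)
patient = {"Seizure", "Microcephaly"}
ranking = sorted(annotations,
                 key=lambda g: -weighted_match(patient, g, annotations, w))
print(ranking[0])  # GENE_A: the rare term "Microcephaly" dominates
```

Here "Seizure" is annotated to every gene and so contributes nothing, which is the intended effect of down-weighting nonspecific phenotypes.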
14.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37248747

ABSTRACT

Human Phenotype Ontology (HPO)-based approaches have gained popularity in recent times as a tool for genomic diagnostics of rare diseases. However, these approaches do not make full use of the available information on disease and patient phenotypes. We present a new method called Phen2Disease, which utilizes the bidirectional maximum matching semantic similarity between two phenotype sets of patients and diseases to prioritize diseases and genes. Our comprehensive experiments were conducted on six real data cohorts with 2051 cases (Cohort 1, n = 384; Cohort 2, n = 281; Cohort 3, n = 185; Cohort 4, n = 784; Cohort 5, n = 208; and Cohort 6, n = 209) and two simulated data cohorts with 1000 cases. The results of the experiments showed that Phen2Disease outperforms the three state-of-the-art methods when only phenotype information and the HPO knowledge base are used, particularly in cohorts with a lower average number of HPO terms. We also observed that patients with higher information content scores have more specific information, leading to more accurate predictions. Moreover, Phen2Disease provides high interpretability by presenting ranked diseases and patient HPO terms. Our method provides a novel approach to utilizing phenotype data for genomic diagnostics of rare diseases, with potential for clinical impact. Phen2Disease is freely available on GitHub at https://github.com/ZhuLab-Fudan/Phen2Disease.


Subject(s)
Biological Ontologies , Rare Diseases , Humans , Semantics , Genomics , Phenotype
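The "bidirectional maximum matching" similarity between a patient's phenotype set and a disease's phenotype set can be illustrated with a generic best-match-average scheme: each term is matched to its most similar term in the other set, matches are averaged in each direction, and the two directions are averaged. The Python sketch below assumes a precomputed term-to-term similarity function; Phen2Disease's exact weighting and term similarity may differ. The term IDs `HP:A`, `HP:B`, `HP:C` and the similarity values are hypothetical placeholders, not real HPO terms.

```python
from typing import Callable, Dict, FrozenSet, Set

Similarity = Callable[[str, str], float]


def one_way(source: Set[str], target: Set[str], sim: Similarity) -> float:
    """Mean, over source terms, of the best similarity to any target term."""
    if not source or not target:
        return 0.0
    return sum(max(sim(s, t) for t in target) for s in source) / len(source)


def bidirectional_best_match(patient: Set[str], disease: Set[str],
                             sim: Similarity) -> float:
    """Average of the patient->disease and disease->patient best-match scores."""
    return 0.5 * (one_way(patient, disease, sim) + one_way(disease, patient, sim))


# Hypothetical pairwise term similarities (e.g. derived from information content).
PAIRWISE: Dict[FrozenSet[str], float] = {
    frozenset({"HP:A", "HP:B"}): 0.6,
    frozenset({"HP:A", "HP:C"}): 0.2,
    frozenset({"HP:B", "HP:C"}): 0.3,
}


def toy_sim(a: str, b: str) -> float:
    """Identical terms score 1.0; otherwise look up the toy table."""
    return 1.0 if a == b else PAIRWISE.get(frozenset({a, b}), 0.0)


print(bidirectional_best_match({"HP:A"}, {"HP:B", "HP:C"}, toy_sim))
```

Scoring both directions penalizes a disease that matches the patient's few terms well but also carries many phenotypes the patient does not show.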
15.
Brief Bioinform ; 24(2), 2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36907650

ABSTRACT

Proteomic studies characterize the protein composition of complex biological samples. Despite recent advancements in mass spectrometry instrumentation and computational tools, low proteome coverage and interpretability remain challenges. To address this, we developed Proteome Support Vector Enrichment (PROSE), a fast, scalable and lightweight pipeline for scoring proteins based on orthogonal gene co-expression network matrices. PROSE utilizes simple protein lists as input, generating a standard enrichment score for all proteins, including undetected ones. In our benchmark with 7 other candidate prioritization techniques, PROSE shows high accuracy in missing protein prediction, with scores correlating strongly with corresponding gene expression data. As a further proof-of-concept, we applied PROSE to a reanalysis of the Cancer Cell Line Encyclopedia proteomics dataset, where it captures key phenotypic features, including gene dependency. Lastly, we demonstrated its applicability on a breast cancer clinical dataset, showing clustering by annotated molecular subtype and identification of putative drivers of triple-negative breast cancer. PROSE is available as a user-friendly Python module from https://github.com/bwbio/PROSE.


Subject(s)
Proteome , Proteomics , Proteomics/methods , Proteome/analysis
16.
Brief Bioinform ; 24(6), 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37985452

ABSTRACT

Charting microRNA (miRNA) regulation across pathways is key to characterizing their function. Yet, no method currently exists that can quantify how miRNAs regulate multiple interconnected pathways or prioritize them for their ability to regulate coordinate transcriptional programs. Existing methods primarily infer one-to-one relationships between miRNAs and pathways using differentially expressed genes. We introduce PanomiR, an in silico framework for studying the interplay of miRNAs and disease functions. PanomiR integrates gene expression, mRNA-miRNA interactions and known biological pathways to reveal coordinated multi-pathway targeting by miRNAs. PanomiR utilizes pathway-activity profiling approaches, a pathway co-expression network and network clustering algorithms to prioritize miRNAs that target broad-scale transcriptional disease phenotypes. It directly resolves differential regulation of pathways, irrespective of their differential gene expression, and captures co-activity to establish functional pathway groupings and the miRNAs that may regulate them. PanomiR uses a systems biology approach to provide broad but precise insights into miRNA-regulated functional programs. It is available at https://bioconductor.org/packages/PanomiR.


Subject(s)
MicroRNAs , MicroRNAs/metabolism , Systems Biology , Gene Expression Profiling/methods , Computational Biology/methods , Gene Regulatory Networks
17.
Mol Syst Biol ; 20(4): 338-361, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38467837

ABSTRACT

Microbial biochemistry is central to the pathophysiology of inflammatory bowel diseases (IBD). Improved knowledge of microbial metabolites and their immunomodulatory roles is thus necessary for diagnosis and management. Here, we systematically analyzed the chemical, ecological, and epidemiological properties of ~82k metabolic features in 546 Integrative Human Microbiome Project (iHMP/HMP2) metabolomes, using a newly developed methodology for bioactive compound prioritization from microbial communities. This suggested >1000 metabolic features as potentially bioactive in IBD and associated ~43% of prevalent, unannotated features with at least one well-characterized metabolite, thereby providing initial information for further characterization of a significant portion of the fecal metabolome. Prioritized features included known IBD-linked chemical families such as bile acids and short-chain fatty acids, and less-explored bilirubin, polyamine, and vitamin derivatives, and other microbial products. One of these, nicotinamide riboside, reduced colitis scores in DSS-treated mice. The method, MACARRoN, is generalizable with the potential to improve microbial community characterization and provide therapeutic candidates.


Subject(s)
Colitis , Inflammatory Bowel Diseases , Humans , Animals , Mice , Inflammatory Bowel Diseases/drug therapy , Inflammatory Bowel Diseases/metabolism , Metabolome , Bile Acids and Salts
18.
Hum Genomics ; 18(1): 28, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38509596

ABSTRACT

BACKGROUND: In the process of finding the causative variant of rare diseases, accurate assessment and prioritization of genetic variants is essential. Previous variant prioritization tools mainly depend on the in-silico prediction of the pathogenicity of variants, which results in low sensitivity and difficulty in interpreting the prioritization result. In this study, we propose an explainable algorithm for variant prioritization, named 3ASC, with higher sensitivity and ability to annotate evidence used for prioritization. 3ASC annotates each variant with the 28 criteria defined by the ACMG/AMP genome interpretation guidelines and features related to the clinical interpretation of the variants. The system can explain the result based on annotated evidence and feature contributions. RESULTS: We trained various machine learning algorithms using in-house patient data. The performance of variant ranking was assessed using the recall rate of identifying causative variants in the top-ranked variants. The best practice model was a random forest classifier that showed top 1 recall of 85.6% and top 3 recall of 94.4%. The 3ASC annotates the ACMG/AMP criteria for each genetic variant of a patient so that clinical geneticists can interpret the result as in the CAGI6 SickKids challenge. In the challenge, 3ASC identified causal genes for 10 out of 14 patient cases, with evidence of decreased gene expression for 6 cases. Among them, two genes (HDAC8 and CASK) had decreased gene expression profiles confirmed by transcriptome data. CONCLUSIONS: 3ASC can prioritize genetic variants with higher sensitivity compared to previous methods by integrating various features related to clinical interpretation, including features related to false positive risk such as quality control and disease inheritance pattern. The system allows interpretation of each variant based on the ACMG/AMP criteria and feature contribution assessed using explainable AI techniques.


Subject(s)
Algorithms , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Genetic Testing , Machine Learning , Genetic Variation/genetics , Histone Deacetylases/genetics , Repressor Proteins/genetics
19.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. 
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
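The two evaluation ideas in this abstract, a maximum F-measure taken over all EPCR cut-offs and family-level recall within the top 5 ranked variants, can be illustrated with a small sketch. This is not the CAGI challenge's scoring code: it assumes the F-measure is F1, that every distinct submitted EPCR value is tried as a threshold, and that recall is counted against all known causal variants (passed in as `n_causal`). The `Prediction` record and both function names are hypothetical, and the rank-weighted score is not reproduced because its weighting is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record for one submitted variant prediction.
    family: str
    variant: str
    epcr: float      # estimated probability of causal relationship
    is_causal: bool  # True if the variant is a known causal variant


def max_f_measure(preds, n_causal):
    """Try each distinct EPCR value as a threshold and return the best F1.

    Assumption: recall is measured against all n_causal known causal
    variants, including any the model never submitted.
    """
    best = 0.0
    for t in sorted({p.epcr for p in preds}, reverse=True):
        called = [p for p in preds if p.epcr >= t]
        tp = sum(p.is_causal for p in called)
        if tp == 0:
            continue  # F1 is 0 when no causal variant is called
        precision = tp / len(called)
        recall = tp / n_causal
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def families_solved_in_top_k(preds, k=5):
    """Count families whose causal variant appears among the k highest-EPCR predictions."""
    by_family = {}
    for p in preds:
        by_family.setdefault(p.family, []).append(p)
    solved = 0
    for fam_preds in by_family.values():
        top = sorted(fam_preds, key=lambda p: p.epcr, reverse=True)[:k]
        if any(p.is_causal for p in top):
            solved += 1
    return solved


# Toy data (invented for illustration): two families, one causal variant each.
example = [
    Prediction("F1", "v1", 0.9, True),
    Prediction("F1", "v2", 0.6, False),
    Prediction("F2", "v3", 0.8, False),
    Prediction("F2", "v4", 0.4, True),
]

print(max_f_measure(example, n_causal=2))    # best F1 over all thresholds
print(families_solved_in_top_k(example, 1))  # families solved by the top-ranked variant
```

Sweeping every submitted EPCR value means no threshold has to be fixed in advance; the top-k count instead rewards models that rank the causal variant near the top within each family, regardless of the absolute EPCR values.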


Subject(s)
Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype
20.
Hum Genomics ; 18(1): 34, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38566255

ABSTRACT

BACKGROUND: Male-pattern baldness (MPB) is the most common cause of hair loss in men. It can be categorized into three types: type 2 (T2), type 3 (T3), and type 4 (T4), with type 1 (T1) being considered normal. Although various MPB-associated genetic variants have been suggested, to the best of our knowledge no comprehensive study linking these variants to gene expression regulation has been performed. RESULTS: In this study, we prioritized MPB-related tissue panels using tissue-specific enrichment analysis and utilized single-tissue panels from Genotype-Tissue Expression (GTEx) version 8, as well as cross-tissue panels from context-specific genetics. Through a transcriptome-wide association study and colocalization analysis, we identified 52, 75, and 144 MPB associations for T2, T3, and T4, respectively. To assess the causality of MPB genes, we performed a conditional and joint analysis, which revealed 10, 11, and 54 putative causal genes for T2, T3, and T4, respectively. Finally, we conducted drug repositioning and identified potential drug candidates that are connected to MPB-associated genes. CONCLUSIONS: Overall, through an integrative analysis of gene expression and genotype data, we identified robust MPB susceptibility genes that may help uncover the underlying molecular mechanisms, as well as novel drug candidates that may alleviate MPB.


Subject(s)
Alopecia , Transcriptome , Humans , Male , Transcriptome/genetics , Alopecia/genetics , Alopecia/metabolism , Genotype , Prognosis , Genome-Wide Association Study , Genetic Predisposition to Disease