Results 1-20 of 87
1.
Bioinform Adv ; 4(1): vbae009, 2024.
Article in English | MEDLINE | ID: mdl-38736682

ABSTRACT

Motivation: Post-market unexpected Adverse Drug Reactions (ADRs) are associated with significant costs, in terms of both financial burden and human health. Due to the high cost and time required to run clinical trials, there is significant interest in accurate computational methods that can aid in the prediction of ADRs for new drugs. As a machine learning task, ADR prediction is made more challenging by a high degree of class imbalance, and existing methods do not successfully balance the requirement to detect the minority cases (true positives for ADR), as measured by the Area Under the Precision-Recall (AUPR) curve, with the ability to separate true positives from true negatives [as measured by the Area Under the Receiver Operating Characteristic (AUROC) curve]. Surprisingly, the performance of most existing methods is worse than that of a naïve method that attributes ADRs to drugs according to the frequency with which the ADR has been observed over all other drugs. The existing advanced methods do not lead to substantial gains in predictive performance. Results: We designed a rigorous evaluation to provide an unbiased estimate of the performance of ADR prediction methods, adopting Nested Cross-Validation and a hold-out set. Among the existing methods, Kernel Regression (KR) performed best in AUPR but was at a disadvantage in AUROC relative to other methods, including the naïve method. We proposed a novel method, called VKR, that combines non-negative matrix factorization with kernel regression. This novel approach matched or exceeded the performance of existing methods, overcoming their weaknesses. Availability: Code and data are available at https://github.com/YezhaoZhong/VKR.
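To make the naïve frequency baseline and the AUROC metric concrete, here is a minimal Python sketch. It assumes a binary drug x ADR matrix for the training drugs and scores every ADR for a new drug by how often it was observed across those drugs. AUROC is computed as the probability that a random positive outscores a random negative. The function names and toy matrix are hypothetical; this is not the VKR code.

```python
from typing import List


def naive_scores(train: List[List[int]]) -> List[float]:
    """Score each ADR by the fraction of training drugs known to cause it.

    `train` is a binary drug x ADR matrix (rows are drugs, columns are ADRs).
    Under this baseline every new drug receives the same score vector.
    """
    n_drugs = len(train)
    n_adrs = len(train[0])
    return [sum(row[j] for row in train) / n_drugs for j in range(n_adrs)]


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train = [[1, 0, 1, 0],
         [1, 0, 0, 0],
         [1, 1, 0, 0]]
scores = naive_scores(train)        # [1.0, 0.333..., 0.333..., 0.0]
print(auroc([1, 0, 1, 0], scores))  # 0.875
```

The pairwise loop is quadratic in the number of labels. For real drug x ADR matrices, a rank-based computation would be the practical choice.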

2.
BMC Bioinformatics ; 25(1): 179, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714913

ABSTRACT

BACKGROUND: As genomic studies continue to implicate non-coding sequences in disease, testing the roles of these variants requires insights into the cell type(s) in which they are likely to be mediating their effects. Prior methods for associating non-coding variants with cell types have involved approaches using linkage disequilibrium or ontological associations, incurring significant processing requirements. GaiaAssociation is a freely available, open-source software package that enables thousands of genomic loci implicated in a phenotype to be tested for enrichment at regulatory loci of multiple cell types in minutes, permitting insights into the cell type(s) mediating the studied phenotype. RESULTS: In this work, we present Regulatory Landscape Enrichment Analysis (RLEA) by GaiaAssociation and demonstrate its capability to test the enrichment of 12,133 variants across the cis-regulatory regions of 44 cell types. This analysis was completed in 134.0 ± 2.3 s, highlighting the efficient processing provided by GaiaAssociation. The intuitive interface requires only four inputs, offers a collection of customizable functions, and visualizes variant enrichment in cell-type regulatory regions through a heatmap matrix. GaiaAssociation is available on PyPI for download as a command line tool or Python package, and the source code can also be installed from GitHub at https://github.com/GreallyLab/gaiaAssociation . CONCLUSIONS: GaiaAssociation is a novel package that provides an intuitive and efficient resource to understand the enrichment of non-coding variants across the cis-regulatory regions of different cells, empowering studies seeking to identify disease-mediating cell types.


Subject(s)
Software , Genetic Variation , Humans , Genomics/methods , Computational Biology/methods , Phenotype , Regulatory Sequences, Nucleic Acid/genetics , Linkage Disequilibrium
3.
bioRxiv ; 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37905111

ABSTRACT

Motivation: To understand whether sets of genomic loci are enriched at the regulatory loci of one or more cell types, we developed the gaiaAssociation package to perform Regulatory Landscape Enrichment Analysis (RLEA). RLEA is a novel analytical process that tests for enrichment of sets of loci in cell type-specific open chromatin regions (OCRs) in the genome. Results: We demonstrate that the application of RLEA to genome-wide association study (GWAS) data reveals cell types likely to be mediating the phenotype studied, and clusters OCRs based on their shared regulatory profiles. GaiaAssociation is Python code that is freely available for use in functional genomics studies. Availability and Implementation: GaiaAssociation is available on PyPI (https://pypi.org/project/gaiaAssociation/0.6.0/#description) for installation with pip and use on the command line or as an inline Python package. It can also be installed from GitHub at https://github.com/GreallyLab/gaiaAssociation.
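The abstract describes RLEA only at a high level. As a hedged illustration of what testing a set of loci for enrichment in one cell type's OCRs can look like, the sketch below uses a simple binomial null in which loci fall uniformly across the genome. The null model, function names and toy data are assumptions for illustration only. They are not gaiaAssociation's API or its actual statistical test.

```python
import math
from typing import List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), half-open, non-overlapping
Locus = Tuple[str, int]         # (chromosome, position)


def count_overlaps(loci: List[Locus], regions: List[Region]) -> int:
    """Number of loci that fall inside at least one region."""
    return sum(
        any(c == chrom and s <= pos < e for c, s, e in regions)
        for chrom, pos in loci
    )


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ocr_enrichment(loci: List[Locus], regions: List[Region], genome_size: int):
    """Overlap count, fold enrichment and one-sided p-value for one cell type's OCRs.

    Assumed null: each locus lands uniformly at random in the genome, so it
    overlaps the OCRs with probability (OCR base pairs / genome base pairs).
    """
    p_null = sum(e - s for _, s, e in regions) / genome_size
    k = count_overlaps(loci, regions)
    n = len(loci)
    fold = (k / n) / p_null
    return k, fold, binom_upper_tail(k, n, p_null)


ocrs = [("chr1", 0, 100)]
snps = [("chr1", 10), ("chr1", 50), ("chr1", 500), ("chr2", 20)]
# 2 of 4 loci fall in OCRs that cover 10% of this toy genome.
print(ocr_enrichment(snps, ocrs, genome_size=1000))
```

A uniform null ignores the fact that GWAS variants and OCRs both cluster in gene-rich regions. A realistic analysis would need a background that accounts for this.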

4.
Trends Genet ; 39(11): 803-807, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37714735

ABSTRACT

To accelerate the impact of African genomics on human health, data science skills and awareness of Africa's rich genetic diversity must be strengthened globally. We describe the first African genomics data science workshop, implemented by the African Society of Human Genetics (AfSHG) and international partners, providing a framework for future workshops.


Subject(s)
Data Science , Genomics , Humans , Human Genetics
5.
NAR Cancer ; 5(3): zcad051, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37746635

ABSTRACT

Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive 'ground truth' data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples.

6.
Genetics ; 225(2)2023 10 04.
Article in English | MEDLINE | ID: mdl-37450609

ABSTRACT

Variation in the rates and characteristics of germline and somatic mutations across the genome of an organism is informative about DNA damage and repair processes and can also shed light on aspects of organism physiology and evolution. We adapted a recently developed method for inferring somatic mutations from bulk RNA-seq data and applied it to a large collection of Arabidopsis thaliana accessions. The wide range of genomic data types available for A. thaliana enabled us to investigate the relationships of multiple genomic features with the variation in the somatic mutation rate across the genome of this model plant. We observed that late replicated regions showed evidence of an elevated rate of somatic mutation compared to genomic regions that are replicated early. We identified transcriptional strand asymmetries, consistent with the effects of transcription-coupled damage and/or repair. We also observed a negative relationship between the inferred somatic mutation count and the H3K36me3 histone mark which is well documented in the literature of human systems. In addition, we were able to support previous reports of an inverse relationship between inferred somatic mutation count and guanine-cytosine content as well as a positive relationship between inferred somatic mutation count and DNA methylation for both cytosine and noncytosine mutations.


Subject(s)
Arabidopsis , Mutation Rate , Humans , RNA-Seq , Mutation , DNA Replication Timing , Arabidopsis/genetics , Cytosine
7.
PLoS Comput Biol ; 18(10): e1010278, 2022 10.
Article in English | MEDLINE | ID: mdl-36197939

ABSTRACT

Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods implicitly test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates by reducing the influence of genes with greater uncertainty on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes.


Subject(s)
Genome , Research Design , Computer Simulation , Sample Size
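The gene set analysis abstract borrows one idea from meta-analysis: genes whose effect-size estimates are more uncertain should have less influence. The paper fits full effect-size distributions. The much simpler sketch below only illustrates inverse-variance weighting, applied to a fixed-effect mean per gene set and a z-score comparing two sets. It is a labelled simplification with hypothetical names, not the authors' framework.

```python
import math
from typing import List, Tuple


def ivw_mean(effects: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean effect and its standard error.

    Each gene gets weight 1/se**2, so genes with larger standard errors contribute less.
    """
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    mean = sum(w * b for w, b in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


def compare_gene_sets(set_a, set_b) -> float:
    """z-score for the difference in weighted mean effect between two gene sets.

    Each set is a pair (effects, standard_errors).
    """
    mean_a, se_a = ivw_mean(*set_a)
    mean_b, se_b = ivw_mean(*set_b)
    return (mean_a - mean_b) / math.sqrt(se_a**2 + se_b**2)


in_set = ([1.0, 3.0], [1.0, 1.0])
background = ([0.0, 0.0], [1.0, 1.0])
print(ivw_mean([1.0, 3.0], [1.0, 2.0]))  # the noisier gene (se = 2) pulls the mean less
print(compare_gene_sets(in_set, background))
```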
8.
BMC Cancer ; 22(1): 840, 2022 Aug 02.
Article in English | MEDLINE | ID: mdl-35918650

ABSTRACT

BACKGROUND: Tumour mutation burden (TMB), defined as the number of somatic mutations per megabase within the sequenced region in the tumour sample, has been used as a biomarker for predicting response to immunotherapy. Several studies have been conducted to assess the utility of TMB for various cancer types; however, methods to measure TMB have not been adequately evaluated. In this study, we identified two sources of bias in current methods to calculate TMB. METHODS: We used simulated data to quantify the two sources of bias and their effect on TMB calculation, and we down-sampled sequencing reads from exome sequencing datasets from TCGA to evaluate the consistency of TMB estimation across different sequencing depths. We analyzed data from ten cancer cohorts to investigate the relationship between inferred TMB and sequencing depth. RESULTS: We found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth. Furthermore, we show that, because only mutations with an observed frequency greater than the threshold are considered, the observed mutant allele frequency provides a biased estimate of the true frequency. This can result in substantial over-estimation of the TMB when the cancer sample includes a large number of somatic mutations at low frequencies, and it exacerbates the lack of robustness of TMB to variation in sequencing depth and tumour purity. CONCLUSION: Our results demonstrate that care needs to be taken in the estimation of TMB to ensure that results are unbiased and consistent across studies, and we suggest that accurate and robust estimation of TMB could be achieved using statistical models that estimate the full mutant allele frequency spectrum.


Subject(s)
Biomarkers, Tumor , Neoplasms , Biomarkers, Tumor/genetics , Humans , Mutation , Neoplasms/pathology , Exome Sequencing
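The threshold-based TMB and the selection bias described in the abstract can be reproduced with a few lines of simulation. This is a minimal sketch under stated assumptions. Each read independently carries the mutation with probability equal to the true variant allele frequency (VAF), which is a plain binomial model. TMB counts calls strictly above 0.05. The function names are hypothetical. The sketch illustrates the bias only and is not the statistical model the authors propose.

```python
import random
from statistics import mean
from typing import List


def tmb(vafs: List[float], region_mb: float, threshold: float = 0.05) -> float:
    """Mutations per megabase, counting only calls with observed VAF above `threshold`."""
    if region_mb <= 0:
        raise ValueError("region_mb must be positive")
    return sum(1 for v in vafs if v > threshold) / region_mb


def observed_vafs(true_vaf: float, depth: int, n_mutations: int,
                  rng: random.Random) -> List[float]:
    """Observed VAFs when each of `depth` reads carries the mutation with prob. `true_vaf`."""
    return [
        sum(rng.random() < true_vaf for _ in range(depth)) / depth
        for _ in range(n_mutations)
    ]


rng = random.Random(1)
obs = observed_vafs(true_vaf=0.04, depth=100, n_mutations=2000, rng=rng)
called = [v for v in obs if v > 0.05]
# Every simulated mutation has a true VAF of 0.04, below the threshold. Binomial
# sampling still pushes some observed VAFs above 0.05. Only those mutations are
# counted, so their mean observed VAF necessarily exceeds the true 0.04.
print(len(called), mean(called))
```

Changing `depth` in the simulation is a quick way to explore how the number of called low-frequency mutations shifts with sequencing depth.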
9.
Cancer Immunol Immunother ; 71(4): 819-827, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34417841

ABSTRACT

Major histocompatibility complex (MHC) molecules can present, on the cell surface, neoantigens resulting from somatic mutations, potentially directing immune responses against cancer. This led to the hypothesis that cancer driver mutations may occur in gaps in the capacity to present neoantigens that are dependent on MHC genotype. If this is correct, it has important implications for understanding oncogenesis and may help to predict driver mutations based on genotype data. In support of this hypothesis, it has been reported that driver mutations that occur frequently tend to be poorly presented by common MHC alleles and that the capacity of a patient's MHC alleles to present the resulting neoantigens is predictive of the driver mutations that are observed in their tumor. Here we show that these reports of a strong relationship between driver mutation occurrence and patient MHC alleles are a consequence of unjustified statistical assumptions. Our reanalysis of the data provides no evidence of an effect of MHC genotype on the oncogenic mutation landscape.


Subject(s)
Neoplasms , Alleles , Carcinogenesis/genetics , Genotype , Humans , Mutation
10.
Sci Rep ; 11(1): 19571, 2021 10 01.
Article in English | MEDLINE | ID: mdl-34599249

ABSTRACT

Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


Subject(s)
Genetic Background , Genome-Wide Association Study , Models, Genetic , Multifactorial Inheritance , Phenotype , Area Under Curve , Biological Specimen Banks , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci , ROC Curve , United Kingdom
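The central idea of this GWAS method is that a polygenic score built only from SNPs on other chromosomes can absorb background genetic variance without absorbing the target SNP's own effect. Below is a hedged numpy sketch of that idea. It assumes additive 0/1/2 genotype coding, independent SNPs, simple marginal-effect weights and two-fold cross-fitting so that no individual's own phenotype enters their score. All of these are choices of this illustration, not necessarily the authors' method, and the simulated data are made up.

```python
import numpy as np


def marginal_betas(G: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-SNP simple-regression slopes of y on each genotype column."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    num = Gc.T @ yc
    den = (Gc**2).sum(axis=0)
    # Monomorphic SNPs (zero variance) get weight 0 instead of dividing by zero.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def offchrom_pgs(G, y, chrom, target_chrom, n_folds=2, seed=0):
    """Polygenic score from SNPs on chromosomes other than `target_chrom`.

    Weights are estimated out-of-fold (cross-fitting, an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))
    other = chrom != target_chrom
    pgs = np.zeros(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        weights = marginal_betas(G[train][:, other], y[train])
        pgs[test] = G[test][:, other] @ weights
    return pgs


def snp_effect(y, g, covariates=()):
    """OLS effect of genotype g on y (with optional covariates) and its t-statistic."""
    X = np.column_stack([np.ones(len(y)), g, *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se


rng = np.random.default_rng(42)
n, m = 2000, 60
chrom = np.repeat([1, 2, 3], 20)        # 20 SNPs on each of 3 chromosomes
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
beta = np.zeros(m)
beta[0] = 0.5                           # target SNP on chromosome 1
beta[20:] = 0.3                         # background effects on chromosomes 2 and 3
y = G @ beta + rng.normal(size=n)

pgs = offchrom_pgs(G, y, chrom, target_chrom=chrom[0])
print(snp_effect(y, G[:, 0]))           # without background adjustment
print(snp_effect(y, G[:, 0], (pgs,)))   # with the off-chromosome PGS as a covariate
```

Leaving the target chromosome out of the score is the design choice that matters. A score that included SNPs linked to the target would partly adjust away the very signal being tested.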
11.
Annu Rev Biomed Data Sci ; 4: 101-122, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34465174

ABSTRACT

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.


Subject(s)
Allelic Imbalance , Genomic Imprinting , Alleles , Gene Expression , Humans , Sequence Analysis, RNA
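The simplest way to assess allelic expression imbalance at a heterozygous site is a binomial test on the reference and alternative RNA-seq read counts. This minimal stdlib sketch assumes a known heterozygous site, no reference-mapping bias and no overdispersion. The function names are hypothetical. More realistic models, such as beta-binomial models, relax the overdispersion assumption.

```python
import math


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of RNA-seq reads carrying the reference allele at a heterozygous site."""
    return ref_reads / (ref_reads + alt_reads)


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of H0: both alleles expressed equally (p = 0.5)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no informative reads")
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the null distribution is symmetric, so doubling the smaller
    # tail (capped at 1) gives the two-sided p-value.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


print(allelic_ratio(30, 10), allelic_imbalance_pvalue(30, 10))
```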
12.
Cancer Inform ; 19: 1176935120972377, 2020.
Article in English | MEDLINE | ID: mdl-33239857

ABSTRACT

MOTIVATION: Somatic mutations can have critical prognostic and therapeutic implications for cancer patients. Although targeted methods are often used to assay specific cancer driver mutations, high-throughput sequencing is frequently applied to discover novel driver mutations and to determine the status of less-frequent driver mutations. The task of recovering somatic mutations from these data is nontrivial, as somatic mutations must be distinguished from germline variants, sequencing errors, and other artefacts. Consequently, bioinformatics pipelines for recovery of somatic mutations from high-throughput sequencing typically involve a large number of analytical choices in the form of quality filters. RESULTS: We present vcfView, an interactive tool designed to support the evaluation of somatic mutation calls from cancer sequencing data. The tool takes as input a single variant call format (VCF) file and enables researchers to explore the impacts of analytical choices on the mutant allele frequency spectrum, on mutational signatures and on annotated somatic variants in genes of interest. It allows variants that have failed variant caller filters to be re-examined to improve sensitivity or guide the design of future experiments. It is extensible, allowing other algorithms to be incorporated easily. AVAILABILITY: The Shiny application can be downloaded from GitHub (https://github.com/BrianOSullivanGit/vcfView). All data processing is performed within R to ensure platform independence. The app has been tested on RStudio, version 1.1.456, with base R 3.6.2 and Shiny 1.4.0. A vignette based on a publicly available data set is also available on GitHub.

13.
Cell Rep ; 32(9): 108096, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32877678

ABSTRACT

DNA replication initiates from multiple origins, and selective CDC7 kinase inhibitors (CDC7is) restrain cell proliferation by limiting origin firing. We have performed a CRISPR-Cas9 genome-wide screen to identify genes that, when lost, promote the proliferation of cells treated with sub-efficacious doses of a CDC7i. We have found that the loss of function of ETAA1, an ATR activator, and RIF1 reduce the sensitivity to CDC7is by allowing DNA synthesis to occur more efficiently, notably during late S phase. We show that partial CDC7 inhibition induces ATR mainly through ETAA1, and that if ATR is subsequently inhibited, origin firing is unleashed in a CDK- and CDC7-dependent manner. Cells are then driven into a premature and highly defective mitosis, a phenotype that can be recapitulated by ETAA1 and TOPBP1 co-depletion. This work defines how ATR mediates the effects of CDC7 inhibition, establishing the framework to understand how the origin firing checkpoint functions.


Subject(s)
Ataxia Telangiectasia Mutated Proteins/metabolism , Cell Cycle Proteins/antagonists & inhibitors , DNA Replication/physiology , DNA/biosynthesis , Protein Serine-Threonine Kinases/antagonists & inhibitors , Antigens, Surface/genetics , Antigens, Surface/metabolism , Ataxia Telangiectasia Mutated Proteins/antagonists & inhibitors , Ataxia Telangiectasia Mutated Proteins/genetics , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Cell Line , DNA/genetics , HEK293 Cells , HeLa Cells , Humans , Mitosis/physiology , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism
14.
J Mol Evol ; 88(7): 549-561, 2020 09.
Article in English | MEDLINE | ID: mdl-32617614

ABSTRACT

Phylogenetic models of the evolution of protein-coding sequences can provide insights into the selection pressures that have shaped them. In the application of these models, synonymous nucleotide substitutions, which do not alter the encoded amino acid, are often assumed to have limited functional consequences and are used as a proxy for the neutral rate of evolution. The ratio of nonsynonymous to synonymous substitution rates is then used to categorize the selective regime that applies to the protein (e.g., purifying selection, neutral evolution, diversifying selection). Here, we extend the Muse and Gaut model of codon evolution to explore the extent of purifying selection acting on substitutions between synonymous stop codons. Using a large collection of coding sequence alignments, we estimate that a high proportion (approximately 57%) of mammalian genes are affected by selection acting on stop codon preference. This proportion varies substantially by codon, with UGA stop codons far more likely to be conserved. Genes with evidence of selection acting on synonymous stop codons have distinctive characteristics, compared to unconserved genes with the same stop codon, including longer 3′ untranslated regions (UTRs) and shorter mRNA half-life. The coding regions of these genes are also much more likely to be under strong purifying selection pressure. Our results suggest that the preference for UGA stop codons found in many multicellular eukaryotes is selective rather than mutational in origin.


Subject(s)
Codon, Terminator , Evolution, Molecular , Mammals/genetics , Models, Genetic , Animals , Humans , Phylogeny
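The ratio-based categorisation of selective regimes described in this abstract can be written as a simple rule on omega = dN/dS. This sketch assumes dN and dS have already been estimated, for example under a codon model such as Muse and Gaut's. It uses a hypothetical tolerance band around 1. In practice the decision is usually made with a statistical test, such as a likelihood-ratio test, rather than a fixed cut-off.

```python
def selective_regime(dn: float, ds: float, tol: float = 0.05) -> str:
    """Classify omega = dN/dS into a coarse selective regime.

    `tol` is a hypothetical band around omega = 1 treated as neutral.
    """
    if ds <= 0:
        raise ValueError("dS must be positive")
    omega = dn / ds
    if omega < 1 - tol:
        return "purifying selection"
    if omega > 1 + tol:
        return "diversifying selection"
    return "approximately neutral"


for dn, ds in [(0.02, 0.20), (0.10, 0.10), (0.30, 0.10)]:
    print(dn, ds, selective_regime(dn, ds))
```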
15.
HRB Open Res ; 3: 89, 2020.
Article in English | MEDLINE | ID: mdl-33855271

ABSTRACT

Genomics is revolutionizing biomedical research, medicine and healthcare globally in academic, public and industry sectors alike. Concrete examples around the world show that huge benefits for patients, society and the economy can be accrued through effective and responsible genomic research and clinical applications. Unfortunately, Ireland has fallen behind and needs to act now in order to catch up. Here, we identify key issues that have resulted in Ireland lagging behind, describe how genomics can benefit Ireland and its people, and outline the measures needed to make genomics work for Ireland and Irish patients. There is now an urgent need for a national genomics strategy that enables an effective, collaborative, responsible, well-regulated and patient-centred environment where genome research and clinical genomics can thrive. We present eight recommendations that could be the pillars of a national genomics health strategy.

16.
PLoS One ; 14(4): e0215987, 2019.
Article in English | MEDLINE | ID: mdl-31022271

ABSTRACT

Cell subtype proportion variability between samples contributes significantly to the variation of functional genomic properties such as gene expression or DNA methylation. Although the impact of the variation of cell subtype composition on measured genomic quantities is recognized, and some innovative tools have been developed for the analysis of heterogeneous samples, most functional genomics studies using samples with mixed cell types still ignore the influence of cell subtype proportion variation, or just deal with it as a nuisance variable to be eliminated. Here we demonstrate how harvesting information about cell subtype proportions from functional genomics data can provide insights into cellular changes associated with phenotypes. We focused on two types of mixed cell populations, human blood and mouse kidney. Cell type prediction is well developed in the former, but not currently in the latter. Estimating the cellular repertoire is easier when a reference dataset from purified samples of all cell types in the tissue is available, as is the case for blood. However, reference datasets are not available for most other tissues, such as the kidney. In this study, we showed that the proportion of alterations attributable to changes in the cellular composition varies strikingly in the two disorders (asthma and systemic lupus erythematosus), suggesting that the contribution of cell subtype proportion changes to functional genomic properties can be disease-specific. We also showed that a reference dataset from a single-cell RNA-seq study successfully estimated the cell subtype proportions in mouse kidney and allowed us to distinguish altered cell subtype differences between two different knock-out mouse models, both of which had reported a reduced number of glomeruli compared to their wild-type counterparts. These findings demonstrate that testing for changes in cell subtype proportions between conditions can yield important insights in functional genomics studies.


Subject(s)
Algorithms , Genomics/methods , Animals , Asthma/genetics , DNA Methylation/genetics , Databases, Genetic , Gene Expression Regulation , Humans , Lupus Erythematosus, Systemic/genetics , Mice , Reference Standards
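Given a reference profile for each cell type, cell subtype proportions can be estimated by deconvolution. One common formulation is assumed here, and it is not necessarily the one used in the study. The bulk profile is treated as a non-negative mixture of reference profiles and solved with non-negative least squares (`scipy.optimize.nnls`). The toy reference matrix and function name are made up for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(reference: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Estimate cell subtype proportions for one bulk sample.

    reference: genes x cell-types matrix of mean expression per cell type
               (for example, averaged from a single-cell RNA-seq reference).
    bulk:      expression of the same genes in the mixed sample.
    Solves min ||reference @ x - bulk|| subject to x >= 0, then rescales x to sum to 1.
    """
    x, _residual = nnls(reference, bulk)
    total = x.sum()
    if total == 0:
        raise ValueError("no cell type explains the bulk profile")
    return x / total


reference = np.array([[10.0, 1.0, 0.0],
                      [2.0, 8.0, 1.0],
                      [0.0, 1.0, 9.0],
                      [5.0, 5.0, 5.0]])
true_props = np.array([0.5, 0.3, 0.2])
bulk = reference @ true_props               # noiseless toy mixture
print(estimate_proportions(reference, bulk))
```

Comparing the estimated proportions between conditions, for example knock-out versus wild-type samples, is then an ordinary two-group test on each cell type's proportion.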
17.
Genome Biol ; 19(1): 130, 2018 09 11.
Article in English | MEDLINE | ID: mdl-30205839

ABSTRACT

Expression quantitative trait loci (eQTLs) identified using tumor gene expression data could affect gene expression in cancer cells, tumor-associated normal cells, or both. Here, we have demonstrated a method to identify eQTLs affecting expression in cancer cells by modeling the statistical interaction between genotype and tumor purity. Only one third of breast cancer risk variants, identified as eQTLs from a conventional analysis, could be confidently attributed to cancer cells. The remaining variants could affect cells of the tumor microenvironment, such as immune cells and fibroblasts. Deconvolution of tumor eQTLs will help determine how inherited polymorphisms influence cancer risk, development, and treatment response.


Subject(s)
Gene Expression , Models, Statistical , Neoplasms/genetics , Quantitative Trait Loci , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Carcinogenesis/genetics , Computer Simulation , Female , Fibroblasts/metabolism , Genetic Variation , Genome-Wide Association Study , Genotype , Humans , Neoplasms/metabolism , Tumor Microenvironment
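The genotype-by-purity interaction has a simple motivation under a linear mixing assumption, which is made here for illustration. Suppose tumour expression is purity times cancer-cell expression plus (1 minus purity) times normal-cell expression. Suppose also that each allele shifts cancer-cell expression by a_c and normal-cell expression by a_n. Expanding gives expr = n0 + a_n·g + (c0 − n0)·purity + (a_c − a_n)·g·purity. The genotype main effect therefore estimates the normal-cell effect, and the main effect plus the interaction estimates the cancer-cell effect. A hedged numpy sketch, with hypothetical names and simulated data:

```python
import numpy as np


def purity_interaction_eqtl(expr, genotype, purity):
    """Fit expr ~ 1 + g + purity + g:purity by least squares.

    Under the assumed linear mixing model
        expr = purity * (c0 + a_c * g) + (1 - purity) * (n0 + a_n * g),
    the coefficient on g estimates a_n and (g + g:purity) estimates a_c.
    """
    X = np.column_stack([np.ones_like(expr), genotype, purity, genotype * purity])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    _, b_g, _, b_gp = coef
    return {"normal_cell_effect": b_g, "cancer_cell_effect": b_g + b_gp}


rng = np.random.default_rng(0)
n = 200
g = rng.integers(0, 3, size=n).astype(float)    # 0/1/2 allele counts
purity = rng.uniform(0.3, 0.9, size=n)         # tumour purity estimates
# Simulated eQTL acting only in cancer cells: a_c = 0.8, a_n = 0.
expr = purity * (2.0 + 0.8 * g) + (1 - purity) * 1.0 + rng.normal(scale=0.1, size=n)
print(purity_interaction_eqtl(expr, g, purity))
```

The cancer-cell effect is an extrapolation to purity 1. It is therefore estimated less precisely when observed purities span a narrow range.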
19.
Proc Natl Acad Sci U S A ; 115(26): E5840, 2018 06 26.
Article in English | MEDLINE | ID: mdl-29903904
20.
Epigenetics ; 12(8): 591-606, 2017 08.
Article in English | MEDLINE | ID: mdl-28557546

ABSTRACT

Aberrant DNA methylation patterns have been reported in inflamed tissues and may play a role in disease. We studied DNA methylation and gene expression profiles of purified intestinal epithelial cells from ulcerative colitis patients, comparing inflamed and non-inflamed areas of the colon. We identified 577 differentially methylated sites (false discovery rate <0.2) mapping to 210 genes. From gene expression data from the same epithelial cells, we identified 62 differentially expressed genes with increased expression in the presence of inflammation at prostate cancer susceptibility genes PRAC1 and PRAC2. Four genes showed an inverse correlation between methylation and gene expression: ROR1, GXYLT2, FOXA2 and, notably, RARB, a gene previously identified as a tumor suppressor in colorectal adenocarcinoma as well as breast, lung and prostate cancer. We highlight targeted and specific patterns of DNA methylation and gene expression in epithelial cells from inflamed colon, while challenging the importance of epithelial cells in the pathogenesis of chronic inflammation.


Subject(s)
Colitis, Ulcerative/genetics , DNA Methylation , Intestinal Mucosa/metabolism , Adult , Colitis, Ulcerative/metabolism , Female , Hepatocyte Nuclear Factor 3-beta/genetics , Hepatocyte Nuclear Factor 3-beta/metabolism , Humans , Male , Middle Aged , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Receptor Tyrosine Kinase-like Orphan Receptors/genetics , Receptor Tyrosine Kinase-like Orphan Receptors/metabolism , Receptors, Retinoic Acid/genetics , Receptors, Retinoic Acid/metabolism , Transcriptome
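The methylation abstract reports differentially methylated sites at a false discovery rate below 0.2 but does not name the correction used. Assuming the Benjamini-Hochberg step-up procedure, a common choice that the abstract does not confirm, such a cut-off can be applied to per-site p-values as follows. The function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvals: List[float], fdr: float = 0.2) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; True marks a discovery at the given FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= k/m * fdr; every hypothesis up to
    # that rank is rejected, even if some smaller ranks individually fail.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], fdr=0.2))  # [True, True, True, False]
```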