Results 1 - 7 of 7
1.
PLoS Genet ; 17(8): e1009754, 2021 08.
Article in English | MEDLINE | ID: mdl-34411094

ABSTRACT

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.
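
The partially connected architecture described in this abstract can be illustrated with a minimal sketch. It is not the BANNs software: the function name, the toy annotation, the ReLU activation, and the use of fixed point-estimate weights are all assumptions made for illustration, and the priors and variational inference mentioned in the abstract are not modeled.

```python
def partially_connected_forward(x, snp_sets, w_snp, b_set, w_set, b_out):
    """Forward pass in which hidden unit g only receives the SNPs in snp_sets[g]."""
    hidden = []
    for g, members in enumerate(snp_sets):
        # Input layer: SNP-level effects, restricted to the SNPs annotated to set g.
        z = sum(w_snp[j] * x[j] for j in members) + b_set[g]
        # Hidden layer: aggregated SNP-set effect (ReLU is an assumption here).
        hidden.append(max(0.0, z))
    # Output layer: weighted sum of the SNP-set effects.
    return sum(w * h for w, h in zip(w_set, hidden)) + b_out


genotypes = [1.0, 0.0, 2.0, 1.0]   # one individual, four SNPs (toy values)
snp_sets = [[0, 1], [2, 3]]        # hypothetical annotation: two SNP-sets
prediction = partially_connected_forward(
    genotypes, snp_sets,
    w_snp=[0.5, -1.0, 0.25, -1.0], b_set=[0.0, 0.0],
    w_set=[2.0, 3.0], b_out=0.1,
)
```

Because each hidden unit is tied to one annotated SNP-set, a weight on the input layer can be read as a SNP-level effect and a weight on the output layer as a SNP-set effect, which is the sense in which such a network is interpretable.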


Subject(s)
Genome-Wide Association Study/methods; Molecular Sequence Annotation/methods; Animals; Genome/genetics; Genomics/methods; Genotype; Humans; Models, Genetic; Multifactorial Inheritance/genetics; Neural Networks, Computer; Phenotype; Polymorphism, Single Nucleotide/genetics; Software
2.
Bioinformatics ; 36(Suppl_1): i194-i202, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657373

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation.

RESULTS: In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm, which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm, which models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r² > 0.1) among all methods. We show that the variant and haplotype feature sets selected by HAPLEXR are smaller than those of competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage-dependent and intragenic epistatic effects when predicting expression.

AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available at https://github.com/rapturous/HAPLEX.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Gene Expression; Haplotypes; Phenotype; Quantitative Trait Loci
3.
J Bacteriol ; 201(19)2019 10 01.
Article in English | MEDLINE | ID: mdl-31285239

ABSTRACT

Expression of motility genes is a potentially beneficial but costly process in bacteria. Interestingly, many isolate strains of Escherichia coli possess motility genes but have lost the ability to activate them under conditions in which motility is advantageous, raising the question of how they respond to these situations. Through transcriptome profiling of strains in the E. coli single-gene knockout Keio collection, we noticed drastic upregulation of motility genes in many of the deletion strains compared to levels in their weakly motile parent strain (BW25113). We show that this switch to a motile phenotype is not a direct consequence of the genes deleted but is instead due to a variety of secondary mutations that increase the expression of the major motility regulator, FlhDC. Importantly, we find that this switch can be reproduced by growing poorly motile E. coli strains in nonshaking liquid medium overnight but not in shaking liquid medium. Individual isolates after the nonshaking overnight incubations acquired distinct mutations upstream of the flhDC operon, including different insertion sequence (IS) elements and, to a lesser extent, point mutations. The rapidity with which genetic changes sweep through the populations grown without shaking shows that poorly motile strains can quickly adapt to a motile lifestyle by genetic rewiring.

IMPORTANCE: The ability to tune gene expression in times of need outside preordained regulatory networks is an essential evolutionary process that allows organisms to survive and compete. Here, we show that upon overnight incubation in liquid medium without shaking, populations of largely nonmotile Escherichia coli bacteria can rapidly accumulate mutants that have constitutive motility. This effect contributes to widespread secondary mutations in the single-gene knockout library, the Keio collection. As a result, 49/71 (69%) of the Keio strains tested exhibited various degrees of motility, whereas their parental strain is poorly motile. These observations highlight the plasticity of gene expression even in the absence of preexisting regulatory programs and should raise awareness of procedures for handling laboratory strains of E. coli.


Subject(s)
Escherichia coli Proteins/genetics; Escherichia coli/physiology; Gene Expression Profiling/methods; Mutation; Bacteriological Techniques/instrumentation; Escherichia coli/growth & development; Gene Expression Regulation, Bacterial; Gene Knockout Techniques; Operon; Phenotype; Trans-Activators/genetics
4.
J Comput Biol ; 29(1): 19-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34985990

ABSTRACT

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.
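
The final step described above, barycentric projection through the coupling matrix, can be sketched as follows. This is an illustrative reading of that step, not the SCOT source code: the function name and the toy coupling are assumptions, and the coupling is taken as given rather than computed by optimal transport.

```python
def barycentric_projection(coupling, y):
    """Map each cell of the first data set to a coupling-weighted average of cells in y.

    coupling[i][j] is the mass shared between cell i of the first data set
    and cell j of the second data set; y[j] holds the coordinates of cell j.
    """
    dim = len(y[0])
    projected = []
    for row in coupling:
        mass = sum(row)
        if mass == 0.0:
            raise ValueError("a cell with zero coupling mass cannot be projected")
        # Normalize the row so the weights sum to 1, then average the rows of y.
        projected.append(
            [sum(row[j] * y[j][d] for j in range(len(y))) / mass for d in range(dim)]
        )
    return projected


coupling = [[0.5, 0.0],     # cell 0 is coupled only to the first cell of y
            [0.25, 0.25]]   # cell 1 is split evenly between both cells of y
y = [[0.0, 0.0], [2.0, 4.0]]
aligned = barycentric_projection(coupling, y)
```

After projection, the cells of the first data set live in the feature space of the second, so the two data sets can be compared or clustered jointly.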


Subject(s)
Algorithms; Sequence Alignment/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Computational Biology; Databases, Genetic/statistics & numerical data; Genomics/statistics & numerical data; Heuristics; Humans; Neural Networks, Computer; Sequence Analysis/statistics & numerical data; Software; Unsupervised Machine Learning
5.
J Comput Biol ; 29(1): 3-18, 2022 01.
Article in English | MEDLINE | ID: mdl-35050714

ABSTRACT

Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise correspondences. We present single-cell alignment with optimal transport (SCOT), an unsupervised algorithm that uses Gromov-Wasserstein optimal transport to align single-cell multi-omics data sets. SCOT performs on par with the current state-of-the-art unsupervised alignment methods, is faster, and requires tuning of fewer hyperparameters. More importantly, SCOT uses a self-tuning heuristic to guide hyperparameter selection based on the Gromov-Wasserstein distance. Thus, in the fully unsupervised setting, SCOT aligns single-cell data sets better than the existing methods without requiring any orthogonal correspondence information.
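
To make the role of the Gromov-Wasserstein distance more concrete, the sketch below evaluates the Gromov-Wasserstein objective for a fixed coupling and then keeps the candidate with the lowest value. It is a hedged illustration only: the squared-difference loss, the function name, and the candidate labels are assumptions, the distance matrices are held fixed for simplicity, and the couplings are supplied by hand rather than produced by an optimal transport solver.

```python
def gw_objective(dist_x, dist_y, coupling):
    """Sum of (dist_x[i][k] - dist_y[j][l])^2 * coupling[i][j] * coupling[k][l]."""
    n, m = len(dist_x), len(dist_y)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue  # zero mass contributes nothing
            for k in range(n):
                for l in range(m):
                    diff = dist_x[i][k] - dist_y[j][l]
                    total += diff * diff * coupling[i][j] * coupling[k][l]
    return total


# Intra-domain distance matrices for two toy data sets of two cells each.
dist_x = [[0.0, 1.0], [1.0, 0.0]]
dist_y = [[0.0, 1.0], [1.0, 0.0]]

# Hypothetical couplings, e.g. obtained under two different hyperparameter settings.
candidates = {
    "setting_a": [[0.5, 0.0], [0.0, 0.5]],      # one-to-one matching
    "setting_b": [[0.25, 0.25], [0.25, 0.25]],  # uninformative, uniform coupling
}
scores = {name: gw_objective(dist_x, dist_y, c) for name, c in candidates.items()}
best = min(scores, key=scores.get)
```

A lower objective means the coupling better preserves pairwise distances across the two domains, which is why the value can serve as an unsupervised selection criterion when no correspondence labels exist.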


Subject(s)
Algorithms; Genomics/statistics & numerical data; Sequence Alignment/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Computational Biology; Computer Simulation; Databases, Genetic/statistics & numerical data; Humans; Models, Statistical; Unsupervised Machine Learning
6.
J Comput Biol ; 29(11): 1213-1228, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36251763

ABSTRACT

Multiomic single-cell data allow us to perform integrated analysis to understand genomic regulation of biological processes. However, most single-cell sequencing assays are performed on separately sampled cell populations, as applying them to the same single cell is challenging. Existing unsupervised single-cell alignment algorithms have been primarily benchmarked on co-assay experiments. Our investigation revealed that these methods do not perform well for non-coassay single-cell experiments when there is disproportionate cell-type representation across measurement domains. Therefore, we extend our previous work, Single Cell alignment using Optimal Transport (SCOT), by using unbalanced Gromov-Wasserstein optimal transport to handle disproportionate cell-type representation and differing sample sizes across single-cell measurements. Our method, SCOTv2, gives state-of-the-art alignment performance across five non-coassay data sets (simulated and real world). It can also integrate multiple (M≥2) single-cell measurements while preserving the self-tuning capabilities and computational tractability of its original version.


Subject(s)
Algorithms; Genomics
7.
ACM BCB ; 2020: 1-10, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33954299

ABSTRACT

Integrating single-cell measurements that capture different properties of the genome is vital to extending our understanding of genome biology. This task is challenging due to the lack of a shared axis across datasets obtained from different types of single-cell experiments. For most such datasets, we lack corresponding information among the cells (samples) and the measurements (features). In this scenario, unsupervised algorithms that are capable of aligning single-cell experiments are critical to learning an in silico co-assay that can help draw correspondences among the cells. Maximum mean discrepancy-based manifold alignment (MMD-MA) is such an unsupervised algorithm. Without requiring correspondence information, it can align single-cell datasets from different modalities in a common shared latent space, showing promising results on simulations and a small-scale single-cell experiment with 61 cells. However, it is essential to explore the applicability of this method to larger single-cell experiments with thousands of cells so that it can be of practical interest to the community. In this paper, we apply MMD-MA to two recent datasets that measure transcriptome and chromatin accessibility in ~2000 single cells. To scale the runtime of MMD-MA to a more substantial number of cells, we extend the original implementation to run on GPUs. We also introduce a method to automatically select one of the user-defined parameters, thus reducing the hyperparameter search space. We demonstrate that the proposed extensions allow MMD-MA to accurately align state-of-the-art single-cell experiments.
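
The maximum mean discrepancy that gives MMD-MA its name can be sketched as below. This shows only the discrepancy between two sets of embedded cells, not the full MMD-MA objective or its GPU implementation; the Gaussian kernel, the bandwidth value, the biased estimator, and all names are assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y)."""
    return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return mean_kernel(xs, xs, sigma) + mean_kernel(ys, ys, sigma) - 2.0 * mean_kernel(xs, ys, sigma)


# Toy one-dimensional latent embeddings of cells from two modalities.
embedding_a = [[0.0], [1.0]]
embedding_b = [[5.0], [6.0]]

far_apart = mmd_squared(embedding_a, embedding_b)    # distributions differ
identical = mmd_squared(embedding_a, embedding_a)    # same sample on both sides
```

An alignment method of this kind would drive such a discrepancy toward zero while learning the embeddings, so that cells from both modalities occupy the same region of the shared latent space.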
